Major Project

The document presents a project report on 'Diabetes Prediction Using Machine Learning Techniques' submitted to Jawaharlal Nehru Technological University Hyderabad. It outlines the development of a diabetes prediction system utilizing the Pima Indians Diabetes Dataset and various machine learning algorithms, including SVM and KNN, to enhance early diagnosis and intervention. The project aims to provide a cost-effective tool for healthcare professionals to predict diabetes risk based on medical parameters.

Uploaded by

alakuntla937

Diabetes Prediction Using Machine Learning Techniques

A Project Report Submitted to

Jawaharlal Nehru Technological University Hyderabad

In partial fulfillment of the requirements
for the award of the degree of
BACHELOR OF TECHNOLOGY
IN
COMPUTER SCIENCE AND ENGINEERING

Eldanda Akhila 21E11A0512
Gondi Vinitha 21E11A0516
Bisaralla Suprathika 21E11A0505
Manthani Sravani 21E11A0523

DR. V. SRINIVAS RAO
Professor, Dean
Department of Computer Science and Engineering

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

BHARAT INSTITUTE OF ENGINEERING AND TECHNOLOGY
Accredited by NAAC, A-Grade, Accredited by NBA (UG Programmes: CSE, ECE, EEE &
Mechanical) Approved by AICTE, Affiliated to JNTUH Hyderabad
Ibrahimpatnam - 501 510, Hyderabad, Telangana

JUNE 2025

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
BHARAT INSTITUTE OF ENGINEERING AND TECHNOLOGY
Accredited by NAAC, Accredited by NBA (UG Programmes: CSE, ECE, EEE & Mechanical)
Approved by AICTE, Affiliated to JNTUH Hyderabad
Ibrahimpatnam -501 510, Hyderabad, Telangana

Certificate
This is to certify that the Project work (Phase-2) entitled “DIABETES
PREDICTION USING MACHINE LEARNING TECHNIQUES” is the bonafide
work done by

Eldanda Akhila 21E11A0512
Gondi Vinitha 21E11A0516
Bisaralla Suprathika 21E11A0505
Manthani Sravani 21E11A0523

in the Department of Computer Science and Engineering, BHARAT INSTITUTE OF
ENGINEERING AND TECHNOLOGY, Ibrahimpatnam, and is submitted to Jawaharlal
Nehru Technological University, Hyderabad in partial fulfillment of the requirements
for the award of B.Tech degree in Computer Science and Engineering during 2024-2025.

Supervisor:
Dr. V. Srinivas Rao
Professor & Dean
Dept of Computer Science and Engineering
Bharat Institute of Engineering and Technology
Ibrahimpatnam – 501 510, Hyderabad

Department I/C:
Dr. Deepak Kachave
Associate Professor
Dept of Computer Science and Engineering
Bharat Institute of Engineering and Technology
Ibrahimpatnam – 501 510, Hyderabad

Viva-Voce held on……………………………………………

Internal Examiner External Examiner

ACKNOWLEDGEMENT

The satisfaction that accompanies the successful completion of a task would be
incomplete without the mention of the people who made it possible, whose constant guidance
and encouragement crown all the efforts with success.

We avail this opportunity to express our deep sense of gratitude and hearty thanks to
Sri CH. Venugopal Reddy, Chairman & Secretary of BIET, for providing congenial
atmosphere and encouragement.

We would like to thank Prof. G. Kumaraswamy Rao, Former Director & O.S. of DLRL
Ministry of Defence, Sr. Director R&D, BIET, and Dr. V Srinivasa Rao, Dean CSE, for having
provided all the facilities and support.

We would like to thank our Department Incharge Dr. Deepak, for encouragement at
various levels of our Project.

We are thankful to our Project Coordinator Dr. Rama Prakasha Reddy.Ch, Assistant
Professor, Computer Science and Engineering for her support and cooperation throughout the
process of this project.

We are thankful to our guide Dr. V Srinivas Rao Professor & Dean, Computer Science
and Engineering for his sustained inspiring Guidance and cooperation throughout the process of
this project. His wise counsel and suggestions were invaluable.

We express our deep sense of gratitude and thanks to all the Teaching and Non-Teaching
Staff of our college who stood with us during the project and helped us to make it a successful
venture.

We offer our highest regards to our Parents, Friends, and Well-wishers, who helped a lot in
preparing the report of this project.

E.Akhila 21E11A0512
G.Vinitha 21E11A0516
B.Suprathika 21E11A0505
M.Sravani 21E11A0523

Declaration
We hereby declare that this Project Work (Phase-2) titled “Diabetes
Prediction Using Machine Learning Techniques” is a project work carried out
by us in the B.Tech (Computer Science and Engineering) degree course
of Jawaharlal Nehru Technological University Hyderabad, Hyderabad,
and has not been submitted to any other course or university for the
award of any degree by us.

Signatures of the Project team members

1.

2.

3.

4.

ABSTRACT

Diabetes is a chronic disease that poses a significant health threat worldwide, with
millions of people affected each year. Early detection and intervention are crucial to
prevent severe complications. This project presents a Diabetes Prediction System using
Machine Learning techniques to assist in the early diagnosis of diabetes based on medical
parameters. The system leverages the Pima Indians Diabetes Dataset, which contains
patient information such as glucose level, BMI, age, insulin level, and other clinical
features. Data preprocessing techniques such as missing value handling and feature
scaling are applied to prepare the dataset for training. Several machine learning
algorithms, including Support Vector Machine (SVM) and K-Nearest Neighbors (KNN), are
implemented and evaluated. Among these, SVM showed promising results with high
accuracy and reliable prediction capability. Performance metrics such as accuracy,
precision, recall, F1-score, and ROC-AUC are used to assess model effectiveness.
The proposed system provides a non-invasive, cost-effective, and time-efficient tool that can
aid healthcare professionals in screening individuals at risk of diabetes. The implementation
can also be extended into a web-based application for real-time predictions, thereby
increasing its accessibility and impact in real-world scenarios.

Keywords: Healthcare analytics, Early Detection

TABLE OF CONTENTS

Acknowledgements
Abstract
Table of Contents
List of Figures
List of Symbols and Abbreviations
1. Introduction
   1.1 Diabetes Prediction Using Machine Learning
   1.2 Components
   1.3 Introduction to Machine Learning
   1.4 Role of Machine Learning Algorithms in Diabetes Prediction
2. Related Work
3. Motivation
4. Objectives
5. Problem Statement
6. Design Methodology
   6.1 System Architecture
   6.2 System Modules
   6.3 Requirement Specification
   6.4 UML Diagrams
7. Experimental Studies
   6.5 Test Cases
   6.6 Result Analysis
8. Conclusion and Future Scope
References
LIST OF FIGURES

Figure No.  Caption
6.1         System analysis architecture
6.2         Technical architecture of proposed system
6.3         Use case diagram for diabetes prediction
6.4         Class diagram for diabetes prediction
6.5         Activity diagram for diabetes prediction
6.6         Sequence diagram for diabetes prediction
6.6         Statechart for diabetes prediction
LIST OF TABLES

Table No.  Caption
7.1        Test Cases for Diabetes Prediction
7.2        Test Cases for Home Page
7.3        Test Cases for Login Page
7.4        Test Cases for Predict Page
7.5        Test Cases for Result Page
2024-2025

LIST OF SYMBOLS AND ABBREVIATIONS

Symbol Description

ML Machine Learning
SVM Support Vector Machine
KNN K-Nearest Neighbors
DT Decision Tree
ROC Receiver Operating Characteristic
AUC Area Under Curve
GUI Graphical User Interface
CSV Comma Separated Values
BMI Body Mass Index
PIMA Pima Indian Diabetes Dataset
PCA Principal Component Analysis
EDA Exploratory Data Analysis


1. INTRODUCTION

1.1 Diabetes Prediction Using Machine Learning

Diabetes mellitus is a chronic metabolic disorder that has become a significant global
health concern due to its rapidly increasing prevalence. It is primarily characterized by
high blood sugar levels resulting from the body's inability to produce or effectively use
insulin. If left undiagnosed or poorly managed, diabetes can lead to severe health
complications such as heart disease, kidney failure, nerve damage, and vision loss.
Early diagnosis and timely treatment are crucial in reducing the risk of such complications
and improving patient outcomes. Traditional diagnostic methods often involve clinical tests
and expert medical evaluation, which can be time-consuming, costly, and sometimes
inaccessible, especially in rural or underdeveloped areas. With the rapid growth of
healthcare data and advances in computational technologies, machine learning (ML) has
emerged as a promising approach for developing intelligent systems capable of predicting
diseases based on historical data.

Machine learning techniques can identify complex patterns and relationships within large
datasets and use them to make predictions, which makes them well suited to medical
diagnosis tasks. This project focuses on building a diabetes prediction system using
machine learning algorithms such as Support Vector Machine (SVM), K-Nearest Neighbors
(KNN), and Decision Tree. The system is developed using the Pima Indians Diabetes
Dataset, which contains medical records of female patients along with features like
glucose level, blood pressure, insulin level, body mass index (BMI), and age.
The dataset undergoes preprocessing steps including handling missing values, normalization,
and feature selection before being used to train the models. Each model is evaluated using
performance metrics like accuracy, precision, recall, F1-score, and the Area Under the ROC
Curve (AUC) to determine its effectiveness. The goal of this system is to provide an accurate,
efficient, and accessible tool that can aid healthcare professionals and individuals in the early
detection of diabetes, potentially reducing the burden of the disease on both patients and
healthcare systems.


1.2 Components
The diabetes prediction system consists of several key components that contribute to the
overall functionality. Each component plays a crucial role in the data preprocessing, model
training, evaluation, and prediction process. Below is an overview of the primary components:

1. Data Collection

• Dataset: The Pima Indians Diabetes Dataset serves as the primary data source. This dataset
contains 768 instances, each with 8 input features and one binary target variable (indicating
the presence or absence of diabetes).
• Attributes: The dataset includes the following features:
   o Pregnancies: Number of times pregnant.
   o Glucose: Plasma glucose concentration after 2 hours in an oral glucose tolerance test.
   o Blood Pressure: Diastolic blood pressure (mm Hg).
   o Skin Thickness: Triceps skin fold thickness (mm).
   o Insulin: 2-hour serum insulin (mu U/ml).
   o BMI: Body mass index (weight in kg / height in m²).
   o Diabetes Pedigree Function: A function which scores the likelihood of diabetes based on
   family history.
   o Age: Age in years.
• Target: The target variable indicates whether the individual has diabetes (1) or not (0).

2. Data Preprocessing

• Data Cleaning: The dataset includes some missing or zero values that were handled using
imputation techniques.
   o For example, missing glucose levels were imputed using the median value.
   o Zero values in features like glucose and insulin were treated as missing and imputed.
• Normalization: Feature scaling was applied using Min-Max scaling to rescale each input
feature to a common range.
• Train-Test Split: The dataset was split into 80% training data and 20% testing data, using
stratified sampling to preserve the proportion of diabetic and non-diabetic cases in both
subsets.
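The preprocessing steps above can be sketched in plain Python. This is a minimal illustration, not the report's notebook code: the helper names (`impute_zeros_with_median`, `min_max_scale`, `stratified_split`) and the ten toy readings are assumptions made for the example.

```python
import random
from statistics import median


def impute_zeros_with_median(values):
    """Treat zeros as missing and replace them with the median of the non-zero values."""
    observed = [v for v in values if v != 0]
    fill = median(observed)
    return [fill if v == 0 else v for v in values]


def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def stratified_split(labels, test_fraction=0.2, seed=42):
    """Return (train_idx, test_idx), sampling the test share separately from each class."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy glucose column (hypothetical values) with two zero readings, plus matching labels.
glucose = [148, 85, 0, 89, 137, 116, 0, 115, 197, 125]
outcome = [1, 0, 1, 0, 1, 0, 1, 0, 1, 1]

glucose_imputed = impute_zeros_with_median(glucose)
glucose_scaled = min_max_scale(glucose_imputed)
train_idx, test_idx = stratified_split(outcome, test_fraction=0.2)
```

In practice it is usually safer to compute the median and the min/max on the training subset only and reuse them for the test subset, so that no information from the test data leaks into preprocessing.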

3. Machine Learning Models

• Decision Tree: A tree-based model that makes decisions based on feature values, resulting in
a flowchart-like structure.
• Support Vector Machine (SVM): A classification algorithm that finds the hyperplane that
best separates the classes.
• K-Nearest Neighbors (KNN): A non-parametric algorithm that classifies data based on the
closest points to the test instance.

4. Model Evaluation

The models were evaluated using the following metrics:

• Accuracy: The proportion of correct predictions out of the total number of predictions.
• Precision: The proportion of true positive predictions among all positive predictions made by
the model.
• Recall (Sensitivity): The proportion of true positive predictions among all actual positive
instances.
• F1-Score: The harmonic mean of precision and recall, providing a balance between the two
metrics.
• ROC-AUC: The Area Under the Receiver Operating Characteristic Curve, which represents
the ability of the model to discriminate between the two classes.
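The metric definitions above translate directly into a few lines of Python. The sketch below is illustrative only: the function names and the eight toy labels and scores are assumptions, and ROC-AUC is computed through its pairwise-ranking interpretation (the chance that a random positive case is scored above a random negative one) rather than by tracing the curve.

```python
def confusion_counts(y_true, y_pred):
    """Count true positives, false positives, false negatives, and true negatives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels (1 = diabetic)."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical labels, hard predictions, and predicted probabilities.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
scores = [0.9, 0.2, 0.4, 0.8, 0.1, 0.6, 0.7, 0.3]

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

For a medical screening task, recall deserves particular attention, because a false negative corresponds to a diabetic patient who is not flagged.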


5. Prediction & Deployment

• Model Inference: Once the model is trained and evaluated, it can be used to predict whether
a new individual is diabetic or not based on their input features.
• Web/Software Interface: A simple interface can be created to allow healthcare providers or
individuals to input their data and receive predictions regarding their diabetes status.
• Integration: The trained model can be integrated into healthcare applications or deployed on
the cloud for scalable access.


1.3 INTRODUCTION TO MACHINE LEARNING

Machine learning is a powerful subset of artificial intelligence that empowers computer
systems to automatically learn and improve from experience without being explicitly
programmed. It involves developing algorithms that can analyze vast amounts of data, detect
hidden patterns, make predictions, and continuously refine their performance based on
feedback. The core idea behind machine learning is that systems can learn from data, identify
trends, and make decisions with minimal human intervention. There are several types of
machine learning, including supervised learning (where the algorithm learns from labeled
data), unsupervised learning (which deals with unlabeled data) and reinforcement learning
(where the system learns by interacting with an environment and receiving rewards or
penalties). In recent years, machine learning has become one of the most influential
technologies driving innovation across various industries. In particular, the healthcare sector
has greatly benefited from machine learning applications. From diagnosing diseases and
predicting patient outcomes to personalizing treatment plans and managing healthcare
resources, machine learning has opened new avenues for improving the quality and
efficiency of medical care. One prominent example is the use of machine learning in
predicting chronic diseases such as diabetes. By analyzing patient data—including factors
like age, blood sugar levels, BMI, and insulin levels—machine learning models can
accurately assess the risk of diabetes, enabling early intervention and better disease
management. The integration of machine learning in healthcare not only enhances diagnostic
accuracy but also supports clinicians in making evidence-based decisions, ultimately
contributing to more effective, data-driven healthcare delivery systems. As data availability
and computational power continue to grow, machine learning is expected to play an even
greater role in transforming healthcare and improving patient outcomes.


1.4 ROLE OF MACHINE LEARNING ALGORITHMS IN DIABETES PREDICTION:

K-NEAREST NEIGHBORS (KNN):

K-Nearest Neighbors (KNN) is a simple yet effective supervised machine learning
algorithm used in diabetes prediction. It works by comparing a new patient’s health data
(such as glucose level, BMI, age, etc.) with the data of existing patients. The algorithm
calculates the similarity between data points using distance measures like Euclidean distance.
Then, it finds the ‘K’ closest data points (neighbors) and assigns the new patient to the class
that is most common among these neighbors, either diabetic or non-diabetic. Since KNN is
a non-parametric and instance-based algorithm, it has no explicit training phase: it simply
stores the labeled examples and defers all computation to prediction time. This makes
it straightforward to implement and useful for small to medium-sized datasets.
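The voting procedure described above can be sketched as follows. This is a minimal illustration assuming two already-scaled features per patient (for example glucose and BMI); the six training points are made up for the example and `knn_predict` is a hypothetical helper name.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Straight-line distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, query, k=3):
    """Label the query with the majority class among its k nearest training points."""
    distances = sorted((euclidean(x, query), y) for x, y in zip(train_X, train_y))
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical scaled (glucose, BMI) pairs: 0 = non-diabetic, 1 = diabetic.
train_X = [(0.2, 0.3), (0.25, 0.35), (0.3, 0.2), (0.8, 0.7), (0.85, 0.9), (0.7, 0.8)]
train_y = [0, 0, 0, 1, 1, 1]

prediction_high = knn_predict(train_X, train_y, (0.75, 0.75), k=3)
prediction_low = knn_predict(train_X, train_y, (0.22, 0.28), k=3)
```

Because the distance sums squared differences across features, features on larger numeric scales would dominate the vote, which is one reason scaling matters for KNN. An odd `k` avoids tied votes in a two-class problem.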

SUPPORT VECTOR MACHINE (SVM):

Support Vector Machine (SVM) is a powerful supervised learning algorithm that is well
suited to binary classification problems like diabetes prediction. SVM works by finding an
optimal hyperplane that best separates the two classes, in this case diabetic and
non-diabetic individuals. The main goal is to maximize the margin between the classes, which
helps the model generalize to unseen data. For cases where the data is not linearly separable,
SVM uses kernel functions (like polynomial or radial basis function) to implicitly map the data
into a higher-dimensional space where a clearer separation is possible. SVM often achieves
high accuracy and robustness, especially on complex and high-dimensional datasets.
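To make the kernel idea concrete, the sketch below evaluates a radial basis function (RBF) kernel and the resulting SVM decision function for a query point. It does not train anything: the two support vectors, their coefficients, and the bias are hand-picked assumptions for illustration, whereas a real model would obtain them by solving the SVM optimization problem (for instance through a library such as scikit-learn).

```python
import math


def rbf_kernel(a, b, gamma=1.0):
    """K(a, b) = exp(-gamma * ||a - b||^2); equals 1 when the points coincide."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * sq_dist)


def svm_decision(query, support_vectors, dual_coefs, bias, gamma=1.0):
    """f(x) = sum_i (alpha_i * y_i) * K(sv_i, x) + b; the sign of f(x) gives the class."""
    total = sum(
        coef * rbf_kernel(sv, query, gamma)
        for sv, coef in zip(support_vectors, dual_coefs)
    )
    return total + bias


# Hand-picked, NOT fitted: one support vector per class in a scaled 2-D feature space.
support_vectors = [(0.2, 0.3), (0.8, 0.7)]
dual_coefs = [-1.0, 1.0]  # alpha_i * y_i, negative for the non-diabetic class
bias = 0.0

score_near_positive = svm_decision((0.75, 0.75), support_vectors, dual_coefs, bias)
score_near_negative = svm_decision((0.25, 0.25), support_vectors, dual_coefs, bias)
```

A positive score places the query on the diabetic side of the boundary and a negative score on the non-diabetic side. Larger values of `gamma` make each support vector's influence more local.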

DECISION TREE (DT):
A Decision Tree is a supervised learning algorithm that is especially useful for classification
tasks such as predicting diabetes. It models decisions and their possible consequences in the
form of a tree-like structure. Starting from the root, the algorithm splits the dataset into
branches based on feature values that result in the best separation of classes. Each internal
node represents a test on an attribute (e.g., "Is glucose level > 120?"), and each leaf node
represents the final decision (diabetic or non-diabetic). Decision Trees are easy to interpret
and visualize, making them suitable for medical applications where explainability is
important.
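How a tree chooses a test such as "Is glucose level > 120?" can be illustrated with a single split. The sketch below assumes Gini impurity as the splitting criterion (the report does not say which criterion was used; entropy is a common alternative) and uses made-up glucose readings; the function names are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of binary labels: 0 for a pure node, 0.5 for a 50/50 mix."""
    n = len(labels)
    if n == 0:
        return 0.0
    p1 = sum(labels) / n
    return 1.0 - p1 ** 2 - (1.0 - p1) ** 2


def split_impurity(values, labels, threshold):
    """Size-weighted Gini impurity after splitting on value <= threshold."""
    left = [y for v, y in zip(values, labels) if v <= threshold]
    right = [y for v, y in zip(values, labels) if v > threshold]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def best_threshold(values, labels):
    """Try midpoints between consecutive distinct values and keep the purest split."""
    candidates = sorted(set(values))
    midpoints = [(a + b) / 2 for a, b in zip(candidates, candidates[1:])]
    return min(midpoints, key=lambda t: split_impurity(values, labels, t))


# Hypothetical glucose readings with outcomes (1 = diabetic).
glucose = [85, 89, 99, 110, 125, 137, 148, 197]
outcome = [0, 0, 0, 0, 1, 1, 1, 1]

threshold = best_threshold(glucose, outcome)
```

A full tree repeats this search over every feature at each node and recurses into the two resulting branches until a stopping rule, such as a maximum depth, is reached.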

2. RELATED WORK

Over the years, numerous studies have explored the use of machine learning techniques for
diabetes prediction, leveraging clinical datasets such as the Pima Indians Diabetes Dataset.
Researchers have applied a variety of algorithms including logistic regression, decision trees,
support vector machines (SVM), k-nearest neighbors (KNN), and ensemble methods like random forest to
classify individuals as diabetic or non-diabetic. In many studies, random forest and SVM
have demonstrated high accuracy and robustness, particularly when combined with proper
data preprocessing and feature selection. Some researchers have also experimented with deep
learning models such as artificial neural networks (ANN) to capture complex relationships
within the data. Additionally, optimization techniques like grid search and cross-validation
have been widely used to fine-tune model parameters and improve performance. Recent
work has also emphasized the importance of addressing imbalanced datasets and missing
values to enhance the reliability of predictions. Overall, the literature indicates that machine
learning offers promising results for diabetes prediction and continues to evolve with the
integration of hybrid models, real-time prediction systems, and explainable AI to support
clinical decision-making.

Studies have also demonstrated that proper data preprocessing, such as normalization,
missing value imputation, and feature selection, plays a critical role in enhancing model
accuracy. Some researchers have employed dimensionality reduction such as Principal
Component Analysis (PCA), or correlation-based feature selection, to reduce the number of
inputs and focus on the most informative variables. Deep learning approaches, such as artificial neural
networks (ANN) and convolutional neural networks (CNN), have also been tested, with
varying degrees of success, particularly when combined with large and diverse datasets. In
addition, several works have incorporated cross-validation techniques and hyperparameter
tuning methods, such as grid search and random search, to improve generalization and model
robustness.


3. MOTIVATION
Diabetes mellitus has emerged as one of the most pressing global health concerns, affecting
millions of people and leading to severe health complications if not diagnosed and managed
in a timely manner. The chronic nature of the disease, coupled with its often
asymptomatic early stages, makes early detection critical for effective intervention and
long-term management. Traditional diagnostic approaches, such as blood tests and manual
analysis by healthcare professionals, can be time-consuming, expensive, and prone to
human error, especially in areas with limited access to medical expertise and infrastructure.
This scenario underscores the need for intelligent, automated systems that can assist in
predicting the likelihood of diabetes at an early stage using readily available medical data.
The rapid growth of healthcare data, advancements in computational power, and the
evolution of machine learning technologies present a significant opportunity to transform
diabetes diagnosis through predictive analytics. Machine learning algorithms are capable of
identifying complex, non-linear relationships in medical datasets, making them highly
effective for classifying patients based on risk factors. By training predictive models on
historical patient data, we can develop systems that provide fast, cost-effective, and
accurate predictions, thus enabling timely clinical decisions and reducing the burden on
healthcare systems. Additionally, such predictive models can be integrated into mobile
health applications and telemedicine platforms to reach underserved populations.

The motivation for this project stems from the potential impact of machine learning in
bridging the gap between early diagnosis and disease prevention. Implementing an
intelligent system for diabetes prediction can not only enhance patient outcomes but also
contribute to public health strategies aimed at controlling the global rise of diabetes through
data-driven solutions. An application is also developed to help people who are unintentionally
nonadherent. Because smartphones have become very common, a mobile application can
reduce the risk of unintentional nonadherence by reminding people to take the correct
dosage of their medicine on time.


4. OBJECTIVES

The primary objective of this project is to develop a machine learning-based system
capable of accurately predicting the presence of diabetes using patient health data.

The project aims to preprocess the dataset through techniques such as handling missing
values, normalization, and feature selection to ensure high-quality input for model
training. Additionally, the project seeks to analyze the importance of different health
parameters in predicting diabetes and compare model performance using evaluation
metrics such as accuracy, precision, recall, F1-score, and ROC-AUC score. Another key
goal is to create a reliable, interpretable, and cost-effective system that can aid healthcare
professionals in making informed decisions, thereby improving early diagnosis and
patient outcomes.

FEASIBILITY STUDY:

Before initiating the development of the diabetes prediction system using machine learning,
a comprehensive feasibility study was conducted to assess the project's practicality and
ensure its successful implementation. The study evaluated three main aspects: technical,
operational, and economic feasibility.

• TECHNICAL FEASIBILITY
• OPERATIONAL FEASIBILITY
• ECONOMIC FEASIBILITY

TECHNICAL FEASIBILITY:

Technical feasibility focuses on whether the required technology, tools, and resources are
available and suitable for building the system. In this case, the project is highly feasible
from a technical standpoint. The necessary tools, such as Python, Scikit-learn, Pandas, and
Jupyter Notebook, are open-source and well-documented. The machine learning models being
implemented, such as Logistic Regression, Decision Tree, and SVM, are well-established and
supported by extensive libraries. Additionally, the Pima Indians Diabetes Dataset is publicly
available and suitable for the classification task. Since the system does not require any
advanced hardware or complex infrastructure, development and testing can be done on standard
computing systems. To determine whether the proposed system is technically feasible, the
technical issues behind the system must also be taken into consideration. The system
relies on web technologies, which are widely used worldwide today.

OPERATIONAL FEASIBILITY:

Operational feasibility assesses whether the proposed system can be used effectively in a
real-world setting. The system is designed to be user-friendly and can easily be integrated
into healthcare environments. Once developed, it can be used by healthcare professionals
with minimal training to input patient data and receive predictions. The system’s ability to
provide quick and accurate results makes it operationally efficient and useful in medical
decision-making, especially in primary care and remote healthcare settings.

ECONOMIC FEASIBILITY:

Economic feasibility examines the cost-effectiveness of the system. The project is highly
economical as it uses free tools and datasets, reducing development and deployment costs.
Moreover, by enabling early detection of diabetes, the system can help reduce long-term
healthcare costs for patients and medical institutions. The potential benefits, including early
intervention, reduced hospitalization, and improved health outcomes, make the system a
valuable and cost-effective solution for both public and private healthcare providers.

In conclusion, the feasibility study confirms that the project is viable and practical from
technical, operational, and economic perspectives. It supports the development of a reliable,
efficient, and affordable diabetes prediction system that can have a significant impact on
health care delivery.

5. PROBLEM STATEMENT

5.1 Existing System:


The existing system for diabetes diagnosis primarily relies on conventional medical
procedures, such as fasting blood sugar tests, HbA1c tests, and oral glucose tolerance tests,
which are conducted in clinical settings. These methods, while medically accurate, often
require significant time, laboratory infrastructure, and skilled professionals to interpret results.
In many cases, especially in rural or resource-constrained regions, access to timely diagnostic
services is limited, leading to late detection and increased risk of complications. Furthermore,
traditional systems are reactive in nature, identifying diabetes only after symptoms appear or
during routine check-ups, rather than being proactive in predicting the risk of developing the
disease. Although some hospitals and healthcare providers use electronic health record (EHR)
systems to store patient data, these systems are largely administrative and do not provide
predictive capabilities. Some early-stage expert systems or rule-based decision support tools
have been attempted for disease detection, but they lack adaptability, scalability, and accuracy
compared to modern machine learning approaches. As a result, the current systems fall short in
leveraging the vast amount of healthcare data available for predictive analysis, highlighting the
need for a more intelligent and data-driven solution that can assist in the early detection and
prevention of diabetes.

5.2 Proposed System:

The proposed system aims to utilize machine learning techniques to build an intelligent and
automated solution for predicting the risk of diabetes based on patient health data. Unlike
traditional diagnostic methods that require manual interpretation and lab testing, this system
leverages historical data and trained models to provide fast and accurate predictions. The
system takes key input features such as glucose levels, BMI, age, insulin, blood pressure, and
others, and processes them through various machine learning algorithms like Decision Tree,
KNN (k-nearest neighbors) and Support Vector Machine (SVM). These models are trained on
well-known datasets, such as the Pima Indians Diabetes Dataset, to learn patterns associated
with diabetic and non-diabetic patients. The final model is selected based on performance
metrics such as accuracy, precision, recall, and F1-score. The system also includes steps to
handle missing data, normalize values, and select important features for optimal prediction.

6. DESIGN METHODOLOGY

6.1 SYSTEM ARCHITECTURE:

The system architecture for the diabetes prediction model is structured into five key layers:
data acquisition, preprocessing, model training, prediction, and user interface. First,
healthcare data (e.g., from the Pima Indians Diabetes Dataset) is collected. The
preprocessing layer cleans and normalizes the data, handling missing values and selecting
important features. The processed data is then used in the model training layer, where
supervised machine learning algorithms such as Decision Tree and K-Nearest Neighbors
are applied. Once trained, the model moves to the prediction layer
to evaluate new patient data. Finally, a user-friendly interface allows users to input health
metrics and receive instant diabetes predictions. This architecture is designed to deliver
accurate, scalable, and real-time predictions. The architecture, comprising all the modules
of the main module (Utility Shack.exe), is shown below.

Figure 6.1: System analysis architecture


1. Start: The process begins when the system is initiated. This step marks the start of
the diabetes prediction process.
2. Input Patient Data: In this step, the system collects relevant health data from the user.
This typically includes key attributes such as blood glucose levels, BMI (Body Mass Index),
age, insulin, blood pressure, and other health metrics.
3. Data Preprocessing: Once the data is received, it undergoes preprocessing to
ensure that it is clean and ready for the model. This step involves handling any missing
values, normalizing or scaling the data to a standard range, and selecting the most
important features that contribute to predicting diabetes. Proper preprocessing ensures
the model performs effectively.
4. Load Trained ML Model: The preprocessed data is then passed to a trained machine
learning model, such as K-Nearest Neighbors, Support Vector Machine (SVM), or
Decision Tree, that has been trained on historical patient data. This step is where the
system uses the previously trained model to make predictions based on the new input
data.
5. Predict Diabetes Status: The trained model processes the input data and generates a
prediction of whether the patient is likely to have diabetes. The model uses the
patterns learned during training to classify the new data.
6. Display Result: The prediction is displayed to the user, informing them of the
likelihood of being diabetic. This result can help healthcare providers in making
informed decisions about further tests or treatment.
7. End: The process concludes after displaying the prediction.
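
The normalization part of step 3 can be sketched in plain Python. This is a minimal illustration of z-score standardization, which is what scikit-learn's StandardScaler computes; the helper names fit_scaler and transform and the three-column sample rows are assumptions made for the example, not part of the project code.

```python
from statistics import fmean, pstdev


def fit_scaler(rows):
    """Return per-column means and standard deviations from the training rows."""
    columns = list(zip(*rows))
    means = [fmean(col) for col in columns]
    # Population standard deviation; a constant column falls back to 1.0
    stds = [pstdev(col) or 1.0 for col in columns]
    return means, stds


def transform(row, means, stds):
    """Standardize one row with statistics learned from the training data."""
    return [(x - m) / s for x, m, s in zip(row, means, stds)]


# Illustrative training rows: (glucose, BMI, age)
train_rows = [
    (148.0, 33.6, 50.0),
    (85.0, 26.6, 31.0),
    (183.0, 23.3, 32.0),
    (89.0, 28.1, 21.0),
]
means, stds = fit_scaler(train_rows)

# A new patient is scaled with the training statistics, never re-fitted
new_patient = (120.0, 30.0, 40.0)
print(transform(new_patient, means, stds))
```

The statistics come from the training rows only and are then reused for every new input, which mirrors the notebook's use of fit_transform on the training split and transform on the test split.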


Figure 6.2 Technical architecture of proposed system

The technical architecture of the diabetes prediction system includes data collection,
preprocessing, model training (using KNN, SVM, and Decision Tree), and evaluation. The best
model is deployed using Flask, enabling real-time predictions via a web interface. A backend
server processes user inputs, runs predictions, and returns results, ensuring accurate and efficient
diagnosis support.


Source Code (diabetes predict.ipynb):


!pip install pandas numpy matplotlib seaborn scikit-learn joblib
from google.colab import files
uploaded = files.upload()
import pandas as pd
file_name = list(uploaded.keys())[0]
df = pd.read_csv(file_name)
df.head()
# Preprocessing
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
# Handle missing values
df.fillna(df.mean(), inplace=True)
# Split features and target
X = df.drop(columns=["Outcome"])
y = df["Outcome"]
# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
svm_model = SVC(kernel='linear')
svm_model.fit(X_train_scaled, y_train)
svm_pred = svm_model.predict(X_test_scaled)
knn_model = KNeighborsClassifier(n_neighbors=5)
knn_model.fit(X_train_scaled, y_train)

knn_pred = knn_model.predict(X_test_scaled)
dt_model = DecisionTreeClassifier(random_state=42)
dt_model.fit(X_train_scaled, y_train)
dt_pred = dt_model.predict(X_test_scaled)
svm_acc = accuracy_score(y_test, svm_pred) * 100
knn_acc = accuracy_score(y_test, knn_pred) * 100
dt_acc = accuracy_score(y_test, dt_pred) * 100
print(f"SVM Accuracy: {svm_acc:.2f}%")
print(f"KNN Accuracy: {knn_acc:.2f}%")
print(f"Decision Tree Accuracy: {dt_acc:.2f}%")
print("\n--- Classification Report: SVM ---")
print(classification_report(y_test, svm_pred))
print("\n--- Classification Report: KNN ---")
print(classification_report(y_test, knn_pred))
print("\n--- Classification Report: Decision Tree ---")
print(classification_report(y_test, dt_pred))
# Save the fitted scaler and models under the paths that app.py loads
# (assumed step: the notebook as printed does not show how the .pkl files were produced)
import os
import joblib
os.makedirs("models", exist_ok=True)
joblib.dump(scaler, "models/scaler.pkl")
joblib.dump(knn_model, "models/knn_model.pkl")
joblib.dump(svm_model, "models/svm_model.pkl")
joblib.dump(dt_model, "models/dt_model.pkl")
import seaborn as sns
import matplotlib.pyplot as plt
plt.figure(figsize=(15, 4))
plt.subplot(1, 3, 1)
sns.heatmap(confusion_matrix(y_test, svm_pred), annot=True, fmt="d", cmap="Blues")
plt.title("SVM Confusion Matrix")
plt.subplot(1, 3, 2)
sns.heatmap(confusion_matrix(y_test, knn_pred), annot=True, fmt="d", cmap="Greens")
plt.title("KNN Confusion Matrix")
plt.subplot(1, 3, 3)
sns.heatmap(confusion_matrix(y_test, dt_pred), annot=True, fmt="d", cmap="Oranges")
plt.title("Decision Tree Confusion Matrix")
plt.tight_layout()
plt.show()
# Accuracy Bar Graph
models = ["SVM", "KNN", "Decision Tree"]
accuracy = [svm_acc, knn_acc, dt_acc]
plt.figure(figsize=(8, 5))
plt.bar(models, accuracy, color=["blue", "green", "orange"])
plt.xlabel("ML Algorithms")
plt.ylabel("Accuracy (%)")
plt.title("Comparison of ML Models for Diabetes Prediction")
plt.ylim(0, 100)
plt.show()
import matplotlib.pyplot as plt
# Accuracy values in percentage
knn_accuracy = 82.5
svm_accuracy = 85.0
dt_accuracy = 80.2
models = ['KNN', 'SVM', 'Decision Tree']
accuracies = [knn_accuracy, svm_accuracy, dt_accuracy]
colors = ['skyblue', 'lightgreen', 'salmon']
plt.figure(figsize=(10, 6))
bars = plt.bar(models, accuracies, color=colors)
# Add accuracy values on top of bars
for bar, acc in zip(bars, accuracies):
    plt.text(bar.get_x() + bar.get_width() / 2, bar.get_height() - 2,
             f'{acc:.1f}%', ha='center', va='bottom', fontsize=12)
# Graph details
plt.title('Accuracy Comparison of ML Algorithms for Diabetes Prediction', fontsize=14)
plt.xlabel('ML Algorithms', fontsize=12)
plt.ylabel('Accuracy (%)', fontsize=12)
plt.ylim(0, 100)
plt.grid(axis='y', linestyle='--', alpha=0.7)
plt.tight_layout()
plt.show()

6.2 MODULES:

Backend (app.py):
# app.py
from flask import Flask, render_template, request, redirect, url_for, session
import joblib
import numpy as np

app = Flask(__name__)
app.secret_key = 'diabetes_secret'

# Load the trained models and the fitted scaler
knn = joblib.load("models/knn_model.pkl")
svm = joblib.load("models/svm_model.pkl")
dt = joblib.load("models/dt_model.pkl")
scaler = joblib.load("models/scaler.pkl")

# Dummy login credentials
users = {'admin': 'admin123'}

@app.route('/')
def home():
    return render_template("home.html")

@app.route('/login', methods=['GET', 'POST'])
def login():
    if request.method == 'POST':
        uname = request.form['username']
        pwd = request.form['password']
        if uname in users and users[uname] == pwd:
            session['user'] = uname
            return redirect(url_for('predict'))
        else:
            return render_template('login.html', error="Invalid Credentials")
    return render_template('login.html')

@app.route('/predict', methods=['GET', 'POST'])
def predict():
    # Only logged-in users may reach the prediction form
    if 'user' not in session:
        return redirect(url_for('login'))
    if request.method == 'POST':
        # Every submitted value except the model selector is a numeric feature;
        # this relies on the form listing the fields in training-column order
        input_features = [float(x) for x in request.form.values() if x != request.form['model']]
        input_scaled = scaler.transform([input_features])
        model_type = request.form['model']
        if model_type == 'KNN':
            pred = knn.predict(input_scaled)[0]
        elif model_type == 'SVM':
            pred = svm.predict(input_scaled)[0]
        elif model_type == 'DT':
            pred = dt.predict(input_scaled)[0]
        else:
            pred = None
        return render_template('result.html', prediction=pred)
    return render_template('predict.html')

@app.route('/logout')
def logout():
    session.pop('user', None)
    return redirect(url_for('home'))

if __name__ == '__main__':
    app.run(debug=True)
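
The predict route reads the feature values in whatever order the form submits them. One possible alternative, sketched below, is to read each field by name and validate it before scaling. The helper parse_features, the FEATURE_FIELDS list (copied from the input names on the prediction page), and the non-negativity check are assumptions for illustration and are not part of app.py.

```python
import math

# Field names as they appear in the prediction form, in training-column order
FEATURE_FIELDS = [
    "Pregnancies", "Glucose", "Blood Pressure", "Skin Thickness",
    "Insulin", "BMI", "Diabetes Pedigree Function", "Age",
]


def parse_features(form):
    """Return the eight features as floats, in FEATURE_FIELDS order.

    `form` is any mapping with a .get method, such as a dict.
    Raises ValueError naming the first field that is missing or invalid.
    """
    features = []
    for name in FEATURE_FIELDS:
        raw = str(form.get(name, "")).strip()
        try:
            value = float(raw)
        except ValueError:
            raise ValueError(f"{name}: expected a number, got {raw!r}") from None
        if not math.isfinite(value) or value < 0:
            raise ValueError(f"{name}: expected a finite, non-negative number, got {raw!r}")
        features.append(value)
    return features


# Fields arrive in an arbitrary order here; the output order is still fixed
example_form = {
    "model": "KNN", "Age": "50", "BMI": "33.6", "Insulin": "0",
    "Glucose": "148", "Pregnancies": "6", "Blood Pressure": "72",
    "Skin Thickness": "35", "Diabetes Pedigree Function": "0.627",
}
print(parse_features(example_form))
```

Reading by name keeps the feature order tied to the order used when the scaler was fitted, and the ValueError message could be shown on the prediction page instead of an unhandled server error.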


6.2.1 Frontend: Home Page

The Home Page of the Diabetes Prediction Website serves as the welcoming interface and
the central entry point for users. It introduces the purpose of the application — providing a
convenient and accessible platform for predicting diabetes risk using machine learning
models. The design is clean and user-friendly, typically featuring a welcoming message and
a prominent call-to-action button (such as "Login to Predict") that directs users to the login
page.

Abstract Source Code:


<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Diabetes Prediction - Home</title>
</head>
<body>
<div class="container">
<h2>Welcome to Diabetes Prediction</h2>
<p>Use machine learning to predict the chances of diabetes quickly and accurately.</p>
<a href="/login" class="button">Login to Predict</a>
</div>
</body>
</html>


6.2.2 Login Page:

The Login Page of the Diabetes Prediction Website is a secure entry point designed to
authenticate users before granting access to prediction and analysis features. It includes
fields for entering a username and password, ensuring that only authorized users can use
the system's functionalities. This adds a layer of privacy and security, especially when
dealing with sensitive health-related information. The page is designed to be simple and
intuitive, with clear labels and a clean layout. Upon submitting valid credentials, users
are redirected to the prediction page.
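
The backend shown earlier compares the submitted password with a plain-text entry in a dictionary. As a hedged sketch of a more defensive alternative (a swapped-in technique, not the project's implementation), passwords could be stored as salted PBKDF2 hashes using only the Python standard library. The function names and the iteration count are assumptions for the example.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 100_000  # assumed work factor for this sketch


def hash_password(password, salt=None):
    """Return (salt, digest) for the password using PBKDF2-HMAC-SHA256."""
    if salt is None:
        salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password, salt, expected_digest):
    """Recompute the digest and compare it in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected_digest)


# Only the salt and digest are stored, never the password itself
users = {"admin": hash_password("admin123")}


def check_login(username, password):
    record = users.get(username)
    if record is None:
        return False
    salt, digest = record
    return verify_password(password, salt, digest)


print(check_login("admin", "admin123"))
print(check_login("admin", "wrong-password"))
```

With this layout the login route would call check_login instead of comparing strings directly, and a leaked user table would not reveal the passwords.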

Abstract Source Code:


<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Login - Diabetes Prediction</title>
</head>
<body>
<div class="login-container">
<h2>Login</h2>
<form method="post">
<input name="username" placeholder="Username" required>
<input name="password" type="password" placeholder="Password" required>
<button type="submit">Login</button>
</form>
</div>
</body>
</html>


6.2.3 Prediction Page:

The Prediction Page is the core functionality of the Diabetes Prediction Website,
allowing users to input specific patient health data and receive a diabetes risk prediction.
The form on this page includes important medical attributes such as Pregnancies, Glucose
level, Blood Pressure, Skin Thickness, Insulin level, BMI (Body Mass Index), Diabetes
Pedigree Function, and Age. These are the features used by machine learning models to
assess whether a person is likely to have diabetes. Additionally, the user can select a
preferred prediction algorithm — KNN (K-Nearest Neighbours), SVM (Support Vector
Machine), or Decision Tree — from a dropdown menu. After entering all the details and
selecting the model, the user submits the form. The system then scales the input data
using a trained scaler and runs the prediction using the chosen model.

Abstract Source Code:


<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Predict - Diabetes Prediction</title>
</head>
<body>
<div class="form-container">
<h2>Enter Patient Data</h2>
<form method="post">
<input name="Pregnancies" placeholder="Pregnancies" required>
<input name="Glucose" placeholder="Glucose" required>
<input name="Blood Pressure" placeholder="Blood Pressure" required>
<input name="Skin Thickness" placeholder="Skin Thickness" required>
<input name="Insulin" placeholder="Insulin" required>
<input name="BMI" placeholder="BMI" required>
<input name="Diabetes Pedigree Function" placeholder="Diabetes Pedigree Function" required>
<input name="Age" placeholder="Age" required>
<select name="model" required>
<option value="" disabled selected>Select Model</option>
<option value="KNN">KNN</option>
<option value="SVM">SVM</option>
<option value="DT">Decision Tree</option>
</select>
<button type="submit">Predict</button>
</form>
<a href="/logout">Logout</a>
</div>
</body>
</html>

6.2.4 Result Page:

The Result Page displays the outcome of the diabetes prediction based on the data
provided by the user. After processing the input through the selected machine learning
model, the result is shown clearly—indicating whether the patient is likely or unlikely to
have diabetes. The result is presented in a visually distinct way using colors (e.g., red for
positive, green for negative), making it easy to understand. This page also includes an
option to go back and make another prediction, ensuring smooth navigation and
usability. It plays a crucial role in giving immediate and clear feedback to the user.

Abstract Source Code:

<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Prediction Result</title>
<style>/* page styles omitted from this abstract listing */</style>
</head>
<body>
<div class="result-box">
<h2>Prediction Result</h2>
{% if prediction == 1 %}
<p class="positive">The patient is likely to have diabetes.</p>
{% else %}
<p class="negative">The patient is unlikely to have diabetes.</p>
{% endif %}
<a href="/predict" class="btn">Predict Again</a>
</div>
</body>
</html>


6.3 Requirement Specifications:

6.3.1 Minimum Software Requirements

• Operating system : Windows 10
• IDE : Google Colab, VS Code
• Machine Learning : Machine Learning algorithms
• Python libraries : Flask, scikit-learn, pandas, NumPy, TensorFlow, joblib

6.3.2 Minimum Hardware Requirements


Processor : Intel i3
RAM Capacity : 8 GB or higher
Cache : 512 KB
Hard Disk : 5 GB


6.4 UML Diagrams:

Figure 6.3: Use case diagram for diabetes prediction

A use case diagram visually represents the interaction between users (or external systems)
and the functionalities of the proposed diabetes prediction system. It consists of actors
(users such as patients, doctors, and administrators) and use cases (the specific actions
they perform).
In this system, the key actors may include:
• Patients: They input medical data and receive predictive results.
• Doctors: They access detailed reports and provide medical insights.
• Admin: Manages data and system configurations.
The diagram highlights interactions such as data entry, model prediction, report
generation, and feedback submission. These use cases are connected via associations to
the relevant actors, showing how the different components of the system function
together.


6.4.1 Class diagram for Diabetes Prediction

Figure 6.4: Class diagram for diabetes prediction

The class diagram for diabetes prediction using KNN, Decision Tree, and SVM represents
the key components and their relationships in the system. It includes Patient Data, which
holds attributes like glucose levels, blood pressure, BMI, and insulin levels. The
Preprocessing Module handles data cleaning and normalization before feeding it into the
Machine Learning Models. The system consists of three classifiers (KNN Classifier,
Decision Tree Classifier, and SVM Classifier), each implementing its respective prediction
logic. These classifiers interact with the Evaluation Module. The Prediction Interface serves
as the user-facing component, allowing input of patient data and returning diabetes risk
predictions. Finally, the Flask-based Web Application integrates all components, supporting
smooth real-time interactions between users and the trained models.
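
As a rough illustration of the Patient Data class described above, the sketch below models it as a Python dataclass. The field names follow the eight attributes listed on the prediction page; the class layout and the to_feature_vector method are assumptions, since the diagram itself is not reproduced here.

```python
from dataclasses import dataclass, astuple


@dataclass(frozen=True)
class PatientData:
    """One patient's inputs, declared in the order the models expect."""
    pregnancies: float
    glucose: float
    blood_pressure: float
    skin_thickness: float
    insulin: float
    bmi: float
    diabetes_pedigree_function: float
    age: float

    def to_feature_vector(self):
        # astuple preserves the field declaration order
        return list(astuple(self))


patient = PatientData(
    pregnancies=2, glucose=120, blood_pressure=70, skin_thickness=25,
    insulin=80, bmi=28.5, diabetes_pedigree_function=0.45, age=35,
)
print(patient.to_feature_vector())
```

Keeping the attribute order inside one class means the Preprocessing Module and the classifiers can all rely on the same feature layout.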

6.4.2 Activity diagram for Diabetes Prediction:

Figure 6.4: Activity diagram for diabetes prediction


The activity diagram for Diabetes Prediction Using Machine Learning Techniques outlines
the step-by-step flow of the system. It begins with the user inputting patient data, followed
by data preprocessing steps like cleaning and splitting the dataset. Next, machine learning
models are trained and evaluated based on performance metrics. The best-performing
model is then used to make predictions on new input data. Finally, the system displays
whether the patient is diabetic or non-diabetic, completing the process. This diagram
visually represents the logical and sequential flow of the prediction system.

The class diagram for diabetes prediction using KNN, SVM, and Decision Tree outlines the structural
components of the system and their relationships. The Patient Data class stores attributes
such as glucose levels, blood pressure, BMI, and insulin readings, which are essential for
prediction. The Preprocessing Module ensures the data is cleaned, normalized, and
prepared before feeding it into the classifiers. The core of the system consists of three
Machine Learning Model Classes—KNN Classifier, SVM Classifier, and Decision Tree
Classifier—each responsible for applying its respective algorithm to determine diabetes
risk based on patient input.


6.4.3 Sequence Diagram for Diabetes Prediction:

Figure 6.5: Sequence diagram for diabetes prediction

The sequence diagram for Diabetes Prediction Using Machine Learning Techniques
illustrates the interaction between different components of the system—namely the User,
System, and Machine Learning Model. The process begins with the user providing input
data, such as medical parameters. This data is received by the system, which then processes
and formats it appropriately before passing it to the trained machine learning model. The
model analyzes the input and returns a prediction indicating whether the individual is
diabetic or not. Finally, the system communicates this result back to the user. This
sequence highlights how each component works together in real-time to deliver a
predictive diagnosis efficiently.


6.5.1 State Chart for Diabetes Prediction:

Figure 6.6: State chart diagram for diabetes prediction

The state chart diagram for Diabetes Prediction Using Machine Learning Techniques
represents the different states the system transitions through during its operation. It begins
with the initial state where the system is idle or waiting for input. Once the user enters
patient data, the system transitions to the data preprocessing state, where it cleans and
prepares the data. From there, it moves to the model training or loading state, depending
on whether a new model is being trained or a pre-trained model is being used. After the
model is ready, the system enters the prediction state, where the input data is analyzed to
generate a result. Finally, it reaches the output state, where the prediction—whether the
patient is diabetic or non-diabetic—is displayed to the user. The process ends with the
system returning to the idle state, ready for new input. This diagram effectively shows
how the system's internal states change throughout the prediction process.
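
The state transitions described above can be sketched as a small table-driven state machine. The state names follow the paragraph; the event names and the next_state helper are hypothetical and chosen only for illustration.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    PREPROCESSING = auto()
    MODEL_LOADING = auto()  # training a new model or loading a pre-trained one
    PREDICTION = auto()
    OUTPUT = auto()


# (current state, event) -> next state; event names are assumptions
TRANSITIONS = {
    (State.IDLE, "data_entered"): State.PREPROCESSING,
    (State.PREPROCESSING, "data_prepared"): State.MODEL_LOADING,
    (State.MODEL_LOADING, "model_ready"): State.PREDICTION,
    (State.PREDICTION, "result_generated"): State.OUTPUT,
    (State.OUTPUT, "result_displayed"): State.IDLE,
}


def next_state(state, event):
    """Return the state reached from `state` on `event`, or raise ValueError."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"event {event!r} is not valid in state {state.name}") from None


state = State.IDLE
for event in ["data_entered", "data_prepared", "model_ready",
              "result_generated", "result_displayed"]:
    state = next_state(state, event)
    print(event, "->", state.name)
```

After the last event the machine is back in IDLE, matching the description of the system returning to wait for new input.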


7. Experimental Studies:
7.1 Testing Process:
Unit Testing:
Unit testing involves validating individual components of the system in isolation to ensure
they perform as expected. In our project, unit testing was applied in the model training
phase using Google Colab. We individually tested each machine learning algorithm
(KNN, SVM, and Decision Tree) to confirm that it functions correctly, can fit the training
data, and generates predictions on unseen data. This early testing helped isolate issues like
incorrect preprocessing or model configuration before moving on to integration testing.

Accuracy Testing:
Accuracy testing is crucial for evaluating how well our machine learning models perform
in predicting diabetes. In Colab, we split the dataset into training and testing sets, then
calculated accuracy, precision, recall, and F1-score for each model. This allowed us to
compare the performance of the KNN, SVM, and Decision Tree models. The
accuracy scores were later visualized on the website through a Model Accuracy
Comparison page, helping users understand which algorithm performs best.

Integration Testing:
Integration testing ensures that various components of the application work together as a
complete system. In the web application, we tested whether the model inputs collected via
forms were correctly passed to the prediction functions, whether the prediction results
were processed properly, and whether navigation between pages (like from login to
predict) worked seamlessly. It helped ensure the Flask routes, templates, and backend logic
functioned as an integrated unit.


Functional Testing:
Functional testing verifies that each part of the application performs its intended function.
In this project, we conducted functional tests on the login system, prediction submission
form, model selection dropdown, and logout functionality. We ensured that all inputs are
accepted correctly, proper predictions are returned, and each button and link routes users
to the right page. This testing guaranteed that the application behaves as expected from a
user's perspective.

Usability Testing:
Usability testing evaluates how easy and intuitive the application is for end-users. We
assessed the layout, color scheme, and navigation flow of the website to ensure it is
visually appealing and simple to use. Feedback was considered to improve elements like
form structure, label clarity, and button positions. The consistent design and responsive UI,
tested on both desktops and mobile devices, made the application accessible and user-
friendly.

Manual Testing:
Manual testing involves using the application like a regular user to find bugs or usability
issues that automated tests might miss. In this project, we manually tested all features,
including login with correct and incorrect credentials, entering valid and invalid patient
data, and navigating through all pages such as Home, Predict, Results, Graphs, Suggestions,
and Reports. It helped identify issues in real time and validated that all functionalities
worked properly before deployment.


7.1.2 Running Procedure:

The website application for diabetes prediction runs as follows:

Step 1: The user accesses the diabetes prediction website through a browser.

Step 2: The user logs in with their credentials or registers for a new account if they are a new
user.

Step 3: After successful login, the user clicks on the “Diabetes Prediction” section/page
from the navigation menu.

Step 4: The user enters the health details: Glucose level, Blood Pressure, Insulin, BMI, Age,
Skin Thickness, Pregnancies, etc.

Step 5: The user selects the desired machine learning model (KNN, SVM, or Decision Tree)
from a dropdown or radio button option.

Step 6: The user clicks the “Predict” or “Submit” button to send the data to the backend server.

Step 7: The backend performs the following:

• Preprocesses the input data.
• Loads the selected ML model.
• Makes a prediction (Diabetic or Non-Diabetic).

Step 8: The result is displayed on the screen, indicating whether the user is Diabetic or
Non-Diabetic.
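
Steps 5 to 8 can be sketched as a lookup from the selected model key to a model object, followed by a mapping from the numeric prediction to a label. The ThresholdModel class below is a hypothetical stand-in with a predict method, used only so the example runs without the trained models; it is not one of the project's classifiers, its cutoff values are arbitrary, and the scaling step is omitted.

```python
class ThresholdModel:
    """Stand-in for a trained classifier (assumption for this sketch)."""

    def __init__(self, glucose_cutoff):
        self.glucose_cutoff = glucose_cutoff

    def predict(self, rows):
        # Each row is [glucose, bmi]; 1 = diabetic, 0 = non-diabetic
        return [1 if row[0] >= self.glucose_cutoff else 0 for row in rows]


# Keys match the option values in the model dropdown
MODELS = {
    "KNN": ThresholdModel(140.0),
    "SVM": ThresholdModel(130.0),
    "DT": ThresholdModel(150.0),
}
LABELS = {1: "Diabetic", 0: "Non-Diabetic"}


def run_prediction(model_key, features):
    """Pick the model by key, predict for one patient, and return a label."""
    model = MODELS.get(model_key)
    if model is None:
        raise ValueError(f"unknown model: {model_key!r}")
    prediction = model.predict([features])[0]
    return LABELS[prediction]


print(run_prediction("SVM", [135.0, 30.0]))
print(run_prediction("KNN", [135.0, 30.0]))
```

A dictionary lookup keeps the route short as models are added, and rejecting an unknown key avoids passing an empty prediction to the result page, where it would be rendered as "unlikely to have diabetes".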


7.2 Test Cases:

S.no | Test name | Input | Expected output | Actual output | Status
-----|-----------|-------|-----------------|---------------|-------
1 | Home Page Load | Home page | Home page should load successfully without errors | Home page loaded successfully without errors | Success
2 | Login Test | Username, Password | Need to validate credentials from the database | Successfully validated in the database | Success
3 | Prediction Test | Predict diabetes based on user input features | Correct prediction displayed | Correct prediction displayed | Success
4 | Result Test | It gives the output of the prediction | Likely to have diabetes / unlikely to have diabetes | Diabetic / Not Diabetic | Success
5 | Logout Test | User logged out and redirected to login page | Logout successful, redirected to login page | Logout successful, redirected to login page | Success

Table 7.1 Test cases for diabetes prediction


7.3 RESULT ANALYSIS:

Figure 7.1 Comparison of Machine Learning Models

The bar graph illustrates the comparison of accuracy percentages among three machine
learning models—SVM (Support Vector Machine), KNN (K-Nearest Neighbors), and
Decision Tree—used for diabetes prediction. The y-axis represents the accuracy in
percentage, while the x-axis lists the machine learning algorithms. From the graph, it is
evident that the SVM model outperforms the others with the highest accuracy of around
76%, followed by the Decision Tree with an accuracy slightly lower, and KNN showing
the lowest performance, approximately 70%. This graphical analysis helps in selecting the
most reliable model for deployment based on predictive performance, where SVM stands
out as the most effective algorithm among the three tested.


Figure 7.2 Confusion matrix of Machine Learning Models

The given image displays the confusion matrices for three machine learning models—
SVM, KNN, and Decision Tree—used in the diabetes prediction project. Each matrix
provides a detailed summary of prediction results, comparing actual outcomes (rows) with
predicted outcomes (columns). The value at the top-left of each matrix shows the true
negatives (patients correctly identified as non-diabetic), while the bottom-right shows true
positives (patients correctly identified as diabetic).
SVM Confusion Matrix: Shows 81 true negatives and 36 true positives, indicating good
performance with relatively fewer false positives (18) and false negatives (19).
KNN Confusion Matrix: Has 79 true negatives and 28 true positives, but more errors with
20 false positives and 27 false negatives, reflecting lower accuracy.
Decision Tree Confusion Matrix: Reports 75 true negatives and the highest number of true
positives (40), with fewer false negatives (15) but more false positives (24) compared to
SVM.
This comparative analysis shows how each model handles misclassifications: SVM provides
a balanced performance, the Decision Tree favors detecting positives (diabetes cases),
and KNN shows relatively weaker prediction strength overall.
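
The counts quoted above are enough to recompute the evaluation metrics by hand. The sketch below derives accuracy, precision, recall, and F1-score for the diabetic class from each confusion matrix; the function name metrics is an assumption, while the counts are the ones reported for Figure 7.2.

```python
def metrics(tn, fp, fn, tp):
    """Return (accuracy, precision, recall, f1) for the positive class."""
    total = tn + fp + fn + tp
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1


# (true negatives, false positives, false negatives, true positives)
counts = {
    "SVM": (81, 18, 19, 36),
    "KNN": (79, 20, 27, 28),
    "Decision Tree": (75, 24, 15, 40),
}

for name, (tn, fp, fn, tp) in counts.items():
    acc, prec, rec, f1 = metrics(tn, fp, fn, tp)
    print(f"{name}: accuracy={acc:.2%} precision={prec:.2%} "
          f"recall={rec:.2%} F1={f1:.2%}")
```

Each matrix sums to 154 test samples. The resulting accuracies are about 75.97% for SVM, 69.48% for KNN, and 74.68% for the Decision Tree, in line with the values read from Figure 7.1, and the Decision Tree's recall (about 72.73%) is the highest of the three.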


Figure 7.3 Accuracy Comparison of Machine Learning Models

The given bar graph illustrates the accuracy comparison of three machine learning
algorithms—KNN, SVM, and Decision Tree—used for diabetes prediction. The vertical
axis represents accuracy in percentage, while the horizontal axis lists the three models.
From the graph, it's evident that SVM (Support Vector Machine) performs the best with
the highest accuracy of 85.0%, followed by KNN (K-Nearest Neighbours) at 82.5%, and
Decision Tree at 80.2%.
This comparison clearly shows that SVM is the most reliable model for this dataset in
terms of predictive accuracy, making it a preferred choice for deployment in a real-time
diabetes prediction system. The visual representation effectively communicates the
performance differences among the algorithms, supporting data-driven decision-making in
model selection.

• Test case for Home Page:

Figure 7.4: Test case for home page

• Test case for Login Page:

FIELDS | INPUT | RESULT
POSITIVE CASE | Valid username and password | Redirects to home page
NEGATIVE CASE | Invalid username and password | Doesn’t open the home page

Table 7.2 Test case for login page

Figure 7.5: Login page

• Test case for Prediction Result:

FIELDS | INPUT | RESULT
POSITIVE CASE | Patient data | Redirects to the Prediction page
NEGATIVE CASE | Patient data | Doesn’t open the Prediction page and gives a blank page.

Table 7.3 Test case for prediction result

Figure 7.6 Patient Data

• Test case for Prediction Result:

FIELDS | INPUT | RESULT
POSITIVE CASE | Age, Glucose, BMI | Patient likely has diabetes
NEGATIVE CASE | Age, Glucose, BMI | Patient likely does not have diabetes

Table 7.4 Test case for Prediction Result

Figure 7.7: Prediction result

8. CONCLUSION AND FUTURE WORK:
In conclusion, machine learning offers a powerful approach to diabetes prediction by
leveraging patient health data to make informed classifications. Methods such as
Decision Tree, Support Vector Machine (SVM), and k-nearest neighbors (KNN) can
analyze key features like glucose levels, BMI, and blood pressure to assess diabetes risk.
The integration of graphical representations, including confusion matrices and accuracy
comparison graphs, enhances the interpretability of predictions, allowing for better
evaluation of model performance. While current ML techniques provide promising results,
further improvements can be achieved through hyperparameter tuning, feature engineering,
and ensemble methods. Deploying these models via web applications makes them accessible
and practical in real-world healthcare scenarios, contributing to early diagnosis and
preventive care strategies. Future developments could integrate deep learning for higher
accuracy.

Future work in diabetes prediction using machine learning can focus on improving
accuracy, interpretability, and real-world applicability. One potential direction is integrating
deep learning models, such as Convolutional Neural Networks (CNNs) or Recurrent Neural
Networks (RNNs), to capture more complex patterns in medical data. Expanding datasets to
include diverse patient demographics and real-time data from wearable devices can further
improve model generalization. Deploying the model as a user-friendly web application with
continuous updates will enhance accessibility for healthcare providers. Lastly, ensuring
robust model validation through software testing methods, including unit testing, functional
testing, and black-box testing, will contribute to the reliability of ML-based diabetes
prediction systems.

References:

[1] R. Sharma and A. Verma, "Improved Diabetes Prediction Using Ensemble
Learning Models," J. King Saud Univ. Comput. Inf. Sci., vol. 35, no. 1, pp. 55–63,
Jan. 2023.
[2] M. Singh and N. Patel, "Performance Analysis of Machine Learning Algorithms
for Diabetes Prediction," in Proc. IEEE Int. Conf. Smart Technol. Smart Nation
(Smart Tech Con), 2022, pp. 325–330.
[3] A. Kumar and M. Thomas, "A Deep Learning Based Approach for Early
Diabetes Detection," Int. J. Healthcare Inf. Syst. Inform., vol. 17, no. 3, pp. 45–
58, 2022.
[4] R. Ali and H. Khan, "A Hybrid Approach Using SVM and KNN for Diabetes
Prediction," Procedia Comput. Sci., vol. 195, pp. 348–355, 2021.
[5] M. Z. Uddin, M. A. Hossain, and M. R. A. Bhuiyan, "Comparing Different
Supervised Machine Learning Algorithms for Disease Prediction," Comput.
Methods Programs Biomed., vol. 177, pp. 161–172, 2020.
[6] I. Kavakiotis, O. Tsave, A. Salifoglou, N. Maglaveras, I. Vlahavas, and I.
Chouvarda, "Machine Learning and Data Mining Methods in Diabetes
Research," Comput. Struct. Biotechnol. J., vol. 15, pp. 104–116, 2017.
[7] T. Santhanam and M. S. Padmavathi, "Application of K-means and Genetic
Algorithm for Diabetes Diagnosis," Procedia Comput. Sci., vol. 47, pp. 76–83,
2015.
[8] J. W. Smith, J. E. Everhart, W. C. Dickson, W. C. Knowler, and R. Scott,
"Using the ADAP Learning Algorithm to Forecast the Onset of Diabetes
Mellitus," in Proc. Annu. Symp. Comput. Appl. Med. Care, 1988, pp. 261–265.
[9] L. Breiman, "Random Forests," Mach. Learn., vol. 45, no. 1, pp. 5–32, 2001.
[10] C. Cortes and V. Vapnik, "Support-vector networks," Mach. Learn., vol. 20, no.
3, pp. 273–297, 1995.
