Knowledge Discovery in Information Technology and Communication Engineering
Prediction of Diabetes Using Machine Learning
Manjushree.M1, Goutham.N2
1 Manjushree.M, Bangalore-560054, India
2 Goutham.N, Bangalore-560032, India
Abstract
Diabetes mellitus is a common disease caused by a group of metabolic disorders in which blood sugar levels remain very
high over a prolonged period. It affects different organs of the human body and thereby harms many of the body's systems,
in particular the blood vessels and nerves. Early prediction of the disease can help control it and save lives. To
this end, this research work explores various risk factors related to the disease using machine learning
techniques. Machine learning techniques extract knowledge efficiently by constructing predictive models from
diagnostic medical datasets collected from diabetic patients, and the knowledge extracted from such data can help
to predict diabetes. In this work, we apply four popular machine learning algorithms, namely Support Vector
Machine (SVM), Naive Bayes (NB), K-Nearest Neighbor (KNN) and C4.5 Decision Tree (DT), to adult population data to
predict diabetes mellitus. We also review existing technologies and present the proposed architecture. Our
experimental results show that the C4.5 decision tree achieved higher precision than the other machine learning techniques.
Keywords— eHealth; diabetes; machine learning; prediction;
1. INTRODUCTION
Diabetes mellitus is a disease that affects the secretion of the hormone insulin, leading to abnormal
metabolism of carbohydrates and raised levels of sugar in the blood. This high blood glucose affects various
organs of the body, which in turn complicates many of the body's systems, particularly the blood vessels and
nerves. The characteristic of diabetes is that blood glucose is above the normal level, which is caused
by defective insulin secretion. It is currently believed that DM is particularly associated with the ageing
process. Diabetes can cause chronic damage and dysfunction of various tissues, particularly the eyes, kidneys,
heart, blood vessels and nerves. Diabetes can be divided into two categories, type 1 diabetes
(T1D) and type 2 diabetes (T2D). Patients with type 1 diabetes are usually
younger, mostly less than thirty years old. The typical clinical symptoms are increased thirst, frequent
urination and high blood glucose levels. This type of diabetes cannot be treated effectively with oral
medications alone, and patients require insulin therapy. Type 2
diabetes occurs more commonly in middle-aged and elderly people and is often associated with
obesity, hypertension, dyslipidemia, arteriosclerosis and other diseases. Diabetic
patients are also more vulnerable to an elevated risk of micro-vascular damage; in this
way, long-term complications of cardiovascular disease become the leading cause of death.
Early prediction of the disease makes it possible to control it and save lives. To achieve this goal, this
research work primarily explores the early prediction of diabetes by taking into account
various risk factors associated with the disease. For the purpose of the study, we collected a diagnostic dataset
with sixteen attributes from two hundred diabetic patients. These attributes include age, diet, hypertension,
vision problems, genetic factors, etc. These attributes and their corresponding values are discussed
later. Based on these attributes, we build prediction models using various
machine learning techniques to predict DM. The earlier the diagnosis is obtained, the easier the disease is to
control. Machine learning is a subfield of computer science that can help people make a
preliminary judgment about DM according to their routine physical examination data, and it can
serve as a reference for doctors. For a machine learning method, how to select valid features and
the right classifier are the most important problems. Machine learning techniques
extract knowledge efficiently by constructing predictive models from diagnostic medical datasets
collected from diabetic patients, and such knowledge can be useful for predicting
diabetes. Various machine learning techniques are able to predict DM, but it is very difficult
to choose the best technique for a given set of attributes. For the purpose of this study, we therefore
use four popular machine learning algorithms, namely Support Vector Machine (SVM), Naive Bayes
(NB), K-Nearest Neighbor (KNN) and C4.5 decision tree (DT), on adult population data to predict diabetes
mellitus.
The contributions of this study are as follows:
We collect a real diagnostic dataset covering various attributes, or risk factors, of DM from two hundred
patients at a centre.
We compare the performance of various machine learning techniques and evaluate the prediction results based
on the relevant risk factors.
2. PROPOSED ARCHITECTURE
Supervised learning algorithms learn patterns from existing data and try to predict
new results based on that learning. The ML algorithms used to learn from existing data fall into families such as
probability-based, function-based, rule-based, tree-based, instance-based, etc. Fig. 1 shows the overall
design of our proposed model. According to our model, patients are required to provide their medical
data for a successful diagnosis from their diabetes test.
Fig. 1: Flow chart of diabetes prediction model
In this work, we analyse real diagnostic medical data based on various risk factors using
popular machine learning classification techniques to evaluate their performance for predicting DM.
3. MACHINE LEARNING TECHNIQUES
We use four popular machine learning classification techniques to predict DM once the
data has been prepared for modelling. We therefore give an overview of these techniques.
3.1 Support Vector Machine (SVM):
The Support Vector Machine is one of the most popular classification techniques, described by J. Platt et al.
A Support Vector Machine (SVM) is a discriminative classifier formally defined
by a separating hyperplane. SVM separates entities into specified classes. It can also identify and classify
instances that are not supported by the data. SVM does not depend on the distribution of the training data of
each class. One extension of this algorithm performs regression analysis to produce a linear function, and
another extension learns to rank elements to produce a classification for individual elements.
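To make the separating-hyperplane idea concrete, the following minimal Python sketch classifies a point by the sign of w.x + b and reports its distance to the hyperplane. The weight vector, bias and two-feature inputs are hypothetical values chosen only for illustration; they are not taken from the study's dataset or from a trained model.

```python
import math

def decision_value(w, b, x):
    """Signed score w.x + b; its sign tells which side of the hyperplane x lies on."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def predict(w, b, x):
    """Return +1 or -1 (the two classes) from the sign of the score."""
    return 1 if decision_value(w, b, x) >= 0 else -1

def distance_to_hyperplane(w, b, x):
    """Geometric distance of x from the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(decision_value(w, b, x)) / norm

# Hypothetical hyperplane over two illustrative, already-scaled features.
w = [3.0, 4.0]
b = -5.0

print(predict(w, b, [2.0, 1.0]))                 # score = 6 + 4 - 5 = 5
print(predict(w, b, [0.5, 0.5]))                 # score = 1.5 + 2 - 5 = -1.5
print(distance_to_hyperplane(w, b, [2.0, 1.0]))  # |5| / ||w||, with ||w|| = 5
```

Training an SVM amounts to choosing w and b so that this distance is as large as possible for the training points closest to the hyperplane; the sketch shows only the prediction step.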
3.2 Naive Bayes:
Naive Bayes is a popular probabilistic classification technique described by John et al. Naive
Bayes, which is based on Bayes' theorem, is a simple, effective and commonly used
machine learning classifier. The algorithm calculates probabilities by counting the frequency and combinations of
values in the data set. Using Bayes' theorem, it assumes that all attributes are independent
given the value of the class variable. In real-world applications this conditional independence assumption rarely
holds true, yet the classifier often performs well, with results comparable to more sophisticated classifiers.
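The frequency-counting procedure described above can be sketched in a few lines of Python. This is an illustrative sketch only: the two attributes, the toy records and the class names are hypothetical, and the add-one (Laplace) smoothing is an assumption of the sketch that the text does not mention.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(labels)
    # value_counts[(attribute_index, class_label)][value] -> frequency
    value_counts = defaultdict(Counter)
    # Distinct values seen for each attribute, needed for smoothing.
    values_seen = defaultdict(set)
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            values_seen[i].add(value)
    return class_counts, value_counts, values_seen

def predict_naive_bayes(model, row):
    """Pick the class with the highest log posterior under the independence assumption."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label, count in class_counts.items():
        score = math.log(count / total)  # log prior of the class
        for i, value in enumerate(row):
            # Add-one (Laplace) smoothing: an assumption of this sketch.
            numerator = value_counts[(i, label)][value] + 1
            denominator = count + len(values_seen[i])
            score += math.log(numerator / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy records: (hypertension, family_history) -> class label.
rows = [("yes", "yes"), ("yes", "no"), ("no", "yes"),
        ("no", "no"), ("no", "no"), ("yes", "yes")]
labels = ["diabetic", "diabetic", "diabetic",
          "healthy", "healthy", "diabetic"]

model = train_naive_bayes(rows, labels)
print(predict_naive_bayes(model, ("yes", "yes")))
print(predict_naive_bayes(model, ("no", "no")))
```

Working in log space avoids numerical underflow when many attribute probabilities are multiplied together.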
3.3 K-Nearest Neighbor Algorithm:
K-Nearest Neighbor is a simple classification and regression algorithm that uses a non-parametric method, described by
Aha et al. The algorithm stores all valid instances and classifies new instances based on a similarity measure. To
determine the distance from a point of interest to points in the training data set, it uses a tree-like data structure.
An instance is classified by its neighbors. In classification, the value of k is always a positive integer, the
number of nearest neighbors. The nearest neighbors are chosen from a set of instances whose class or property value
is known.
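As a minimal sketch of the neighbor-voting step, the Python code below classifies a query point by majority vote among its k nearest training points. It assumes Euclidean distance and a brute-force search rather than the tree-like structure mentioned above, and the two numeric features and their values are hypothetical.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Straight-line distance between two equal-length feature vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def knn_predict(train_points, train_labels, query, k=3):
    """Majority vote among the k training points closest to the query."""
    order = sorted(range(len(train_points)),
                   key=lambda i: euclidean(train_points[i], query))
    nearest_labels = [train_labels[i] for i in order[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical toy data: two numeric features per patient.
train_points = [(25, 22), (30, 24), (28, 21), (55, 31), (60, 33), (58, 29)]
train_labels = ["healthy", "healthy", "healthy",
                "diabetic", "diabetic", "diabetic"]

print(knn_predict(train_points, train_labels, (57, 30), k=3))
print(knn_predict(train_points, train_labels, (27, 23), k=3))
```

In practice the features would usually be scaled to comparable ranges first, since an unscaled feature with a large range dominates the distance.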
3.4 Decision Tree:
A decision tree is a tree structure that provides a powerful classification technique to predict DM. Most of the
input features have finite discrete domains, and there is a single target feature called the "classification"; each
element of its domain is called a class. Each internal node of a decision tree is labelled with an input
feature, and each leaf node is labelled with a class, that is, a value of the target attribute.
At each node of the tree, the information gain of every candidate attribute is calculated and the attribute with the
highest gain is selected for the split.
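The attribute-selection step can be illustrated with a short Python sketch that computes entropy and information gain for categorical attributes. It also includes the gain ratio, the normalised criterion generally associated with C4.5; the text above mentions only information gain, so that function is an addition of this sketch. The attribute names and toy values are hypothetical.

```python
import math
from collections import Counter

def entropy(items):
    """Shannon entropy (in bits) of a list of categorical values."""
    total = len(items)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(items).values())

def information_gain(values, labels):
    """Entropy of the labels minus the weighted entropy after splitting on `values`."""
    total = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def gain_ratio(values, labels):
    """Information gain divided by the entropy of the split itself."""
    split_info = entropy(values)
    return information_gain(values, labels) / split_info if split_info > 0 else 0.0

# Hypothetical toy data: two candidate attributes and the class labels.
labels = ["diabetic", "diabetic", "healthy", "healthy"]
attributes = {
    "hypertension": ["yes", "yes", "no", "no"],  # separates the classes perfectly
    "diet": ["a", "b", "a", "b"],                # carries no information here
}

# The attribute with the highest gain is chosen for the split at this node.
best = max(attributes, key=lambda name: information_gain(attributes[name], labels))
print(best)
```

Applying the same selection recursively to each resulting subset, until the subsets are pure or no attributes remain, yields the tree.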
Fig. 2: An overview of the overall process
4. EXPERIMENTAL RESULTS
To evaluate the performance of the various machine learning techniques, we show the prediction
results in Fig. 3 in terms of precision, recall and f-measure. The figure shows the results of the
machine learning techniques SVM, NB, KNN and C4.5. Fig. 3 shows that the
C4.5 classifier achieves better results than the other classifiers in predicting DM. According to Fig. 3, C4.5
achieves 72 percent precision, 74 percent recall and 72 percent f-measure on this dataset, which is higher than
the other learning techniques. This experimental result provides evidence that the decision tree performs
well on medical datasets for predicting DM based on the various risk factors discussed in the earlier
section.
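For reference, precision, recall and f-measure can be computed from predicted and true labels as in the following Python sketch. The labels below are hypothetical and do not reproduce the study's results; the text also does not state how the scores were averaged over the two classes, so the sketch reports them for a single positive class only.

```python
def precision_recall_f1(y_true, y_pred, positive="diabetic"):
    """Precision, recall and F-measure for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F-measure is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Hypothetical labels: 3 true positives, 1 false positive, 1 false negative.
y_true = ["diabetic"] * 4 + ["healthy"] * 4
y_pred = ["diabetic", "diabetic", "diabetic", "healthy",
          "diabetic", "healthy", "healthy", "healthy"]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(precision, recall, f1)
```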
Fig. 3: Prediction Results of Various Machine Learning Techniques
In addition to precision, recall and f-measure, we also calculate the accuracy, in percent,
of all these classifiers, shown in Fig. 4. Fig. 4 also shows that the C4.5
decision tree outperforms the other techniques in predicting DM.
Fig. 4: Accuracy Results of Various Machine Learning Techniques
Overall, we have chosen the best machine learning technique to predict DM with high performance,
based on the evaluation criteria discussed above. All the techniques mentioned above are evaluated on
an unseen testing diabetic dataset. The technique that achieves the best performance in terms
of precision, recall, f-measure and accuracy is considered to be the best choice. Based on Fig. 3
and Fig. 4, it can be observed that the C4.5 decision tree achieved the highest accuracy, 73.5 percent, in predicting
DM on the given medical dataset.
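Evaluation on unseen data can be sketched in Python as a simple hold-out split followed by an accuracy calculation. The 30 percent test fraction, the random seed and the record identifiers are assumptions made for illustration; the text does not state how the testing dataset was separated from the training data.

```python
import random

def holdout_split(records, test_fraction=0.3, seed=0):
    """Shuffle a copy of the records and reserve the first part as an unseen test set."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)

def accuracy(y_true, y_pred):
    """Percentage of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return 100.0 * correct / len(y_true)

# Hypothetical identifiers standing in for two hundred patient records.
records = list(range(200))
train, test = holdout_split(records)
print(len(train), len(test))

# Hypothetical predictions: three of four labels are correct.
print(accuracy(["diabetic", "healthy", "diabetic", "healthy"],
               ["diabetic", "healthy", "healthy", "healthy"]))
```

Each classifier would be fitted on the training part only and scored on the test part, so that the reported accuracy reflects performance on records the model has not seen.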
5. EXISTING TECHNOLOGIES
5.1. Self-Monitoring of Blood Glucose Devices:
This device is abbreviated as SMBG. SMBG refers to home blood glucose testing for people with diabetes.
Self-monitoring is the use of regular blood testing to understand one's diabetes control and inform
changes to improve that control. The use of SMBG devices has been suggested as an effective way
of maintaining healthy glucose levels in patients with T2DM. It helps to determine which foods or diets are best
for one's control and informs the patient and doctor about how well the medication regime is working.
It is also important before undertaking dangerous tasks that may be affected by high or low blood glucose, such as
driving and handling dangerous machinery.
Fig. 5: Self-Monitoring of Blood Glucose Device
5.2. Pedometer:
The pedometer is a relatively simple tool that can be used to assess
physical activity and motivate patients to achieve goals. Using the pedometer made it easier to set
personal exercise goals. These goals included increasing the amount of exercise, exercising regularly, and exercising
daily. Experiences with using the pedometer were mainly positive. The pedometer
promoted exercise and was considered simple and easy to use, and its features were easily adopted. The pedometer
helped participants keep track of their amount of exercise and identify situations in which it was easier
to reach the required number of steps and situations in which it was easy to increase the number of daily steps.
Fig. 6: Pedometer
Advantages of the pedometer: in terms of diabetes, when obese men and women with diabetes
walked on the order of 19,000 steps per day combined with caloric restriction, a
significant decrease in basal glucose levels (–0.9 mmol/L) and improved insulin sensitivity were
observed. Pedometers were also shown to significantly decrease systolic blood pressure by 4 mmHg (a
decrease in systolic BP of 2 mmHg is associated with a 10% reduction in stroke mortality, based on observational data).
The reduction in systolic BP was independent of decreases in weight.
5.3. Cell phones and Wireless Devices:
Existing and emerging technologies such as wireless devices (cell phones) with email and text
messaging (SMS) functionality, pagers, and the Internet can help facilitate patient self-management of
diabetes. Wireless technologies can be used as clinical tools to facilitate the exchange of information between patient
and provider and to deliver treatment advice between clinic visits. Results from studies incorporating the
use of remote patient monitoring devices (cell phones and other wireless tools) have indicated significant
decreases in HbA1c levels and improved health-related outcomes in diabetes.
Fig. 7: Cell phones and Wireless Devices
5.3.1. Benefits of Cell phones and Wireless Devices
These types of devices are practical and efficient methods for monitoring clinical outcomes.
These types of devices help to increase patient adherence to treatments.
The use of these devices may encourage patients to adhere to their monitoring regimens by acting as
reminders to self-manage their disease.
6. FUTURE WORK
New therapies, monitoring, and revolutionary enabling technologies applied to healthcare
represent a historic opportunity to improve the lives of people with diabetes. These advances enable
more meaningful monitoring of blood glucose values with the facilitation of more optimal insulin
dosing and delivery. Newer insulins and delivery systems are in development that seek to mitigate both
hyperglycemia and hypoglycemia and increase time in range. Information systems now exist that can be leveraged to merge
data from previously discrete systems into new models of connected care. This review highlights important
developments that serve to increase effectiveness while reducing the burden of diabetes care in the
near future.
6.1. New and Smarter Insulins
Ultrarapid insulins
Faster-acting insulin analogues have been under development in recent years to improve postprandial (PP)
glycemic control (i.e., prevent and/or reduce PP glycemic excursions). Their faster onset will also
enable more rapid correction of hyperglycemia.
One approach to “glucose-responsive” insulin therapy has been through
the development of “smart insulin.” The concept is that insulin itself would sense and
respond to glucose, activating release only when needed through methods such as glucose-responsive polymer
encapsulation or direct modification of the insulin molecule.
Inhaled insulins
Inhaled insulin has been an attractive “meal insulin” option because of the
potential for eliminating inconvenient injections and for rapid insulin action. Despite its conceptual promise,
efforts to achieve widespread use of inhaled insulin have not yet succeeded.
Smart insulin pens
Today's smart Bluetooth insulin pens bring bolus calculators and data tracking to pen users.
One such pen, the InPen, features a bolus calculator, real-time insulin-on-board tracking, dose history data,
reminders to avoid missed meal insulin doses, and an insulin temperature monitor. The InPen app
also receives CGM data (Dexcom) and provides 24-h glucose averages and summary trend lines.
Clinicians no longer remain in the dark regarding the insulin doses actually delivered. Users can set the app
to automatically send a text message with each insulin dose, glucose reading, or carbohydrate
entry to as many as five recipients.
7. CONCLUSION
In this work, we have analysed the early prediction of diabetes by taking
into account various risk factors associated with this disease using machine learning techniques.
Extracting knowledge from a real health care dataset can be useful for predicting diabetic patients. To predict DM
effectively, we have studied four popular machine learning algorithms, namely Support Vector Machine
(SVM), Naive Bayes (NB), K-Nearest Neighbor (KNN) and C4.5 decision tree, on adult population data.
Our experimental results show that the performance of the C4.5 decision tree is significantly superior to that of the
other machine learning techniques for the classification of diabetic data. The experimental results may assist
health care providers in taking early preventive action and making better clinical decisions to control diabetes and
thereby save human lives. Taking additional attributes into account and carrying out further analysis is our future
work. We also studied existing technologies, describing diabetes
monitoring devices and the improved communication with patients that results from patient
self-monitoring technologies.