Global Journal of Computer Science and Technology: C

Software & Data Engineering


Volume 18 Issue 1 Version 1.0 Year 2018
Type: Double Blind Peer Reviewed International Research Journal
Publisher: Global Journals
Online ISSN: 0975-4172 & Print ISSN: 0975-4350

Analysis of Heart Disease using in Data Mining Tools Orange and Weka
By Sarangam Kodati & Dr. R. Vivekanandam
Sri Satya Sai University
Abstract- Health care is an unavoidable part of human life, and the health care business has
become a notable field within the wider area of medical science. The health care industry holds
a large amount of data and hidden information, and effective decisions can be made with this
hidden information. Diagnosis requires tests on the patient; however, with data mining these
tests could be reduced. There is a lack of analysis tools that provide effective test outcomes
together with the hidden information, so such a system is developed using data mining algorithms
to classify the data and to detect heart disease. Data mining acts as a solution to many
healthcare problems. Naïve Bayes, SVM, Random Forest, and KNN are data mining methods that
serve the diagnosis of heart disease patients. This paper analyzes a few parameters and predicts
heart disease, thereby suggesting a heart disease prediction system (HDPS) based on data mining
approaches.
Keywords: data mining, weka, orange, heart disease, data mining classification techniques.
GJCST-C Classification: J.3

© 2018. Sarangam Kodati & Dr. R. Vivekanandam. This is a research/review paper, distributed under the terms of the Creative
Commons Attribution-Noncommercial 3.0 Unported License (http://creativecommons.org/licenses/by-nc/3.0/), permitting all
non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Sarangam Kodati α & Dr. R. Vivekanandam σ

Author α: Research Scholar, Department of Computer Science and Engineering, Sri Satya Sai
University of Technology and Medical Science, Sehore, Bhopal, Madhya Pradesh, India.
e-mail: k.sarangam@gmail.com
Author σ: Professor, Director in Muthayammal Engineering College, Namakkal, India.

I. Data Mining

Data mining is concerned with computationally extracting previously unknown knowledge from
vast sets of data. Extracting useful knowledge from enormous data sets and providing
decision-making results for the diagnosis or treatment of diseases is very important. Data
mining can be used to extract knowledge by analyzing and predicting some diseases. Health care
data mining has great potential to discover hidden patterns in the data sets of the medical
domain. Various data mining methods are available, and their suitability depends on the
healthcare data. Data mining applications in health care can be highly effective: data mining
automates the process of finding predictive information in large databases. Disease prediction
plays an important role in data mining. Detecting heart disease requires performing a number of
tests on the patient; however, the use of data mining techniques can reduce the number of
tests, which matters for both performance and time. Health care data mining is an important
task because it allows doctors to see which attributes are more important for diagnosis, such
as age, weight, symptoms, etc. This helps doctors diagnose the disease more efficiently.

Knowledge discovery in databases (KDD) is the process of finding useful information and
patterns in data. Knowledge discovery in databases can be done using data mining, which uses
algorithms to extract the information and patterns derived by the KDD process. The stages of
the KDD process are highlighted in Fig. 1.

Fig. 1: KDD Process

The stages of the KDD process are described as follows. In the selection stage, data are
obtained from different sources. In the preprocessing stage, unwanted, missing, and noisy data
are removed to provide clean data. In the transformation stage, the clean data are converted
into a common format. Then data mining techniques are applied to get the desired output.
Finally, in the interpretation stage, the results are presented to the end user in a
meaningful manner.

II. Data Mining Techniques

The most frequently used data mining techniques are specified below:
a) Classification learning: The learning algorithm takes a set of classified examples (the
training set) and uses it for training. With the trained algorithm, the test data are then
classified based on the patterns and rules extracted from the training set.
b) Numeric prediction: This is a variant of classification learning with the exception that


instead of predicting a discrete class, the outcome is a numeric value.
c) Association rule mining: The associations and patterns between attributes are extracted,
and rules are created from them. The rules and patterns are used to predict the categories
or classification of the test data.
d) Clustering: Similar instances are grouped into clusters. The challenge or drawback of this
type of machine learning is that we first have to identify the clusters and then assign a new
instance to one of these clusters [8].

Out of these four types of learning methods, we need to identify the algorithm that performs
better. The application of data mining methods depends on the type of data to be used, and
solving a data mining problem depends on selecting the data mining technique that is most
suitable for that data.

III. Machine Learning

Machine learning (ML), employed as a method in data science, is the process of programming
computers to learn from past experience (Mitchell, 1997). Machine learning seeks to develop
algorithms that learn from data directly with little or no human intervention. Machine
learning algorithms perform a range of tasks such as prediction, classification, or decision
making. Machine learning stems from artificial intelligence research and has become an
essential aspect of data science. Machine learning begins with input in the form of a training
data set. In this phase, the machine learning algorithm uses the training dataset to learn
from the data and form patterns. The learning phase outputs a model that is used by the
testing phase. The testing phase employs another dataset, applies the model from the training
phase, and presents the results for analysis. The overall performance on the test dataset
demonstrates the model's ability to perform its task on data. Machine learning extends beyond
a statically coded set of statements to statements that are dynamically generated based on
the input data.

IV. Open Source Software

Open source has, in the minds of many, come to be synonymous with free software (Walters,
2007). Open source software is software whose development and source code are made publicly
available and which is designed to deny anyone the right to exploit the software exclusively
(Laurent, 2004). Open source generally refers to the source code of the application being
freely and openly available for modification. Two examples of open source licenses are the
GPL, or General Public License (GNU.org, 2015a), and GNU (GNU.org, 2015b). Anyone is able to
develop extensions and customizations of open source software; however, charging a fee for
such activities is typically prohibited by a public license agreement, under which any
modifications to the source code automatically become publicly available. Communities emerge
around software, with developers worldwide extending open source software.

V. Heart Diseases

Heart disease is the leading cause of death both in India and abroad, so it is vital to check
this death toll by correctly identifying the disease at an initial stage. The matter is a
concern for doctors both in India and abroad. Nowadays doctors are adopting many scientific
technologies and methodologies for identifying and diagnosing not only common diseases but
also many fatal diseases. Successful treatment is always attributed to a right and accurate
diagnosis. Doctors may sometimes fail to take accurate decisions while diagnosing a patient's
heart disease; heart disease prediction systems that use machine learning algorithms assist in
such cases to get accurate results [1].

VI. Heart Disease Dataset

The dataset used for this work is the Cleveland heart disease dataset from the UCI Machine
Learning Repository. The dataset has 303 instances and 76 attributes; however, only 14
attributes are used in this paper. These 14 attributes are the factors considered for heart
disease prediction [8]. Although the dataset has 303 instances, only 297 are complete; the
remaining rows contained missing values and were removed from the experiment.

VII. Overview of Data Mining Tools

Data mining has a wide range of applications, from the marketing and advertising of goods,
services, and products, through artificial intelligence research, biological sciences, and
crime investigation, to high-level government intelligence. Due to its widespread usage and
the complexity involved in building data mining applications, a vast number of data mining
tools have been developed over decades. Every tool has its advantages and disadvantages [6].
Within data mining, there is a group of tools that have been developed by a research
community and data analysis enthusiasts; they are provided free of charge under one of the
existing open-source licenses. An open-source development model means that the tool is the
result of a community effort, not necessarily supported by a single

organization but rather the result of contributions from an international and informal
development team. This development style affords a means of incorporating diverse experiences.
Data mining provides many mining techniques to extract data from databases. Data mining tools
predict future trends and behaviors, allowing businesses to make proactive, knowledge-driven
decisions. The development and application of data mining algorithms require the use of very
powerful software tools. As the number of available tools continues to grow, the choice of the
most suitable tool becomes increasingly difficult [6]. The open source tools used for data
mining in this work are briefed below. Data mining tools like Weka and Orange are used to
perform various data mining techniques. The first step of the methodology consists of
selecting a number of available open source data mining tools to be tested. Many open data
mining tools are available for free on the Web. After surfing the Internet, some tools were
chosen, including the Waikato Environment for Knowledge Analysis (WEKA) toolkit and Orange
Canvas.

VIII. Weka

The Waikato Environment for Knowledge Analysis (WEKA) [7] is an open source software and
machine learning toolkit introduced by Waikato University, New Zealand. WEKA supports several
standard data mining tasks such as data preprocessing, clustering, classification, regression,
visualization, and feature selection. New algorithms can also be implemented in WEKA alongside
the existing data mining and machine learning techniques. WEKA offers a number of sources for
loading data, which include files, URLs, and databases. Supported file formats include WEKA's
own ARFF format, CSV, LibSVM's format, and C4.5's format. WEKA also provides many evaluation
criteria such as the confusion matrix, precision, recall, true positives, and false negatives.
Some of the advantages of the WEKA tool are that it is open source, platform independent, and
portable, has a graphical user interface, and contains a very large collection of different
data mining algorithms.

IX. Orange

Orange is an open source machine learning and data mining software. Orange can be used for
explorative data analysis and visualization [3]. It gives a platform for experiment selection,
predictive modeling, and recommendation systems and can be used in genomic research,
biomedicine, bioinformatics, and teaching. Orange is preferred when the factor of innovation,
quality, or reliability is involved [10], [4].

X. The Comparative Study

The methodology of the study consists of collecting a set of free data mining and knowledge
discovery tools to be tested, specifying the data sets to be used, and selecting a set of
classification algorithms to test the tools' performance. Fig. 2 demonstrates the overall
methodology followed for fulfilling the goal of this research.

Fig. 2: Tools Implementation Methodology

a) Precision
Precision is also known as positive predictive value. It is defined as the average probability
of relevant retrieval.
Precision = Number of true positives / (Number of true positives + Number of false positives)

b) Recall
Recall is defined as the average probability of complete retrieval.
Recall = Number of true positives / (Number of true positives + Number of false negatives)

c) Naïve Bayes
The Naïve Bayes classifier is particularly suited to inputs of high dimensionality. The
problem with the Naïve Bayes classifier is that it assumes all attributes are independent of
each other, which in general does not hold. Naive Bayes is harder to debug and understand [2].
Naive Bayes is used in robotics and computer vision. In naive Bayes settings, decision trees
perform poorly. In the comparative analysis of precision and recall on the heart disease data
set, Orange gives a precision of 82.4% and a recall of 80.6%, while WEKA gives a precision of
83.7% and a recall of 83.7%. Comparing the Orange tool and WEKA, WEKA gives the better
precision and recall.

d) Support Vector Machine
Support Vector Machines have proved to be very good at a variety of pattern classification
tasks and have accordingly received a great deal of attention recently. The support vector
machine is a supervised machine learning technique. The SVM algorithm predicts the occurrence
of heart disease by plotting the disease-predicting attributes against a multidimensional
hyperplane and classifies the classes optimally by creating a margin between two


data clusters [5]. This algorithm attains high accuracy through the use of nonlinear features
called kernels. In the comparative analysis of precision and recall on the heart disease data
set, Orange gives a precision of 81.7% and a recall of 70.5%, while WEKA gives a precision of
81.8% and a recall of 81.9%. Comparing the Orange tool and WEKA, WEKA gives the better
precision and recall.

e) Random Forest
Random Forest is essentially an ensemble of unpruned classification trees. It gives excellent
performance on a number of practical problems, largely because it is not sensitive to noise in
the dataset and is resistant to overfitting. It works fast and generally exhibits a
substantial performance improvement over many other tree-based algorithms. Random forests are
built by combining the predictions of a number of trees, each of which is trained in
isolation. There are three main choices to be made when constructing a random tree. In the
comparative analysis of precision and recall on the heart disease data set, Orange gives a
precision of 77.9% and a recall of 73.4%, while WEKA gives a precision of 81.8% and a recall
of 81.9%. Comparing the Orange tool and WEKA, WEKA gives the better precision and recall.

f) KNN Classifier
K-nearest neighbor is a sophisticated approach to classification that finds a group of K
objects in the training documents that are closest to the test value. To classify an unlabeled
object, the distance between this object and each labeled object is computed and its K nearest
neighbors are identified. Classification accuracy largely depends on the chosen value of K and
will be better than that of the nearest neighbor classifier [9]. For large data sets, K can be
larger to reduce the error. K can be chosen experimentally: a number of patterns taken out of
the training set are classified using the remaining training patterns for different values of
K, and the value of K that gives the least classification error is chosen. If several of the K
nearest neighbors share the same class, the per-neighbor weights of that class are added
together, and the resulting weighted sum is used as the likelihood score of that class with
respect to the test document [8]. In the comparative analysis of precision and recall for KNN
on the heart disease data set, Orange gives a precision of 58% and a recall of 54.7%, while
WEKA gives a precision of 75.3% and a recall of 75.2%. Comparing the Orange tool and WEKA,
WEKA gives the better precision and recall.

Table 1: Classification Algorithm Comparison of Precision and Recall in Orange and Weka Tools, Heart Disease

Classification algorithm (average)   Precision in Orange   Recall in Orange   Precision in WEKA   Recall in WEKA
Naïve Bayes classifier               0.824                 0.806              0.837               0.837
SMO or Support Vector Machine        0.817                 0.705              0.84                0.8365
Random Forest                        0.779                 0.734              0.818               0.819
IBk or K-Nearest Neighbor            0.58                  0.547              0.753               0.752

Fig. 3: Classification Algorithm Graph for Precision and Recall in Weka Tool, Heart Disease

XI. Conclusion and Future Scope

Data mining techniques help in finding the hidden knowledge in a group of disease data that
can be used to analyze and predict the future behavior of diseases. Classification is one of
the data mining methods; it assigns a class label to a set of unclassified cases. In the
comparative analysis of precision and recall, WEKA shows the best overall performance
compared to Orange. The main objective of this paper is to compare the data mining tools on
the basis of their classification precision and recall. According to the results of the two
data mining tools used in this paper, it has been observed that different data mining tools
give different results on the same data set with different classification algorithms. WEKA
and ORANGE are


showing the best classification precision and recall. In future, more disease datasets can be
used with the classification methods, and other data mining techniques such as clustering can
be used to compare the performance of various data mining tools.
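
The precision and recall definitions from Section X a) and b), and the tool comparison reported in Table 1, can be sketched in a few lines of Python. This is a minimal illustration rather than the procedure used by WEKA or Orange: the function names, the dictionary layout, and the confusion counts in the demo are assumptions made for this example, while the scores in `TABLE_1` are copied from Table 1.

```python
# Minimal sketch of the precision and recall definitions in Section X.
# Function names and the demo counts are assumptions made for illustration.


def precision(true_pos: int, false_pos: int) -> float:
    """Precision = TP / (TP + FP)."""
    return true_pos / (true_pos + false_pos)


def recall(true_pos: int, false_neg: int) -> float:
    """Recall = TP / (TP + FN)."""
    return true_pos / (true_pos + false_neg)


# (precision, recall) per tool and algorithm, copied from Table 1.
TABLE_1 = {
    "Naive Bayes": {"orange": (0.824, 0.806), "weka": (0.837, 0.837)},
    "SVM (SMO)": {"orange": (0.817, 0.705), "weka": (0.84, 0.8365)},
    "Random Forest": {"orange": (0.779, 0.734), "weka": (0.818, 0.819)},
    "KNN (IBk)": {"orange": (0.58, 0.547), "weka": (0.753, 0.752)},
}


def better_tool(scores: dict) -> str:
    """Return the tool that is higher on both measures, or 'mixed' otherwise."""
    orange_p, orange_r = scores["orange"]
    weka_p, weka_r = scores["weka"]
    if weka_p > orange_p and weka_r > orange_r:
        return "weka"
    if orange_p > weka_p and orange_r > weka_r:
        return "orange"
    return "mixed"


if __name__ == "__main__":
    # Hypothetical confusion counts, not taken from the paper.
    print("precision:", precision(true_pos=80, false_pos=20))
    print("recall:", recall(true_pos=80, false_neg=10))
    for algorithm, scores in TABLE_1.items():
        print(algorithm, "->", better_tool(scores))
```

Comparing both measures at once keeps the sketch close to the per-algorithm statements in Section X, which report precision and recall together for each tool.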

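The K-nearest-neighbor procedure described in Section X f) can be sketched as follows. This is only an illustrative sketch under stated assumptions: the paper does not say how the per-neighbor weights are computed, so inverse distance is assumed here, and the function names and the tiny two-feature data set are hypothetical and not drawn from the Cleveland data.

```python
import math
from collections import defaultdict

# Hypothetical two-feature training points; not taken from the Cleveland data.
TRAIN = [
    ((1.0, 1.0), "absent"),
    ((1.2, 0.8), "absent"),
    ((0.9, 1.1), "absent"),
    ((3.0, 3.2), "present"),
    ((3.1, 2.9), "present"),
    ((2.8, 3.0), "present"),
]


def knn_classify(train, query, k=3):
    """Label `query` by the summed weights of its k nearest labeled neighbors."""
    # Distance from the unlabeled object to every labeled object.
    distances = [(math.dist(features, query), label) for features, label in train]
    distances.sort(key=lambda pair: pair[0])
    scores = defaultdict(float)
    for distance, label in distances[:k]:
        # Assumed weighting scheme: inverse distance (small constant avoids 1/0).
        scores[label] += 1.0 / (distance + 1e-9)
    # The class with the largest weighted sum is the predicted class.
    return max(scores, key=scores.get)


def leave_one_out_error(train, k):
    """Fraction of held-out training points misclassified by the remaining ones."""
    errors = 0
    for i, (features, label) in enumerate(train):
        rest = train[:i] + train[i + 1:]
        if knn_classify(rest, features, k) != label:
            errors += 1
    return errors / len(train)


if __name__ == "__main__":
    print(knn_classify(TRAIN, (2.9, 3.1), k=3))
    # Choose K experimentally: keep the value with the least held-out error.
    best_k = min((1, 3, 5), key=lambda k: leave_one_out_error(TRAIN, k))
    print("chosen K:", best_k)
```

Holding out one pattern at a time is one simple way to carry out the experimental choice of K that the text describes; other hold-out schemes would fit the description equally well.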
References

1. Prerana T H M, Shivaprakash N C, Swetha N, "Prediction of Heart Disease Using Machine
Learning Algorithms - Naïve Bayes, Introduction to PAC Algorithm, Comparison of Algorithms and
HDPS", International Journal of Science and Engineering, Volume 3, Number 2, 2015, pp. 90-99.
Available at www.ijse.org. ISSN: 2347-2200.
2. Majali J, Niranjan R, Phatak V, Tadakhe O. Data mining techniques for diagnosis and
prognosis of cancer. International Journal of Advanced Research in Computer and Communication
Engineering. 2015; 4(3):613-6.
3. http://www.kdnuggets.com/2015/12/top-7-new-features-orange-3.html/2
4. Orange Data Mining, 'Orange Data Mining Library Documentation Release 3'.
5. Iyer A, Jeyalatha S, Sumblay R. Diagnosis of diabetes using classification mining
techniques. IJDKP. 2015; 5(1):1-14.
6. Gosain, A.; Kumar, A., "Analysis of health care data using different data mining
techniques," Intelligent Agent & Multi-Agent Systems, 2009. IAMA 2009. International
Conference on, pp. 1-6, 22-24 July 2009.
7. R. Kirkby, WEKA Explorer User Guide for version 3-3-4, University of Waikato, 2002.
8. Ralf Mikut and Markus Reischl, Wiley Interdisciplinary Reviews: Data Mining and Knowledge
Discovery, Volume 1, Issue 5, pages 431-443, September/October 2011.
9. S. Tan, "Neighbor-weighted K-nearest neighbor for unbalanced text corpus", Expert Systems
with Applications, Vol. 28, No. 4, pp. 667-671, 2005.
10. http://orange.biolab.si/
