Heart Disease Python Report 1st Phase

This document discusses machine learning techniques for heart disease diagnosis and classification. It begins by introducing heart disease as a major health issue and the need for effective diagnostic tools. It then reviews several machine learning classifiers that have been used for disease diagnosis, including decision trees, SVM, KNN and random forest algorithms. The document evaluates these classifiers on a heart disease dataset to determine the best performing algorithm for diagnosis. The goal is to apply data mining to extract meaningful patterns from medical data in order to accurately predict and diagnose heart disease.

Uploaded by

Aishwarya P

ABSTRACT

Heart disease is a major life-threatening disease that can cause either death or serious long-term
disability. However, there is a lack of effective tools to discover hidden relationships and trends in
e-health data. Medical diagnosis is a complicated task that plays a vital role in saving human
lives, so it needs to be executed accurately and efficiently. An appropriate and accurate
computer-based automated decision support system is required to reduce the cost of
clinical tests. This paper provides an insight into machine learning techniques used in
diagnosing various diseases. Various data mining classifiers that have emerged in recent years
for efficient and effective disease diagnosis are discussed.

Using data mining techniques can reduce the number of tests that are required.
To reduce deaths from heart disease, there has to be a quick and efficient detection
technique. Decision Tree is one of the effective data mining methods used. This research
compares Decision Tree classification with other algorithms, seeking better performance in
heart disease diagnosis. The other algorithms tested are the SVM algorithm, the K Nearest
Neighbour algorithm and the Random Forest algorithm.

The dataset consists of 303 instances and 76 attributes. Subsequently, the classification
algorithm with the best potential will be suggested for use on sizeable data. The goal of this
study is to extract hidden patterns that are relevant to heart disease by applying data mining
techniques, and to predict the presence of heart disease in patients, where this presence is
valued from no presence to likely presence.

CHAPTER 1

INTRODUCTION
The proposed project identifies the risk factors for the different types of heart diseases. Most
hospitals today employ some sort of hospital information system to manage their healthcare or
patient data. These systems typically generate huge amounts of data which take the form of
numbers, text, charts and images. Unfortunately, these data are rarely used to support clinical
decision making. There is a wealth of hidden information in these data that is largely untapped. This
raises an important question: “How can we turn data into useful information that can enable
healthcare practitioners to make intelligent clinical decisions?” This is the main motivation for this
research.

MOTIVATION
Machine learning techniques have been around for some time and have been compared and used for
analysis in many kinds of data science applications. The major motivation behind this
research-based project was to explore the feature selection methods, data preparation and
processing behind the training of machine learning models. With off-the-shelf models and libraries
readily available, the challenge we face today is the data: despite its abundance and our
ready-made models, the accuracy we see during training, testing and actual validation shows high
variance. Hence this project is carried out with the motivation to explore what lies behind the
models.

Furthermore, as machine learning as a whole is motivated by developing an appropriate
computer-based decision support system that can aid the early detection of heart disease, in this
project we have developed a model which classifies whether a patient will have heart disease or not
based on various features (i.e. potential risk factors that can cause heart disease). Hence, the
early prognosis of cardiovascular diseases can aid in making decisions on lifestyle changes in
high-risk patients and in turn reduce complications, which can be a great milestone in the field
of medicine.

EXISTING SYSTEM
• Different existing data mining procedures and their applications were considered and explored.
Machine learning algorithms were applied to various medical data sets.

• Machine learning strategies have differing power on different medical data sets.

• The conventional machine learning techniques mentioned previously gave less accurate outcomes,
and results also vary depending on the procedures used for the prediction.

DRAWBACKS:

• Different prediction results on different datasets and techniques.

• Less accurate and less effective prediction results.

PROPOSED SYSTEM:
Our proposed strategy focuses on novel machine learning procedures for heart disease
classification and prediction, thus overcoming the existing problem. We will build our model using
Random Forest or SVM algorithms in order to increase performance and accuracy. To
enhance the prediction of the classifiers, genetic search is incorporated. The genetic search
resulted in 13 attributes which contribute more towards the diagnosis of cardiac disease.
Classifiers such as Naïve Bayes were used for the diagnosis of patients with heart disease. The
classifiers were fed with the reduced data set of 13 attributes. Results are shown in. Observations
exhibit that the Naïve Bayes machine learning technique outperforms the other two machine learning
techniques after incorporating feature subset selection, but with high model construction time.
Naïve Bayes performs consistently before and after reduction of attributes with the same model
construction time.

Advantages of Proposed System:
• Naïve Bayes performs consistently before and after reduction of attributes, with the same
model construction time.
• It is more accurate compared to other classification processes.

OBJECTIVES:
The objective of our work is to predict the diagnosis of heart disease with a reduced number of
attributes. Fourteen attributes are involved in predicting heart disease. These fourteen attributes
are reduced to six, and classifiers such as Naive Bayes are then used to predict the
diagnosis of heart disease from the reduced set of attributes.

PROBLEM STATEMENT:
Heart disease prediction using machine learning is one of the most interesting and challenging
tasks. The shortage of specialists and the high number of wrongly diagnosed cases have
necessitated the development of a fast and efficient detection system.
The main objective of this work is to identify the key patterns or features from the medical data
using the classifier model. The attributes that are more relevant to heart disease diagnosis can be
observed. This will help the medical practitioners to understand the root causes of disease in depth.

CHAPTER 2

LITERATURE SURVEY

Archana Singh and Rakesh Kumar et al. [2] proposed an approach mainly focused on the
accuracy of algorithms. The authors considered accuracy as one of the parameters for analysing the
performance of algorithms. The accuracy of machine learning algorithms depends upon the dataset
used for training and testing. When they analysed the algorithms on a dataset whose attributes
included Age, Chest pain type, cholesterol, Resting etc., and on the basis of the confusion
matrix, they found KNN was the best one. The methodologies used by them are: Linear regression,
Decision tree, Support Vector Machine and K-nearest Neighbour. Since the approach uses many
attributes, it requires a large amount of data, which is a drawback.
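
The KNN idea referred to above can be illustrated with a minimal sketch in plain Python 3. It is
not the cited authors' implementation: the feature rows, the labels and the names `euclidean` and
`knn_predict` are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Straight-line distance between two equal-length feature rows."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train_X, train_y, query, k=3):
    """Return the majority label among the k training rows closest to query."""
    distances = sorted(
        ((euclidean(row, query), label) for row, label in zip(train_X, train_y)),
        key=lambda pair: pair[0],
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical toy rows (age, cholesterol), assumed already scaled to [0, 1];
# label 1 = heart disease present, 0 = absent.
train_X = [(0.2, 0.1), (0.3, 0.2), (0.8, 0.9), (0.9, 0.7), (0.7, 0.8)]
train_y = [0, 0, 1, 1, 1]

prediction = knn_predict(train_X, train_y, (0.75, 0.85), k=3)
print(prediction)
```

Because KNN relies on distances, features with large numeric ranges would dominate unless the
columns are scaled first, which is why the toy rows are assumed to be pre-scaled.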

Rahul Katarya and P. Srinivas et al. [4] proposed an approach which mainly focused on feature
selection and prediction. These two are essential for every automated system. By choosing features
efficiently, we can achieve better results in predicting heart disease. Authors summarized some
algorithms which are useful while selecting the features, like hybrid grid search algorithm and
random search algorithm, etc. They chose some common attributes such as Gender, Age,
Resting blood pressure, ECG results, Heart rate, etc., which are used to predict heart disease.
Methodologies used by them are: Artificial neural network (ANN), Support vector machine (SVM),
Decision tree (DT), Random forest (RF).

Jian Ping Li, et al. [5] proposed an efficient machine learning based system for the diagnosis of
heart disease. The novelty of the study is the development of a diagnosis system for the
identification of heart disease. Jian Ping Li, et al. achieved a small improvement in
prediction accuracy, which has great influence in the diagnosis of critical diseases. Here, four
standard feature selection algorithms along with one proposed feature selection algorithm are used
for feature selection. The LOSO CV method and performance measuring metrics are used. The Cleveland
heart disease dataset is used for testing purposes. Pre-processing techniques such as removing
missing attribute values, Standard Scaler (SS) and Min-Max Scaler have been applied to the dataset.
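
The two scaling steps named above can be sketched as follows, assuming Python 3. This is a
plain-Python illustration of the usual definitions (min-max rescaling to [0, 1], and
standardisation to zero mean and unit variance), not the cited authors' code; the function names
and the resting blood pressure values are hypothetical.

```python
import math

def min_max_scale(values):
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: nothing to rescale
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def standard_scale(values):
    """Shift to zero mean and divide by the population standard deviation."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]

# Hypothetical resting blood pressure readings
resting_bp = [120, 130, 140, 150, 160]
bp_minmax = min_max_scale(resting_bp)
bp_standard = standard_scale(resting_bp)
print(bp_minmax)
```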

B. Keerthi Samhitha, et al. [7] used the Cleveland heart dataset from the UCI machine
learning repository for training and testing purposes, and proposed a novel strategy
that targets finding significant features by applying machine learning techniques, resulting in
improved accuracy in the prediction of cardiovascular disease. The prediction model is presented
with various combinations of features and several known classification techniques. They produced an
enhanced performance level, with an accuracy of 88.7%, through the prediction model for heart
disease using the hybrid random forest with a linear model.

Sameer S Yadav et al. [9] proposed “Application of Machine Learning for the Detection of Heart
Disease”, in which they researched and performed a comprehensive analysis of different
machine learning algorithms and developed a more accurate algorithm. The task is to determine
which studies will be positive and which will be inaccurate in the cardiovascular identification
process. For regional developmental methods and good statistical precision, the Cleveland dataset is
used. To find the optimum set of parameters for the analysis, they used both a train-test
split and cross-validation. Algorithms such as Logistic Regression, K-means
Clustering, Naive Bayes, K-Nearest Neighbours and Neural Network are used.

Santhana Krishnan J. and Geetha S. [11]: In this paper the authors focus on male
patients and take into account multiple factors which can be the cause of heart disease, such as
risk type. The authors used WEKA, one of the widely used data mining tools, and the KNN
algorithm for the prediction part. While the authors used two algorithms for the analysis,
there are other algorithms and prediction techniques which can be used, and the factors considered
during the prediction are quite few; other factors affecting heart disease can be identified and
worked on.

Senthilkumar Mohan, et al. [12] proposed “Effective Heart Disease Prediction Using Hybrid
Machine Learning Techniques”, in which the objective is to find critical features by applying
machine learning, resulting in improved accuracy in the prediction of cardiovascular disease. The
prediction model is created with various combinations of features and several known classification
techniques. Diverse data mining approaches and prediction techniques such as K-Nearest
Neighbours, LR, SVM, Neural Network and Vote have been used to predict heart disease.
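
One common reading of the "vote" technique listed above is majority voting over the labels that
several classifiers predict for the same patients. The sketch below assumes that reading and
Python 3; the three prediction lists are made-up stand-ins for real model outputs, and
`majority_vote` is a hypothetical helper name.

```python
from collections import Counter

def majority_vote(predictions_per_model):
    """Combine per-model label lists into one label per patient by majority."""
    combined = []
    # zip(*...) groups the labels that every model assigned to the same patient
    for labels in zip(*predictions_per_model):
        combined.append(Counter(labels).most_common(1)[0][0])
    return combined

# Hypothetical predictions from three models for four patients (1 = disease)
knn_preds = [1, 0, 1, 0]
svm_preds = [1, 1, 0, 0]
lr_preds = [0, 1, 1, 0]

ensemble = majority_vote([knn_preds, svm_preds, lr_preds])
print(ensemble)
```

Using an odd number of models on a binary problem avoids ties, which this sketch does not
otherwise resolve in any principled way.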

Anjan Nihkil Repaka, Sai Deepak Ravikanti [15]: Here, three phases are applied to predict
heart disease. In the first phase, pre-processing is applied, in which the data are filtered. In
the second phase, different classification techniques are applied to the output of phase one.
Classification accuracy, precision, recall and F-measure are used to evaluate the efficiency of the
techniques. The most efficient algorithms are then chosen from those applied and, through
hybridization, the results of the chosen algorithms are combined. Phase three is diagnosis: if the
patient's history is available, it is compared with the result and heart disease is then predicted.
The main goal of this paper is to investigate available data mining techniques for predicting heart
disease, compare them, and then combine the results from all of them to get the most accurate
result.
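
The evaluation metrics mentioned above can be computed from the four confusion-matrix counts. The
sketch below assumes Python 3 and a binary problem, and it reads the "measure" in the text as the
F-measure (harmonic mean of precision and recall), which is an assumption. The label lists are
invented for illustration.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1
            else:
                fp += 1
        else:
            if actual == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn

def evaluate(y_true, y_pred):
    """Accuracy, precision, recall and F-measure derived from the counts."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }

# Hypothetical true labels and predictions for ten patients (1 = disease)
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1, 1, 0]
scores = evaluate(y_true, y_pred)
print(scores)
```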

Chandra Shekar K., Chandra P, et al. [17]: This paper aims to manage the risk of heart disease,
which affects millions of people around the world. It analyses a heart disease dataset using
important types of data mining techniques in order to create a 100% accurate model based on
data mining algorithms. The results obtained can be a key to gaining insights from the dataset,
forecasting the heart disease status of new patients and finding good techniques for improving the
accuracy, efficiency and quality of the care processes for heart disease. The paper presents
Naïve Bayes with SVM, implemented for the prediction of heart disease. It provides a
systematic scheme for heart disease, and the relevant healthcare data is created using the
UCI Repository dataset. Supervised learning is used for prediction.

Monther Tarawneh, Ossama Embarak [18]: This paper proposes a DSS (Decision Support System) using
the Naïve Bayes algorithm. In the proposed system, they first collect data such as age, sex,
smoking details, blood sugar, type of chest pain etc., which are given by users. Then they use a
Naïve Bayes classifier for supervised learning. It treats each input variable as independent. It
reduces the time complexity and gives better accuracy compared to other techniques. The AES
(Advanced Encryption Standard) algorithm is used to secure the patient’s data. It is revealed that,
in regard to accuracy, the prevailing technique surpasses Naive Bayes by yielding an accuracy of
89.77% in spite of reducing the attributes. Here a minimum number of attributes is taken, with
high accuracy for prediction.
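
The independence assumption behind Naïve Bayes can be made concrete with a small sketch, assuming
Python 3. This is a Gaussian variant for numeric features, chosen here only for brevity; the
cited system may well use a different variant, and the (age, cholesterol) rows, labels and
function names are hypothetical.

```python
import math

def fit_gaussian_nb(X, y):
    """Estimate a prior and per-feature (mean, variance) for each class."""
    model = {}
    for label in set(y):
        rows = [row for row, row_label in zip(X, y) if row_label == label]
        stats = []
        for column in zip(*rows):
            mean = sum(column) / len(column)
            variance = sum((v - mean) ** 2 for v in column) / len(column)
            stats.append((mean, variance + 1e-9))  # tiny floor avoids division by zero
        model[label] = (len(rows) / len(X), stats)
    return model

def predict_gaussian_nb(model, row):
    """Pick the class with the highest log posterior for one feature row."""
    def log_posterior(label):
        prior, stats = model[label]
        score = math.log(prior)
        # Features are treated as independent, so their log-likelihoods simply add up
        for value, (mean, variance) in zip(row, stats):
            score -= 0.5 * math.log(2 * math.pi * variance)
            score -= (value - mean) ** 2 / (2 * variance)
        return score
    return max(model, key=log_posterior)

# Hypothetical (age, cholesterol) rows; label 1 = heart disease present
X = [(45, 180), (50, 190), (48, 200), (62, 280), (65, 300), (70, 290)]
y = [0, 0, 0, 1, 1, 1]
model = fit_gaussian_nb(X, y)
print(predict_gaussian_nb(model, (47, 185)))
```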

Aditi Gavhane, Gowthami Kokkula, et al. [19]: In the proposed system of this paper, they used the
neural network algorithm multi-layer perceptron (MLP) to train and test the dataset. In this
algorithm there are multiple layers: one for input, a second for output, and one or more hidden
layers between the input and output layers.
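
The layer structure described above (input, hidden, output) can be sketched as a forward pass,
assuming Python 3. The weights below are arbitrary, untrained, hypothetical numbers, and the
sigmoid activation is an assumption; the sketch shows only how values flow through the layers,
not how the cited system was trained.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dense_layer(inputs, weights, biases):
    """One fully connected layer: sigmoid(weights . inputs + bias) per neuron."""
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + bias)
        for neuron_weights, bias in zip(weights, biases)
    ]

def mlp_forward(features, layers):
    """Feed the features through each (weights, biases) layer in turn."""
    activation = features
    for weights, biases in layers:
        activation = dense_layer(activation, weights, biases)
    return activation

# Hypothetical untrained network: 3 inputs -> 2 hidden neurons -> 1 output
layers = [
    ([[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]], [0.0, 0.1]),  # hidden layer
    ([[1.0, -1.0]], [0.0]),                              # output layer
]
output = mlp_forward([0.6, 0.4, 0.9], layers)
print(output)
```

The single output value lies between 0 and 1 and could be thresholded to give a binary
prediction once the weights have been learned.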

Kanika Pahwa and Ravinder Kumar [23]: In this article, the process is divided into 3 stages. In the
first stage, pre-processing is applied to the raw data. The raw data are age, sex, cp (related to
chest pain), cholesterol, etc. Data transformation is done in this stage to make the problem a
binary class problem. The pre-processed data is then used as the input of the second stage, where
feature selection is applied to obtain the relevant features. In feature selection, irrelevant data
are removed. Gain ratio is used to score the given attributes. In the last stage, a classification
algorithm is applied. Here they applied the Naïve Bayes algorithm and the Random Forest algorithm
to predict heart disease. They used a publicly accessible database. The confusion matrix, ROC curve
and area under the curve are evaluated, and the accuracy of the two algorithms is compared.
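
Gain ratio, used above to score attributes, is the information gain of an attribute divided by the
entropy of the attribute's own value distribution (its split information). A minimal sketch for
categorical attributes follows, assuming Python 3; the chest pain values and target labels are
invented, and the function names are hypothetical.

```python
import math
from collections import Counter

def entropy(items):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(items)
    return -sum((n / total) * math.log2(n / total) for n in Counter(items).values())

def gain_ratio(attribute_values, labels):
    """Information gain of the attribute divided by its split information."""
    total = len(labels)
    groups = {}
    for value, label in zip(attribute_values, labels):
        groups.setdefault(value, []).append(label)
    # Expected label entropy after splitting on the attribute
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    split_info = entropy(attribute_values)
    return info_gain / split_info if split_info > 0 else 0.0

# Hypothetical data: chest pain type separates the classes perfectly
chest_pain = ["typical", "typical", "atypical", "atypical"]
target = [1, 1, 0, 0]
print(gain_ratio(chest_pain, target))
```

Attributes can then be ranked by this score and the lowest-scoring ones dropped.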

Ritika Chadha, Shubhankar Mayank, et al. [28]: In this article, KNN, SVM and ANN are used for
the prediction of heart disease. The results of these three algorithms are compared, and an
ensemble classifier is also used. They used multiclass and binary classification. Two types of
evaluation are used: percentage split and cross validation. The result of binary classification is
higher than that of multiclass classification. The result of percentage split is higher than that
of cross validation. In this method, data are taken from the dataset, feature selection is applied,
and the selected data are used as the input of the model. The data is split into two parts, a
training and a testing dataset; finally cross validation is applied.
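
The two evaluation schemes mentioned above can be sketched as follows, assuming Python 3. The
70/30 split fraction, the seed, the value k = 5 and the function names are illustrative
assumptions rather than settings taken from the cited article.

```python
import random

def percentage_split(rows, train_fraction=0.7, seed=0):
    """Shuffle a copy of the rows and cut it into training and testing parts."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def k_fold_indices(n_rows, k=5):
    """Yield (train_idx, test_idx) pairs; every row is tested exactly once."""
    indices = list(range(n_rows))
    # Spread any remainder over the first folds so sizes differ by at most one
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size

rows = list(range(10))  # stand-in for ten patient records
train_rows, test_rows = percentage_split(rows)
folds = list(k_fold_indices(len(rows), k=5))
print(len(train_rows), len(test_rows), len(folds))
```

A percentage split gives a single accuracy estimate, while cross validation averages the score
over k different test folds and is usually less sensitive to one lucky or unlucky split.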

CHAPTER 3

System Requirement Specification

A software requirements specification (SRS) is a detailed description of a software system
to be developed, with its functional and non-functional requirements. The SRS is developed
based on the agreement between customer and contractors. It may include use cases of how the
user is going to interact with the software system. The software requirements specification
document consists of all the requirements necessary for project development. To
develop the software system we should have a clear understanding of the software system. To
achieve this we need continuous communication with customers to gather all requirements.

A good SRS defines how the software system will interact with all internal modules, hardware,
other programs and human users across a wide range of real-life
scenarios. Using the software requirements specification (SRS) document, QA leads and
managers create the test plan. It is very important that testers are clear about every detail
specified in this document in order to avoid faults in test cases and their expected results.

It is highly recommended to review or test SRS documents before starting to write test cases and
making any plan for testing. Let’s see how to test an SRS and the important points to keep in mind
while testing it.

1. Correctness of SRS should be checked. Since the whole testing phase is dependent on the
SRS, it is very important to check its correctness. There are some standards with which we can
compare and verify.

2. Ambiguity should be avoided. Sometimes in an SRS, some words have more than one
meaning and this might confuse testers, making it difficult to get the exact reference. It is
advisable to check for such ambiguous words and make the meaning clear for better
understanding.

3. Requirements should be complete. When a tester writes test cases, what exactly is required
from the application is the first thing which needs to be clear. For example, if the application
needs to send specific data of some specific size, then the SRS should clearly mention
how much data is to be sent and what the size limit is.

4. Consistent requirements. The SRS should be consistent within itself and consistent with its
reference documents. If you call an input “Start and Stop” in one place, don’t call it “Start/Stop”
in another. This sets the standard and should be followed throughout the testing phase.

5. Verification of expected result: the SRS should not have statements like “Work as expected”;
it should clearly state what is expected, since different testers think differently
and may draw different conclusions from such a statement.

6. Testing environment: some applications need specific conditions to test and also a
particular environment for accurate results. The SRS should have clear documentation on what type
of environment needs to be set up.

7. Pre-conditions defined clearly: one of the most important parts of test cases is
pre-conditions. If they are not met properly then the actual result will always differ from the
expected result. Verify that all the pre-conditions are mentioned clearly in the SRS.

8. Requirements ID: these are the base of the test case template. Test case IDs are written
based on requirement IDs. Requirement IDs also make it easy to categorize modules, so just by
looking at them the tester will know which module to refer to. The SRS must have them, such that
each ID defines a particular module.

9. Security and performance criteria: security is a priority when software is tested, especially
when it is built in such a way that it contains crucial information which, when leaked, can cause
harm to the business. The tester should check that all the security-related requirements are
properly defined and are clear to him. Also, the performance of software plays a very
important role in business, so all the requirements related to performance must be clear to the
tester, and he must also know when and how much stress or load testing should be done to test
the performance.

10. Assumptions should be avoided: sometimes when a requirement is not clear to the tester, he
tends to make assumptions related to it, which is not the right way to do testing, as
assumptions could go wrong and hence test results may vary. It is better to avoid
assumptions and ask clients about all the “missing requirements” to have a better understanding of
expected results.

11. Deletion of irrelevant requirements: more than one team works on the SRS, so
it is possible that some irrelevant requirements are included. Based on the
understanding of the software, the tester can find out which requirements these are and remove
them to avoid confusion and reduce workload.

12. Freeze requirements: when an ambiguous or incomplete requirement is sent to the client to
analyse and the tester gets a reply, that requirement result will be updated in the next SRS
version and the client will freeze that requirement. Freezing here means that the result will not
change again unless some major addition or modification is introduced in
the software.

3.2 Functional Requirements

• GUI User-facing apps
We are planning to develop user interface apps for both smartphones and desktops
so our clients can use these apps to gain access to our network.

• Pre-Processing unit
This unit will pre-process the data obtained from the online repository. The processing
includes data cleaning (removing data which is not labeled), stemming, lemmatization
and various other functions.

• Machine learning Models
We will be using one model, which will be trained using the data sets obtained. Various
machine learning techniques are used to train this model.

3.3 Non-Functional Requirements

• Security
Information stored and shared on our platform is highly secure, since the information is
divided into chunks, encrypted and stored on various systems. Hence attacks on the system
are difficult.
• Scalability
As the number of nodes in our network increases, the scalability of our platform in terms of
space and accessibility increases exponentially.
• Performance
As our network is peer to peer and not based on a single data store, the single point of
failure is removed and so the performance is increased.
• User friendly
The user-facing apps that are used by the clients to access our network are designed in such a
way that they are user friendly and very easy to use.
• Cost
The cost of constructing data centres, which usually cost billions, and the maintenance cost of
these data centres is nullified.
• Availability
Since our system is not centralized and there is no single point of failure in our system, the
availability of our system is increased.

HARDWARE REQUIREMENTS

• Processor : > i3
• RAM : 4 GB
• Hard disk : 500 GB
• Input device : Standard keyboard and mouse
• Compact disk : 650 MB
• Output device : High-resolution monitor

SOFTWARE REQUIREMENTS

• Operating system : macOS, Windows XP/7 or higher version
• Coding language : Python (>= Python 3.3 or Python 2.7)
• IDE : Jupyter Notebook

3.4 TOOLS AND TECHNOLOGY DETAILS

JUPYTER NOTEBOOK

The Jupyter Notebook is a free, open-source application. Notebooks can include equations
and live code. Jupyter Notebook is a spin-off from the IPython project, which used to have
an IPython Notebook project itself. The name comes from the core languages it supports: Julia,
Python and R. Various computational information can be shared using this platform, including
statistics, code or data. Using this tool can be highly
beneficial for faculty and students as it is a great platform for interaction.

It also supports various coding languages. A notebook consists of 2 main components:
1. The input code in the front-end
2. The kernel at the back-end

The front-end mainly comprises the programming code, which is given as input in the
rectangular cells of the web page. The back-end includes the kernel, where the code is
executed and the output/results are obtained. There are more than 100 Jupyter kernels,
which support different programming languages.

PYTHON

It is an object-oriented programming language. Processing happens at runtime
and is performed by the interpreter. Python is simple to learn and easy to use, which
makes it developer friendly. It is easy to read and understand as the
syntax is conventional. The code can be executed line by line using the interpreter. Python
supports multiple platforms such as Linux, UNIX, Windows, Macintosh and so on. The
paradigms of object-oriented programming are supported by Python. Features such as
polymorphism, operator overloading and multiple inheritance are supported by Python.
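
The three object-oriented features named above can be seen together in a short sketch, assuming
Python 3. All class names, the age cutoff and the counts are hypothetical and exist only to show
the language features; they are not part of the project's code.

```python
class Classifier:
    """Common interface: every classifier answers predict(row)."""
    def predict(self, row):
        raise NotImplementedError

class AlwaysHealthy(Classifier):
    def predict(self, row):
        return 0

class AgeThreshold(Classifier):
    def __init__(self, cutoff):
        self.cutoff = cutoff

    def predict(self, row):
        return 1 if row["age"] >= self.cutoff else 0

class Describable:
    def describe(self):
        return "model " + type(self).__name__

class DescribableAgeThreshold(Describable, AgeThreshold):
    """Multiple inheritance: gains describe() and predict() from two parents."""

class Tally:
    """Correct/total counter; '+' merges two tallies (operator overloading)."""
    def __init__(self, correct, total):
        self.correct, self.total = correct, total

    def __add__(self, other):
        return Tally(self.correct + other.correct, self.total + other.total)

patient = {"age": 61}
models = [AlwaysHealthy(), DescribableAgeThreshold(55)]
# Polymorphism: the same call works on objects of different classes
predictions = [m.predict(patient) for m in models]
merged = Tally(8, 10) + Tally(7, 10)
print(predictions, merged.correct, merged.total)
```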

MACHINE LEARNING TECHNIQUES

Machine learning is a branch of artificial intelligence. It is used to study algorithms
which bring out regularities and patterns; these can reduce computational
cost and make implementation effortless.

Compared to physical models, this technology provides fast training, validation and
testing with increased performance and less complexity.

DATA ANALYSIS

Data analysis is the process of analyzing the raw data so that the processed/analyzed data
can be used in a system or a method/process. It mainly involves three steps: data
acquisition, data preprocessing and exploratory data analysis. Data acquisition is collecting
the data from various sources like agencies, etc. for further analysis. While acquiring the
data it is important to collect data which is relevant to the system or the process.

Data preprocessing is a methodology in data mining that is used to convert the raw data
into a meaningful and efficient format. The raw data may contain many irrelevant and missing
parts. Data cleaning is done to handle them; this includes managing details which are
incomplete, noisy information etc. Exploratory data analysis is a significant process for
carrying out data investigations in order to detect patterns and irregularities, test
hypotheses and check conclusions using summary statistics and graphical representations.

The main objective of the exploratory phase of data analysis is to understand the important
characteristics of the data by using descriptive statistics, correlation analysis, visual
inspection and other simple modeling.
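
The cleaning and exploratory steps described above can be sketched in plain Python 3. The records,
column names and helper names below are hypothetical; the sketch drops incomplete records, then
computes descriptive statistics and a Pearson correlation between two columns.

```python
import math

def drop_incomplete(records):
    """Data cleaning: keep only records with no missing (None) values."""
    return [r for r in records if all(v is not None for v in r.values())]

def describe(values):
    """Descriptive statistics for one numeric column."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return {"min": min(values), "max": max(values),
            "mean": mean, "std": math.sqrt(variance)}

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length columns."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread = math.sqrt(sum((x - mean_x) ** 2 for x in xs)
                       * sum((y - mean_y) ** 2 for y in ys))
    return cov / spread if spread else 0.0

# Hypothetical patient records; one has a missing cholesterol value
records = [
    {"age": 40, "chol": 200, "target": 0},
    {"age": 50, "chol": None, "target": 1},
    {"age": 60, "chol": 240, "target": 1},
    {"age": 50, "chol": 220, "target": 0},
]
clean = drop_incomplete(records)
ages = [r["age"] for r in clean]
chol = [r["chol"] for r in clean]
print(describe(ages), pearson(ages, chol))
```

Dropping incomplete records is only one possible cleaning policy; filling missing values with a
column mean or median is a common alternative when data is scarce.
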
CHAPTER 4
SYSTEM DESIGN

A system architecture or systems architecture is the conceptual model that defines the
structure, behavior, and more views of a system. An architecture description is a formal
description and representation of a system, organized in a way that supports reasoning about
the structures and behaviors of the system. A system architecture can consist of system
components and the sub-systems developed, that will work together to implement the overall
system. There have been efforts to formalize languages to describe system architecture;
collectively these are called architecture description languages (ADLs).

Various organizations can define systems architecture in different ways, including:

• The fundamental organization of a system, embodied in its components, their relationships to
each other and to the environment, and the principles governing its design and evolution.

• A representation of a system, including a mapping of functionality onto hardware and
software components, a mapping of the software architecture onto the hardware architecture,
and human interaction with these components.

• An allocated arrangement of physical elements which provides the design solution for a
consumer product or life-cycle process intended to satisfy the requirements of the functional
architecture and the requirements baseline.

• An architecture consists of the most important, pervasive, top-level, strategic inventions,
decisions, and their associated rationales about the overall structure (i.e., essential elements
and their relationships) and associated characteristics and behavior.

• A description of the design and contents of a computer system. If documented, it may include
information such as a detailed inventory of current hardware, software and networking
capabilities; a description of long-range plans and priorities for future purchases, and a plan
for upgrading and/or replacing dated equipment and software.

• A formal description of a system, or a detailed plan of the system at component level to guide
its implementation.

• The composite of the design architectures for products and their life-cycle processes.

• The structure of components, their interrelationships, and the principles and guidelines
governing their design and evolution over time.

One can think of system architecture as a set of representations of an existing (or future)
system. These representations initially describe a general, high-level functional organization,
and are progressively refined to more detailed and concrete descriptions.

System Architecture

A system architecture or systems architecture is the conceptual model that defines the
structure, behavior, and more views of a system. An architecture description is a formal
description and representation of a system, organized in a way that supports reasoning about
the structures and behaviors of the system.A system architecture can consist of system
components and the sub- systems developed, that will work together to implement the overall
system. There have been efforts to formalize languages to describe system architecture,
collectively these are called architecture description languages (ADLs).

Various organizations can define systems architecture in different ways, including:

The fundamental organization of a system, embodied in its components, their relationships to
each other and to the environment, and the principles governing its design and evolution.

A representation of a system, including a mapping of functionality onto hardware and
software components, a mapping of the software architecture onto the hardware architecture,
and human interaction with these components.

An allocated arrangement of physical elements which provides the design solution for a
consumer product or life-cycle process intended to satisfy the requirements of the functional
architecture and the requirements baseline.

An architecture consists of the most important, pervasive, top-level, strategic inventions,
decisions, and their associated rationales about the overall structure (i.e., essential elements
and their relationships) and associated characteristics and behavior.

A description of the design and contents of a computer system. If documented, it may include
information such as a detailed inventory of current hardware, software and networking
capabilities; a description of long-range plans and priorities for future purchases, and a plan
for upgrading and/or replacing dated equipment and software.

A formal description of a system, or a detailed plan of the system at component level to guide
its implementation.

System Architecture

Figure 1: Proposed system architecture.

Modules:

Module 1
Module name :- Dataset Training
Functionality :- A training dataset is a set of examples used during the learning process to fit
the parameters of the model. Data collection is the process of gathering and measuring information
on targeted variables in a systematic way. A formal data collection process is required because it
ensures that the data is well defined and accurate, so that decisions based on the data are valid.
The data required for heart disease prediction is clinical data, which varies from one individual
to another.
Input :- Datasets containing activity data and other health data
Output :- Trained model

Module 2
Module name :- Data Preprocessing Module
Functionality :- Preprocessing of the genetic data includes the following data transformation
steps. Normalization: scaling the values to a specific range. Aggregation: assigning probabilistic
values to the genes. Construction: replacing or adding new genes inferred from the existing genes.
Input :- Datasets
Output :- A lower-dimensional space that best represents the data, with irrelevant data removed
from the genome dataset. Sampling can be used to simplify classification by working with a
smaller dataset.
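
The normalization step described above can be illustrated with min-max scaling. This is a minimal sketch that assumes a linear rescaling into a target range; the report does not say which scaling method was used, and the function name and sample readings are hypothetical.

```python
def min_max_scale(values, new_min=0.0, new_max=1.0):
    """Rescale a list of numbers linearly into [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to the lower bound.
        return [new_min for _ in values]
    span = hi - lo
    return [new_min + (v - lo) * (new_max - new_min) / span for v in values]


# Hypothetical resting blood pressure readings (mm Hg).
resting_bp = [120, 140, 130, 160]
scaled = min_max_scale(resting_bp)
```

Scaling every feature to the same range keeps attributes with large numeric values, such as cholesterol, from dominating distance-based classifiers.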

Module 3
Module name :- Data Synthesization
Functionality :- The collected data were synthesized to remove irrelevant features. For example,
the ID column was irrelevant to developing a prediction model, so it was removed. To handle null
values, the listwise deletion technique was applied, in which an observation is deleted if it has
one or more missing values. Then, to identify and remove unnecessary features from the dataset,
a decision tree algorithm was used.
Input :- Preprocessed data
Output :- Labelled data
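
The first two steps of this module, dropping the ID column and listwise deletion, could look roughly like the sketch below. The record layout and column names are hypothetical, and the decision-tree feature selection step is not shown.

```python
def drop_columns(rows, columns):
    """Return copies of the rows without the named columns."""
    return [{k: v for k, v in row.items() if k not in columns} for row in rows]


def listwise_delete(rows):
    """Keep only rows in which no value is missing (None or empty string)."""
    return [row for row in rows if all(v not in (None, "") for v in row.values())]


# Hypothetical records; the column names are illustrative only.
records = [
    {"id": 1, "age": 63, "chol": 233, "target": 1},
    {"id": 2, "age": 41, "chol": None, "target": 0},
    {"id": 3, "age": 56, "chol": 236, "target": 1},
]

cleaned = listwise_delete(drop_columns(records, {"id"}))
```

Listwise deletion is simple but discards an entire observation for a single missing value, so it suits datasets where few records are incomplete.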

Module 4
Module name :- Prediction
Functionality :- Using the labelled training dataset, heart disease can be predicted for the
test data, and the corresponding positive and negative predictions are obtained together with
their probabilities. To generate predictions of heart disease, algorithms were developed and
their accuracy was tested. After attaining results from various types of supervised learning like Linear
Input :- Data input to the algorithms
Output :- Prediction and classification
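
To illustrate how a prediction and its probability can be obtained from labelled training data, the sketch below uses a small k-nearest-neighbours classifier written with the standard library only. KNN is chosen here purely for illustration and is not necessarily the algorithm used in the project; the feature values and function names are hypothetical.

```python
import math
from collections import Counter


def knn_predict_proba(train_X, train_y, sample, k=3):
    """Estimate class probabilities as the class shares among the k nearest neighbours."""
    k = min(k, len(train_X))
    distances = sorted(
        (math.dist(x, sample), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return {label: votes.get(label, 0) / k for label in sorted(set(train_y))}


def knn_predict(train_X, train_y, sample, k=3):
    """Return the most probable class and its estimated probability."""
    proba = knn_predict_proba(train_X, train_y, sample, k)
    label = max(proba, key=proba.get)
    return label, proba[label]


# Hypothetical, already-normalised features: (age, cholesterol).
train_X = [(0.1, 0.2), (0.2, 0.1), (0.8, 0.9), (0.9, 0.8), (0.7, 0.7)]
train_y = [0, 0, 1, 1, 1]  # 1 = heart disease, 0 = no heart disease

label, probability = knn_predict(train_X, train_y, (0.75, 0.8))
```

The returned probability is only the fraction of neighbours that share the predicted class, so it becomes coarser as `k` gets smaller.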

Data Flow Diagram:

A data flow diagram (DFD) is a graphical representation of the "flow" of data through an
information system, modelling its process aspects. A DFD is often used as a preliminary step
to create an overview of the system without going into great detail, which can later be
elaborated. DFDs can also be used for the visualization of data processing (structured
design).

Figure 6.1: Process flow diagram

A DFD, as shown in Figure 6.1, shows what kind of information will be input to and output
from the system, how the data will advance through the system, and where the data will be
stored. It does not show information about process timing or whether processes will operate
in sequence or in parallel, unlike a traditional structured flowchart which focuses on control
flow, or a UML activity workflow diagram, which presents both control and data flows as a
unified model.

The data is first extracted from the CSV file and the dataset is processed. Data
processing consists of filtering: it involves handling missing data, encoding categorical
data, and splitting the data into training and test sets.
The next step is training the model, together with feature selection and dimension
reduction. Learning algorithms are used to classify the data. Principal Component
Analysis (PCA) and Random Forest are used for feature selection and dimension reduction.
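
The filtering steps described above (missing data, categorical encoding, train/test split) can be sketched with the standard library as follows. The CSV content, column names, mean imputation and the 75/25 split are assumptions made for illustration; PCA and Random Forest are not shown.

```python
import csv
import io
import random

# Hypothetical CSV content; a real run would open the dataset file instead.
RAW_CSV = """age,sex,chol,target
63,male,233,1
41,female,,0
56,male,236,1
57,female,354,0
"""


def load_rows(text):
    """Parse CSV text into a list of dictionaries (one per record)."""
    return list(csv.DictReader(io.StringIO(text)))


def impute_mean(rows, column):
    """Convert a numeric column to float, replacing empty values with the mean."""
    known = [float(r[column]) for r in rows if r[column] != ""]
    mean = sum(known) / len(known)
    for r in rows:
        r[column] = float(r[column]) if r[column] != "" else mean


def one_hot(rows, column):
    """Replace a categorical column with one 0/1 column per category."""
    categories = sorted({r[column] for r in rows})
    for r in rows:
        value = r.pop(column)
        for c in categories:
            r[f"{column}_{c}"] = int(value == c)


def train_test_split(rows, test_ratio=0.25, seed=0):
    """Shuffle the rows reproducibly and split them into train and test sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]


rows = load_rows(RAW_CSV)
for column in ("age", "chol"):
    impute_mean(rows, column)
one_hot(rows, "sex")
for r in rows:
    r["target"] = int(r["target"])
train, test = train_test_split(rows)
```

Fixing the random seed makes the split reproducible, which helps when the accuracy of several classifiers is compared on the same test set.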

Use case Diagram:

The use case diagram of the heart disease prediction project covers
the elements a standard use case diagram requires. It shows how the
interaction proceeds from one step to the next: the user enters the
system and supplies general information together with the symptoms.
The system compares this input with the prediction model and, if the
input is valid, predicts the appropriate result; otherwise it shows
where the user went wrong while entering the information. It also
shows the appropriate precautionary measures for the user to follow.
All entities in the use case diagram are linked to one another,
starting from the point where the user begins using the system.

Sequence Diagram:
The sequence diagram of the heart disease prediction project covers
the elements a standard sequence diagram requires. It shows the order
in which the interaction proceeds: the user enters the system and
supplies general information together with the symptoms. The system
compares this input with the prediction model and, if the input is
valid, predicts the appropriate result; otherwise it shows where the
user went wrong while entering the information. It also shows the
appropriate precautionary measures for the user to follow. All
entities in the sequence are linked to one another, starting from the
point where the user begins using the system.
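
The interaction described above (validate the input, then either report the errors or return a prediction with advice) might be sketched as follows. The value ranges, advice strings and `stub_model` are placeholders invented for this illustration; they are not clinical thresholds and not the project's trained model.

```python
# Allowed ranges are illustrative assumptions, not clinical thresholds.
VALID_RANGES = {"age": (1, 120), "resting_bp": (50, 250), "chol": (50, 700)}


def validate(inputs):
    """Return a dict of field -> error message for missing or out-of-range values."""
    errors = {}
    for field, (low, high) in VALID_RANGES.items():
        value = inputs.get(field)
        if value is None:
            errors[field] = "missing value"
        elif not low <= value <= high:
            errors[field] = f"expected a value between {low} and {high}"
    return errors


def handle_request(inputs, predict):
    """Validate the inputs, then either report the errors or return a prediction."""
    errors = validate(inputs)
    if errors:
        return {"status": "invalid", "errors": errors}
    at_risk = predict(inputs)
    # Placeholder advice text standing in for the precautionary measures.
    advice = "Consult a cardiologist." if at_risk else "Maintain regular check-ups."
    return {"status": "ok", "at_risk": at_risk, "advice": advice}


def stub_model(inputs):
    """Placeholder for the trained model; this rule is not a real predictor."""
    return inputs["chol"] > 240


ok = handle_request({"age": 54, "resting_bp": 130, "chol": 260}, stub_model)
bad = handle_request({"age": 54, "resting_bp": 900}, stub_model)
```

Passing the model in as a function keeps the validation flow independent of whichever classifier is eventually selected.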

CLASS DIAGRAM:
Like other applications, heart disease prediction using machine learning has a class diagram,
which is a basic artifact required to carry out the project. The class diagram contains
information about all the classes used, the related datasets, and the other necessary attributes
and their relationships with other entities. All of this information is needed to support
prediction. The user enters the necessary information, such as user name, email, phone number
and other attributes required to log in to the system. Using files, the system stores the
information of users who register and retrieves that information later when they log in.

Fig 4.4 Class Diagram
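
A minimal sketch of the file-based user storage mentioned above is given below. The class names, field names and the JSON file format are assumptions; the report only states that user details are stored in files and retrieved at login.

```python
import json
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class User:
    """Registration details; the field names are assumptions based on the text."""
    user_name: str
    email: str
    phone: str


class UserStore:
    """Persist registered users in a JSON file and look them up at login."""

    def __init__(self, path):
        self.path = Path(path)

    def _load(self):
        # An absent file simply means nobody has registered yet.
        if not self.path.exists():
            return {}
        return json.loads(self.path.read_text(encoding="utf-8"))

    def register(self, user):
        users = self._load()
        users[user.user_name] = asdict(user)
        self.path.write_text(json.dumps(users, indent=2), encoding="utf-8")

    def find(self, user_name):
        record = self._load().get(user_name)
        return User(**record) if record else None


# Usage with a temporary directory so the sketch leaves nothing behind.
with tempfile.TemporaryDirectory() as tmp:
    store = UserStore(Path(tmp) / "users.json")
    store.register(User("asha", "asha@example.com", "0000000000"))
    found = store.find("asha")
    missing = store.find("nobody")
```

Keying the stored records by user name makes the login lookup a single dictionary access after the file is read.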

Conclusion:
Prediction techniques help to detect heart disease before it reaches an advanced stage. In this
work, different machine learning algorithms are applied to heart disease data.

