Asthma Prediction with Hybrid ML
Dissertation IV Report
Submitted in partial fulfilment of the requirements for the award of degree of
Submitted to
SUBMITTED BY
SR.NO. NAME OF STUDENT Prov. Regd. No. BATCH SECTION CONTACT NUMBER
PROPOSED TOPIC : A hybrid technique to predict asthmatic patients during aggravations in asthma
2 Project Feasibility: Project can be timely carried out in-house with low-cost and available resources in 6.13
the University by the students.
3 Project Academic Inputs: Project topic is relevant and makes extensive use of academic inputs in UG 6.25
program and serves as a culminating effort for core study area of the degree program.
4 Project Supervision: Project supervisor is technically competent to guide students, resolve any issues, 7.00
and impart necessary skills.
5 Social Applicability: Project work intends to solve a practical problem. 6.13
6 Future Scope: Project has potential to become basis of future research work, publication or patent. 6.13
PAC Member (HOD/Chairperson) Name: Janpreet Singh UID: 11266 Recommended (Y/N): Yes
PAC Member (Allied) Name: Dr.Gurpreet Singh UID: 17671 Recommended (Y/N): Yes
PAC Member 3 Name: Pradeep Kumar UID: 16473 Recommended (Y/N): Yes
Final Topic Approved by PAC: A hybrid technique to predict asthmatic patients during aggravations in asthma
PAC CHAIRPERSON Name: 13714::Dr. Prateek Agrawal Approval Date: 06 May 2023
Student Declaration
I, Rajat Rana, 11901295, do hereby declare that the work done by me on “Dissertation IV” under the
supervision of Tarun, Assistant Professor, Lovely Professional University, Phagwara, Punjab, is a
record of original work for the partial fulfilment of the requirements for the award of the Integrated
CSE(Btech+Mtech).
Dated:03/05/2024
Declaration by the Supervisor
This is to certify that Rajat Rana, 11901295 of Lovely Professional University, Phagwara, Punjab, has
worked on “Dissertation IV” under my supervision from 01/02/2024 to 03/05/2024. It is further stated
that the work carried out by the student is a record of original work to the best of my knowledge for
the partial fulfilment of the requirements for the award of the Integrated CSE(Btech+Mtech).
Name of Supervisor
Tarun
UID of Supervisor
24044
Signature of Supervisor
ACKNOWLEDGEMENT
A project serves as a conduit between theoretical knowledge and practical application, and with this
mindset, I dedicated myself to the project, ensuring its success with the timely support and efforts of
my mentor. I express my gratitude to my teacher and mentor, Tarun, who provided unwavering
support, clarified my doubts, and to my parents, who played a significant role in the finalization of my
project file. I take this moment to acknowledge their invaluable support, and I hope for their continued
encouragement in the future. Throughout the preparation of this project file, the diverse information I
discovered greatly contributed to the project's completion. I am pleased to have successfully finished
the project and gained a deeper understanding of many concepts. The meticulous preparation of this
project was an immense learning experience, fostering the development of personal qualities such as
responsibility, punctuality, confidence, and more.
In conclusion, I extend my thanks to my classmates and friends for their encouragement and assistance
in designing and enhancing the creativity of my project. It was through their support that I was able to
craft an enjoyable and successful project experience.
Abstract
This report introduces a novel hybrid approach for predicting asthma exacerbations using machine
learning models. It underscores the significance of personalized medicine and the role of Big Data
analytics in healthcare. The objective is to present a model capable of effectively identifying key
features associated with asthma onset, using a specific dataset as an illustration. The model
integrates multiple machine learning algorithms, namely K-Nearest Neighbors (KNN), XGBoost, Decision
Tree, and Support Vector Classifier (SVC), with the aim of improving predictive accuracy and
robustness. Additionally, the study reviews prior research on machine learning methodologies for
asthma prediction, emphasizing the need for improved generalization and practicality. The proposed
hybrid technique is intended to contribute to the progress of predictive healthcare analytics. The
hybrid model achieves accuracies of 96% and 98% with stacking and voting, respectively.
List of Tables
List of Figures
List of Equation & Algorithms
Class labels
Average value for nearest data points
DT for further nodes
SVC mathematical equation
List of Abbreviations
Chapter-1
I. Introduction
Machine learning is a broad set of algorithmic models and statistical approaches designed to solve
problems without being explicitly programmed for each task [31]. Some machine learning models,
particularly single-layered ones, involve extensive feature extraction and data processing before the
data is input into the algorithm [1], [2]. Proper data preprocessing is crucial to ensure accurate
predictions and avoid issues such as overfitting or underfitting the training dataset. Deep learning, a
more advanced branch of machine learning that employs hierarchical artificial neural networks, can
achieve higher accuracy and precision, though this may come at the cost of reduced interpretability [3].
In deep learning, neural networks comprise multiple layers of connected artificial neurons or units,
enabling complex data processing. These networks can autonomously learn, recognize patterns, and
extract insights from data through these layered connections until they achieve the desired results [4], [5].
Personalized medicine tailors medical decisions, treatments, and technologies to each individual
patient based on their predicted response or disease risk [6]. This approach has gained traction in
recent years owing to advancements in diagnostic techniques and informatics. Big Data analytics,
leveraging various machine learning methods, plays a pivotal role in establishing the analytical
foundation of personalized medicine. The growing utilization of computerized algorithms for real-time
estimation of clinical outcomes aims to enhance patient care and reduce costs, supported by Big Data
analytics. Moreover, the expanding accessibility of electronic health data is driving the swift expansion
of predictive analytics applications within the healthcare sector.
Asthma is a significant global health issue, affecting about 300 million people and resulting in
approximately 250,000 deaths annually[7]. A major challenge of asthma is the constriction of the
airways, which worsens as the condition becomes more severe and can be expensive to treat.
Evaluating this airway narrowing is essential for diagnosing asthma, monitoring its progression, and
assessing the effectiveness of treatments[8]. Commonly, doctors rely on tests such as spirometry and
body plethysmography, but these require patients to fully cooperate and exert maximum effort. This
can be challenging for older adults, individuals who may struggle to follow instructions, or those with
other serious health conditions. Asthma stands out as one of the most prevalent and serious non-
communicable conditions worldwide[9]. It's a chronic lung condition that impacts the airways, leading
to notable changes in lung function. Recent data from the World Health Organization (WHO) indicates
that around 334 million people worldwide are affected by asthma. Shockingly, in 2016 alone, it
claimed the lives of over 417,918 people globally[10], [11], [12]. Asthma is thought to be caused by a
combination of genetic and environmental factors, including allergies, smoking, weather, air pollution,
and exposure to specific chemicals [13]. Symptoms can differ from one individual to another, but the
most prevalent ones typically encompass shortness of breath, chest tightness or discomfort, disrupted
sleep patterns, persistent coughing, breathlessness, difficulty speaking, sensations of anxiety or panic,
and fatigue [5].
Medical professionals emphasize the importance of accurate diagnosis and prompt detection of life-
threatening illnesses, as these factors can significantly improve a patient's chances of survival and
expedite their recovery[13]. Recently, artificial intelligence (AI) has gained widespread recognition as
a valuable tool in disease detection. It has shown considerable success in diagnosing various health issues
using machine learning and deep learning approaches [14], [15]. Numerous research studies are
presently delving into the capabilities of machine learning and deep learning algorithms for detecting
diseases, such as asthma and pneumonia [16].
The asthmatic lung is characterized by persistent inflammation and increased sensitivity of the airways,
resulting in recurrent, intermittent episodes of wheezing, breathlessness, chest tightness, and
coughing [17]. Asthma is a prevalent chronic respiratory condition that can affect individuals of any
age, although it often begins in childhood. Triggers such as exposure to allergens (for example pollen,
dust mites, or pet dander), respiratory infections, physical activity, cold air, smoke, or air pollution
can induce inflammation and narrowing of the airways in people with asthma. This leads to the typical
symptoms, which can vary in severity and frequency [9], [18].
Inflammation in asthmatic lungs involves a complex interaction of immune cells, such as eosinophils,
mast cells, and T lymphocytes, as well as various inflammatory mediators such as histamine,
leukotrienes, and cytokines. This inflammatory response contributes to airway constriction, excessive
mucus production, and heightened airway responsiveness.
Managing asthma aims to control symptoms, prevent flare-ups, and enhance lung function. Treatment
often includes bronchodilators to alleviate acute symptoms and anti-inflammatory medications to
reduce airway inflammation and prevent exacerbations. Alongside medication, avoiding triggers and
adopting a healthy lifestyle are crucial for effective asthma management. Regular monitoring of
symptoms and lung function, along with personalized asthma action plans developed with healthcare
providers, can empower individuals with asthma to effectively manage their condition and lead active
lives despite its challenges[19], [20].
In recent years, the global community has grappled with the COVID-19 challenge [17]. One of its
concerning effects is the development of severe pneumonia, often leading to fatalities, particularly
when diagnosed late. Chronic lung ailments have imposed significant strains on healthcare systems.
The period from 2019 to 2021 witnessed a surge in chronic obstructive pulmonary disease (COPD)
cases worldwide due to the COVID-19 outbreak. Timely diagnosis holds the key to mitigating this
issue, as early intervention can effectively manage the condition[17].
There is an increasing use of home-based telemonitoring to monitor and manage chronic health
conditions outside of hospitals. The goal is to optimize the management of these diseases and prevent
worsening. This technology has demonstrated promise in various conditions including asthma,
hypertension, inflammatory bowel disease, congestive heart failure (CHF), multiple sclerosis, COPD,
and depression. Recent studies have revealed limits in the effectiveness of home telemonitoring
strategies for chronic health disorders [3], [21]. These limitations are often attributed to the absence of
reliable early predictors and the modest performance of traditional algorithms in identifying worsening
symptoms. Currently, algorithms typically estimate the general risk of exacerbations occurring
within a specific time frame, such as one month or one year [22]. They rely on a combination of
clinical and billing records, but they do not consider changes in disease severity over time or
day-to-day variations in symptoms [6]. By implementing novel methods that can anticipate impending
exacerbations from individual disease patterns and facilitate prompt detection of potential
worsening before it happens, we could significantly boost the effectiveness of home-based
telemonitoring systems. This advancement holds the potential to raise the standard of care delivered
to patients and reduce healthcare expenditure.
1.3. Introduction to the proposed Hybrid Technique
In the field of medical prediction, hybrid models are becoming increasingly utilized to enhance
accuracy and reliability, particularly in forecasting events such as asthma exacerbations. These models
integrate multiple algorithms or techniques to capitalize on the strengths of each while mitigating
individual weaknesses[8]. The blending of different approaches often leads to superior predictive
performance compared to using any single model alone. Moreover, hybrid models demonstrate greater
resilience to variations in data and environmental factors, making them adaptable to diverse real-world
scenarios. Additionally, these models offer enhanced interpretability, a critical aspect in healthcare
decision-making, by providing insights into the underlying reasoning behind predictions. By
incorporating various feature engineering and selection methods, hybrid models effectively utilize
pertinent data features while reducing noise, resulting in more precise predictions of asthma
exacerbations. The ability to accurately predict asthma exacerbations facilitates timely intervention by
healthcare professionals, potentially preventing severe episodes and enhancing patient outcomes.
The availability of large volumes of data and technological improvements have significantly changed
the healthcare sector. Advancements in this field have provided the groundwork for the development
of sophisticated predictive models that can effectively diagnose diseases. Early disease prediction not
only improves patient outcomes but also leads to more effective treatments and reduced healthcare
costs[12]. The objective of this research paper is to investigate the effectiveness of a hybrid algorithm
that integrates elements from four powerful machine learning models: K-nearest neighbors (KNN),
XGBoost, Decision tree, and Support vector classifier (SVC). The KNN algorithm is a simple yet
powerful method that classifies new instances based on their similarity to the nearest instances in the
training dataset[22]. Due to its flexibility and efficiency, XGBoost has gained popularity as a preferred
option for large-scale machine learning tasks, as it effectively implements the gradient boosting
framework. This study compares the ability of these four algorithms to forecast the course of
diseases [14]. By utilizing a hybrid algorithm, we aim to leverage the unique strengths of each model to
improve the accuracy and reliability of disease prediction. The research will delve into the
methodologies of each model, their application in the context of disease prediction, and a
comprehensive analysis of their performance using various assessment metrics [4]. This comparative
analysis will provide valuable insights into the most effective machine learning techniques for disease
prediction, contributing to the advancement of predictive healthcare analytics.
The aim of this study is to propose a hybrid model that can effectively pinpoint the key
features associated with the onset of asthma in patients. This model will offer predictive capabilities
applicable to diverse datasets, with the dataset utilized in this study serving as a demonstrative
example. Additionally, the study will illustrate how such data-driven insights can inform the
development of intervention strategies and early medical interventions for asthmatic patients. The
evaluation encompasses four algorithms: KNN, SVC, XGBoost, and Decision Tree.
Chapter-2
The literature review begins by examining previous research on predictive modeling and machine
learning approaches in the context of chronic health conditions, summarized in Table 2.1 below:
...
Limitations: ... predominantly Caucasian populations. Another unavoidable limitation in
epidemiological studies is the lack of a clear definition for asthma.
Future scope: ... predictive insights and enhance clinical utility.

[20]
Dataset: 150 asthma and 52 healthy control blood sera samples
Techniques: ANN, SVM, RF
Study type: Comparative study on predictive models
Limitations: The study did not explore the possible challenges in diagnosing asthma in infants or
differentiating it from other lung conditions.
Future scope: The researchers plan to use the optimized classifier developed in this study for
real-time asthma risk prediction and potentially for other diseases as well.

[11]
Dataset: 16000
Techniques: Naïve Bayes, J48, Random Forest, and Random Tree
Study type: Predictive

[24]
Dataset: 5,875 patients, including 13,614 weekly surveys and 75,795 daily surveys
Techniques: Decision trees, logistic regression, naïve Bayes, and support vector machine (SVM)
Study type: Predictive
Limitations: Identifying an unstable event using a weekly survey with a resolution of one week may
impact the precision of the study.
Future scope: The study expects further improvement in performance if objective measures can be
used as feature inputs for prediction.

[25]
Dataset: The study used 10 endogenous variables: 6 atmospheric, 3 meteorological, and asthma
occurrence data from Seoul, South Korea.
Techniques: Vector autoregressive model (VAR), DNN
Limitations: Over-dispersion can affect consistency of coefficient estimation.
Future scope: Further research could explore robust variance estimators for QML Poisson.

[16]
Dataset: OASIS dataset has a dimension of 373 rows x 15 columns
Techniques: SVM, DT, RF, LR
Limitations: The study found that the random forest classifier, a more complex model, experienced
overfitting.
Chapter-3
III. Methodology
To improve the accuracy of asthma prediction, we have incorporated various machine learning
algorithms like SVC, KNN, XGBoost, and Decision Tree into our model. Furthermore, we have
curated a novel disease dataset, which holds significant potential as a cornerstone for forthcoming
researchers and healthcare professionals. The subsequent sections will provide concise insights into the
research materials and methodologies employed in the study. The Behavioral Risk Factor Surveillance
System (BRFSS) has emerged as a potent resource for tailoring and advancing health promotion
efforts through the gathering of behavioral health risk information at both state and local levels.
Consequently, there has been a growing call from BRFSS users for expanded datasets and additional
survey questions to better address their needs. The features (columns) are listed in Table 3.1:
have asthma?
Smoke100 In the past year, have you abstained Numerical 1-9
Sex Indicate sex of respondent. Numerical 1-2
as:
HLTHPLN1 Do you have any type of healthcare Numerical 1-9
The comprehensive BRFSS dataset, comprising both landline and cell phone data, is compiled from
submissions for the year 2014. It encompasses information from all 50 states, the District of
Columbia, Guam, and Puerto Rico. There are 464,664 records in the dataset.
In preparing the 2014 BRFSS survey data for analysis, several key steps are involved in preprocessing
the raw responses. Initially, the raw survey data, which is provided in fixed-width column ASCII files
and accessible from the CDC website, is downloaded and extracted. The structure of the data,
including column positions and group fields, is defined in a layout format. To facilitate parsing, a data
frame is constructed to remove group fields and accurately calculate the widths of individual fields.
Subsequently, the data undergoes categorization based on specific parameters, aiming to segment it
into meaningful groups. This process may entail combining multiple values to create new parameters
that enhance the interpretability of the data. Moreover, data cleaning procedures are implemented to
address issues such as missing values, outliers, and inconsistencies. By handling these anomalies, the
quality and reliability of the data are improved, ensuring its suitability for subsequent analysis.
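The layout-driven parsing described above can be sketched with pandas as follows. This is a minimal illustration under stated assumptions, not the preprocessing code used in this work: the layout rows, the field widths, and the two sample records are hypothetical, and a group field is assumed to be recognizable because it starts at the same column as the field that follows it.

```python
import io

import pandas as pd

# Hypothetical layout table: 1-based starting column, variable name, field length.
# "IDATE" is treated as a group field that merely spans IMONTH, IDAY and IYEAR.
layout = pd.DataFrame(
    {
        "start": [1, 3, 3, 5, 7, 11],
        "name": ["_STATE", "IDATE", "IMONTH", "IDAY", "IYEAR", "ASTHMA3"],
        "length": [2, 8, 2, 2, 4, 1],
    }
)

# Assumption: a group field shares its starting column with the next field.
is_group = layout["start"] == layout["start"].shift(-1)
fields = layout[~is_group].reset_index(drop=True)

# Convert 1-based (start, length) pairs to the 0-based half-open
# intervals that pandas.read_fwf expects in `colspecs`.
colspecs = [
    (int(start) - 1, int(start) - 1 + int(length))
    for start, length in zip(fields["start"], fields["length"])
]

# Two made-up fixed-width records standing in for the raw ASCII file.
raw = "01010520141\n02123120142\n"

df = pd.read_fwf(
    io.StringIO(raw),
    colspecs=colspecs,
    names=list(fields["name"]),
    header=None,
    dtype=str,  # keep codes such as "01" as text until they are recoded
)
print(df)
```

Reading every field as text first keeps leading zeros in coded values; numeric conversion and recoding of special codes can then be done column by column.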
As shown in Fig. 3.1, the data are divided into subcategories according to their metadata. For example,
drinking has four variables, including non-drinking, drink monthly, and drink weekly, and these
variables are appended alongside the ASTHMA4 column to separate the drinker categories in the
dataset.
After removing null or noisy data, the cleaned data contains 464,664 entries.
Fig.3.3. Code for the subcategory of the dataset.
This code cleans the data and separates it into groups according to the names given in the code.
It generates a CSV file named Cleaned_data.csv, which has 30,410 entries and is used as input to the
machine learning models. The figure shows the column names and whether the data in each column is
numerical or categorical.
The features, such as VETERAN3, ALCDAY5, and SLEPTIM1, are selected from the figure. The required
information about the features is given in Table 3.1.
3.4. Hybrid Technique Implementation:
A hybrid machine learning model is a combination of multiple machine learning models, each
contributing to the final prediction. In this work, we consider K-Nearest Neighbors (KNN),
Decision Tree, XGBoost, and SVC. A high-level overview of how this works is as follows:
1. Individual Model Training: Each of the four models (KNN, Decision Tree, XGBoost, and
SVC) is trained separately on the training data. In this phase, we tune the
hyperparameters of each model to achieve the best performance.
2. Model Evaluation: Using suitable measures such as accuracy, precision, recall, and F1-score on
the validation set, we assess each model's performance once training is complete. This
makes it easier to evaluate each model's performance separately.
3. Hybrid Model Creation: In this step, we develop a hybrid model that capitalizes on the
strengths of the individual models. There are several ways to do this:
Voting: Each model makes a prediction for each instance, and the final prediction is
decided by majority vote [8]. This method works well when the models' errors are largely
uncorrelated.
Stacking: In this approach, another machine learning model takes the predictions of the
individual models as input and learns to make the final prediction. This "meta-model"
can be any model, but it is often a simple one such as linear or logistic regression.
4. Hybrid Model Assessment: Finally, we evaluate the performance of the hybrid model
using the same metrics as before. If the hybrid model is effective, it should perform better than
any individual model.
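The voting and stacking steps above can be sketched with scikit-learn as follows. This is an illustrative sketch under stated assumptions rather than the implementation used in this work: the data are synthetic (make_classification), the hyperparameters are arbitrary, and scikit-learn's GradientBoostingClassifier is used as a stand-in for XGBoost so that the example depends on a single library.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    StackingClassifier,
    VotingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary-classification data standing in for the asthma dataset.
X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Base models; KNN and SVC are distance-based, so they are scaled in a pipeline.
base_models = [
    ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))),
    ("dt", DecisionTreeClassifier(max_depth=5, random_state=0)),
    ("gb", GradientBoostingClassifier(random_state=0)),  # stand-in for XGBoost
    ("svc", make_pipeline(StandardScaler(), SVC())),
]

# Voting: the majority class among the base-model predictions wins.
voting = VotingClassifier(estimators=base_models, voting="hard")

# Stacking: a logistic-regression meta-model learns from cross-validated
# base-model outputs.
stacking = StackingClassifier(
    estimators=base_models, final_estimator=LogisticRegression(), cv=5
)

scores = {}
for name, model in [("voting", voting), ("stacking", stacking)]:
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: accuracy = {scores[name]:.3f}")
```

Both ensembles clone the base estimators internally, so the same list can be passed to each without the fits interfering with one another.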
KNN:
The K-Nearest Neighbors (KNN) algorithm is widely recognized and easy to understand in the
field of machine learning for asthma prediction. It is a versatile approach that relies on the concept of
similarity. KNN classifies a data point by determining the majority class among its closest neighbors in
the feature space [4]. In the context of asthma prediction, KNN analyzes various attributes and patterns
within the dataset to identify similarities between instances, aiding in the classification of asthma
likelihood.
By utilizing historical data containing factors such as demographic details, environmental exposures,
genetic predispositions, and medical history, KNN effectively identifies individuals at risk of
developing asthma[26]. The K-Nearest Neighbors (KNN) algorithm is highly valuable due to its
straightforwardness and clarity. It does not depend on assumptions about the distribution of the data
and can easily adapt to diverse datasets with various types of features.
Moreover, the flexibility of KNN allows it to accommodate dynamic changes in asthma risk factors
over time, making it a robust contender for real-time prediction and monitoring applications[27]. Its
non-parametric nature also enables it to capture complex relationships and nonlinear interactions
among predictors, ensuring comprehensive and accurate predictions.
However, like any machine learning method, KNN presents certain considerations. The computational
overhead associated with evaluating distances between data points can be significant, especially in
large datasets. Additionally, appropriate feature scaling and careful selection of the K parameter
(number of neighbors) are crucial to optimize performance and minimize potential biases[17], [28].
Despite these challenges, the KNN algorithm remains a valuable tool in the arsenal of asthma
prediction models, offering a straightforward yet effective approach to identifying individuals at risk
and informing targeted intervention strategies. Through its reliance on proximity-based classification,
KNN contributes to advancing personalized medicine initiatives by facilitating early detection and
proactive management of asthma, ultimately enhancing patient outcomes and quality of life.
Implementation of KNN:
KNN (K-Nearest Neighbors) is one of the base models used for classification. The following steps
describe how KNN is used in the model-building process:
2. Model Training: Once the KNN classifier is initialized, it undergoes training using the scaled
training data (X_train_scale, y_train), alongside other base models.
3. Prediction: Once trained, the KNN classifier predicts the target variable for the scaled
test data (X_test_scale) using the predict method. Additionally, it calculates the probability of
each instance belonging to each class using the `predict_proba` method. This is needed for
computing the ROC AUC score.
4. Evaluation Metrics:
Accuracy: The accuracy of the KNN model's predictions on the test set is printed.
ROC AUC Score: The ROC AUC score is calculated. This metric evaluates the
capability of the classifier to distinguish between classes and is based on the true
positive rate and false positive rate.
Confusion Matrix: The program displays a confusion matrix showing the TP, TN,
FP, and FN counts.
Classification Report: A classification report with precision, recall, F1-score, and
support is produced.
5. Learning Curve Plot: A learning curve is used to show how well the KNN model performs in
order to determine whether overfitting or underfitting is present. The curve displays the
fluctuations in training and cross-validation scores as the size of the training dataset changes.
This approach assists in comprehending the model's tendency towards either overfitting or
underfitting.
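The KNN workflow outlined in these steps can be sketched as follows. It is a hedged illustration, not the code used in this work: the data are synthetic, the split sizes and random seeds are arbitrary, and the learning curve is printed as numbers rather than plotted. The names X_train_scale and X_test_scale follow the text, and n_neighbors=5 matches the value of k reported for this study.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import (
    accuracy_score,
    classification_report,
    confusion_matrix,
    roc_auc_score,
)
from sklearn.model_selection import learning_curve, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic binary data standing in for the cleaned dataset.
X, y = make_classification(n_samples=500, n_features=8, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Fit the scaler on the training split only, then apply it to both splits.
scaler = StandardScaler().fit(X_train)
X_train_scale = scaler.transform(X_train)
X_test_scale = scaler.transform(X_test)

knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train_scale, y_train)

y_pred = knn.predict(X_test_scale)
y_prob = knn.predict_proba(X_test_scale)[:, 1]  # probability of class 1

accuracy = accuracy_score(y_test, y_pred)
roc_auc = roc_auc_score(y_test, y_prob)
tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()

print(f"accuracy = {accuracy:.3f}, ROC AUC = {roc_auc:.3f}")
print(f"TP = {tp}, TN = {tn}, FP = {fp}, FN = {fn}")
print(classification_report(y_test, y_pred))

# Learning curve: mean training and cross-validation accuracy per training size.
train_sizes, train_scores, cv_scores = learning_curve(
    KNeighborsClassifier(n_neighbors=5),
    X_train_scale,
    y_train,
    cv=5,
    train_sizes=np.linspace(0.2, 1.0, 5),
    scoring="accuracy",
)
for size, train_mean, cv_mean in zip(
    train_sizes, train_scores.mean(axis=1), cv_scores.mean(axis=1)
):
    print(f"n = {size}: train = {train_mean:.3f}, cv = {cv_mean:.3f}")
```

A persistent gap between the training and cross-validation scores suggests overfitting, while two low, converged scores suggest underfitting.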
Being a non-parametric technique, the K-Nearest Neighbours (KNN) classifier does not rely on any
predetermined assumptions about the data distribution. KNN adopts a statistical technique wherein it
utilizes distance functions to categorize unknown samples based on their proximity to a set of known k
samples[4]. KNN works by identifying the closest k neighbors to the input values provided and their
respective classifications. It then determines the most common classification among these neighbors as
the final output for the given input. In our training process using the dataset, we have set the value of
the k parameter to 5. The K nearest neighbors (KNN) algorithm can be expressed mathematically as
follows:
1. We calculate the distances between each input point and each data point in the dataset using a
chosen distance metric, such as the Euclidean distance, in order to identify the K closest neighbours for
a given input point.
2. By sorting the distances in ascending order, we select the K data points with the smallest distances
as the nearest neighbors.
3. In classification tasks, the predicted class label for an input point is determined by selecting the
majority class label among its nearest neighbors. Conversely, in regression tasks, the predicted target
value for an input point is calculated by averaging the target values of its nearest neighbors.
Given:
- \( y_i \): class label or target value associated with the \( i \)th data point
- \( d(X, D_i) \): the distance between the input point \( X \) and the \( i \)th data point \( D_i \)
For classification:
1. Calculate the distances between the input point \( X \) and each data point \( D_i \) in the dataset:
\[ d(X, D_1),\ d(X, D_2),\ \dots,\ d(X, D_m) \]
2. Sort the distances in ascending order and select the K closest data points:
\[ K_{\mathrm{nearest}} = \{D_{i_1}, D_{i_2}, \dots, D_{i_K}\}, \quad d(X, D_{i_1}) \le d(X, D_{i_2}) \le \dots \le d(X, D_{i_K}) \]
3. Determine the most common class label among the K closest data points:
\[ y_{\mathrm{pred}} = \arg\max_{c} \sum_{j=1}^{K} [\, y_{i_j} = c \,] \]
For regression:
1. Calculate the distances between the input point \( X \) and each data point \( D_i \) in the dataset:
\[ d(X, D_1),\ d(X, D_2),\ \dots,\ d(X, D_m) \]
2. Arrange the distances in increasing order and choose the K nearest data points:
\[ K_{\mathrm{nearest}} = \{D_{i_1}, D_{i_2}, \dots, D_{i_K}\}, \quad d(X, D_{i_1}) \le d(X, D_{i_2}) \le \dots \le d(X, D_{i_K}) \]
3. Calculate the average target value among the K nearest data points:
\[ y_{\mathrm{pred}} = \frac{1}{K} \sum_{j=1}^{K} y_{i_j} \]
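The three steps above (compute distances, keep the K nearest points, then vote or average) can be written out directly. The sketch below uses only the Python standard library and Euclidean distance; the function names and the toy data are illustrative assumptions, not part of this work.

```python
import math
from collections import Counter


def k_nearest(X, data, k):
    """Return the indices of the k points in `data` closest to X."""
    # Step 1: distance from X to every data point D_i.
    distances = [(math.dist(X, D_i), i) for i, D_i in enumerate(data)]
    # Step 2: sort in ascending order and keep the k smallest.
    distances.sort()
    return [i for _, i in distances[:k]]


def knn_classify(X, data, labels, k):
    """Step 3 (classification): majority class label among the k neighbours."""
    votes = Counter(labels[i] for i in k_nearest(X, data, k))
    # Ties are resolved in favour of the label seen first among the neighbours.
    return votes.most_common(1)[0][0]


def knn_regress(X, data, targets, k):
    """Step 3 (regression): mean target value of the k neighbours."""
    return sum(targets[i] for i in k_nearest(X, data, k)) / k


# Toy data: three points near (1, 1) with label 0, two near (6, 6) with label 1.
data = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (7.0, 6.5)]
labels = [0, 0, 0, 1, 1]
targets = [1.0, 2.0, 3.0, 10.0, 12.0]

query = (1.2, 1.1)
print(k_nearest(query, data, 3))             # indices of the 3 nearest points
print(knn_classify(query, data, labels, 3))  # predicted class label
print(knn_regress(query, data, targets, 3))  # predicted target value
```

Choosing an odd K for binary classification avoids tied votes, which is one reason a value such as 5 is a common choice.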
Several widely-used algorithms such as ID3 and CART make use of Decision Trees, a non-parametric
supervised learning technique primarily employed for classification tasks. Its fundamental structure
consists of a root node and several leaf nodes. During classification, the algorithm begins at the root
and traverses to a leaf node. Decision trees find applications in decision analysis research, with
implementations showing promise in accurately predicting asthma disease. The core objective involves
constructing a model that predicts a target variable's value based on learned data features using
straightforward decision rules[2], [29].
Once the decision tree is built, it can be used to forecast asthma status for new patients by applying a
set of criteria from the root node to the leaf nodes [1], [24]. At each internal node, the decision tree
evaluates the patient's feature values and proceeds down the appropriate branch based on the decision
rules. Ultimately, the leaf node reached predicts the patient's asthma status: asthma-positive or asthma-
negative[29].
The prediction accuracy of decision trees can vary depending on considerations such as the dataset's
quality and completeness, the selection of features, and the decision tree's complexity. However,
decision trees offer several advantages in asthma prediction [14], [30]. They are easy to interpret, as the
decision rules are represented graphically and in a human-readable format. They can handle mixed
data types and missing values, making them applicable to real-world medical datasets. Decision trees
can also account for interactions between features, allowing for more accurate predictions.
The decision tree algorithm is formulated to create a set of rules by segmenting the data using input
features to classify the target variable. In our scenario, we're addressing a binary classification
problem, where the target variable can take on two values: 0 for asthma-negative and 1 for asthma-
positive. The decision tree model can be depicted as a sequence of if-else statements. Each internal
node of the tree signifies a split on a feature, while each leaf node signifies a prediction (0 or 1). Below
is a typical representation of the decision tree formula:
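if (feature1 <= threshold1) {
    if (feature2 <= threshold2) {
        prediction = 0
    } else {
        prediction = 1
    }
} else {
    prediction = 1
}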
In Condition 3.1, "feature1" and "feature2" represent features from the input data, and "threshold1"
and "threshold2" represent thresholds to split the data on those features. The if-else statements evaluate
the feature values and decide which branch (left or right) to take based on the thresholds. At each leaf
node, a prediction is made for the target variable (0 or 1). The actual formula may be more complex
and may involve multiple features and splits depending on the specific decision tree being used.
Additionally, decision trees can be utilized to deal with multi-class classification or regression problems,
which would require different mathematical formulations.
It's crucial to mention that the decision tree algorithm utilizes impurity measures, such as the Gini
Index or Information Gain, to identify the optimal feature and threshold for splitting the data at each
node. These measures help in finding the splits that maximize the separation between different classes
or minimize the impurity within each class. Overall, the mathematical formula for a decision tree
involves evaluating feature values and splitting the data based on rules defined by thresholds,
eventually leading to predictions at each leaf node.
The Gini Index of the dataset on which the decision tree model is trained is 0.301.
Feature      Information Gain
ALCDAY5      0.028650825128447473
SLEPTIM1     0.013936882614491629
X_AGE_G      0.006847519607620646
SMOKDAY2     0.006960442110794853
SEX          0.0004739918775396071
X_HISPANC    0.0025293500968855574
X_MRACE1     0.008850538074078308
MARITAL      0.014924287194574969
GENHLTH      0.014629738787207778
HLTHPLN1     6.570035720693993e-05
EDUCA        0.012295452134480305
INCOME2      0.02102649276657655
X_BMI5CAT    0.011768902179923593
EXERANY2     0.007876172653369179
ALCGRP       0.002625711646314724
DRKWEEKLY    0.0011139398458448837
ASTHMA4      0.8340980145040057
AGE2         0.0013888986102831005
AGE3         0.00034468189068495765
AGE4         0.001742967422482746
AGE5         0.004107403649051549
AGE6         0.0037420868481351094
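To make the impurity measures above concrete, the following minimal sketch shows how a Gini Index and an Information Gain value of this kind could be computed. It is an illustration only, not the code used in this work: the function names and the toy labels are assumptions, and the feature is treated as categorical (one group per distinct value).

```python
import math
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k^2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus the weighted entropy after
    grouping the samples by the value of one categorical feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy data: 1 = asthma-positive, 0 = asthma-negative.
toy_labels = [0, 0, 1, 1]
perfect_feature = [0, 0, 1, 1]   # separates the classes completely
useless_feature = [0, 1, 0, 1]   # carries no information about the class

print(gini(toy_labels))
print(information_gain(perfect_feature, toy_labels))
print(information_gain(useless_feature, toy_labels))
```

A feature that separates the classes perfectly recovers the full entropy of the labels, while an uninformative feature yields a gain of zero, which is the sense in which higher values in the table indicate more useful splits.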
3.4.3. XGBoost:
Extreme Gradient Boosting (XGBoost) is a machine learning algorithm that combines the principles of
gradient boosting and decision trees [8]. It is extensively used for various classification and regression
tasks, including detecting medical conditions like asthma in patients.
The XGBoost algorithm constructs a sequence of weak decision tree models, each designed to rectify
errors made by previous trees and enhance the overall predictive accuracy. It places special emphasis
on problematic examples that were difficult to classify accurately.
XGBoost, an ensemble tree method, employs a gradient descent framework to enhance the
performance of weak learners. To mitigate overfitting, XGBoost incorporates a regularization term into
its loss function, thereby smoothing out learned weights. The model's output, denoted as y-hat, is
computed as the sum of the outputs of the individual trees, each scaled by the learning rate. XGBoost's
objective function combines a loss function, which quantifies the difference between predicted and
actual values, with a regularization function that manages model complexity and prevents overfitting.
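1. Objective Function:
$\text{Objective Function} = \text{Loss Function} + \text{Regularization Term}$
2. Loss Function (binary log loss, where $\hat{p}_i$ is the predicted probability for sample $i$):
$\text{Loss Function} = -\sum_{i} \left[ y_i \log(\hat{p}_i) + (1 - y_i) \log(1 - \hat{p}_i) \right]$
3. Regularization Term (where $w$ denotes the leaf weights):
$\text{Regularization Term} = \alpha \lVert w \rVert_1 + \tfrac{1}{2} \lambda \lVert w \rVert_2^2$
4. Prediction Function:
$\text{Prediction} = \sum_{m} F_m(X)$
5. Gradient:
$\text{Gradient} = -\dfrac{\partial L(y, F)}{\partial F}$
6. Ensemble Prediction:
$\text{Final Prediction} = \sum_{m} \eta \, F_m(X)$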
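The quantities listed above can be illustrated with a short, simplified sketch. It is not the XGBoost library itself (which additionally uses second-order gradient information); the function names are hypothetical, and the raw score F(x) is assumed to be mapped to a probability with the sigmoid function.

```python
import math


def sigmoid(z):
    """Map a raw score F(x) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(y_true, raw_scores):
    """Binary log loss summed over samples (note the leading minus sign)."""
    total = 0.0
    for y, f in zip(y_true, raw_scores):
        p = sigmoid(f)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total


def regularization(leaf_weights, alpha, lam):
    """alpha * L1 norm + 0.5 * lambda * squared L2 norm of the leaf weights."""
    l1 = sum(abs(w) for w in leaf_weights)
    l2_squared = sum(w * w for w in leaf_weights)
    return alpha * l1 + 0.5 * lam * l2_squared


def negative_gradient(y_true, raw_scores):
    """-dL/dF for the log loss, which simplifies to y - p per sample.
    Each new tree is fitted to these residual-like values."""
    return [y - sigmoid(f) for y, f in zip(y_true, raw_scores)]


def ensemble_score(tree_outputs, eta):
    """Final raw score for one sample: sum over trees of eta * F_m(x)."""
    return sum(eta * f for f in tree_outputs)


# Hypothetical example: two samples, both currently scored at F(x) = 0.
y = [1, 0]
scores = [0.0, 0.0]
objective = log_loss(y, scores) + regularization([1.0, -2.0], alpha=0.5, lam=2.0)
print(objective)
print(negative_gradient(y, scores))
print(ensemble_score([1.0, 0.5, -0.25], eta=0.1))
```

The negative gradient is positive for an under-predicted positive sample and negative for an over-predicted negative sample, which is how each successive tree is steered toward the examples the current ensemble gets wrong.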
3.4.4. SVC:
The SVC (Support Vector Classifier) technique is a supervised machine learning technique utilized for
data classification. It belongs to the Support Vector Machine (SVM) family and is well-known for its
ability to find the best hyperplane in a high-dimensional feature space to distinguish between classes
[15], [22], [32]. The primary goal of SVC is to discover the decision boundary that maximises the
separation of classes. This is achieved by maximising the margin, that is, the distance between the
boundary and the support vectors, which are the data points closest to the decision boundary [32]. To manage both linearly and
non-linearly separable data, SVC employs kernels such as linear, polynomial, or radial basis functions
to map the data into higher-dimensional spaces. SVC can be used to predict asthma in patients by
training the algorithm on a dataset that includes features related to asthma and corresponding labels
indicating whether a patient has asthma or not [33].
After the completion of training, the model's efficacy is assessed by employing the testing set.
Utilizing the patients' characteristics, the model predicts whether they have asthma. The predicted
labels are subsequently compared to the actual labels to compute various performance metrics, such as
accuracy, precision, recall, and F1 score [24], [34].
The performance metrics enable us to gauge the accuracy of the SVC model in predicting asthma
among patients. If the model performs well, it can be applied to make predictions on new patient data
that it hasn't seen before. However, it is crucial to consider the quality and relevance of the data used to
train the model, as it significantly influences the accuracy and effectiveness of predictions. To improve
the model's performance, feature selection and data preprocessing techniques may be required [4].
The mathematical formulation of the SVM algorithm, particularly for the SVC, revolves around
optimizing a hyperplane capable of categorizing data points into their respective classes. With a
labeled training dataset containing input features (𝑋) and corresponding labels (𝑦), SVC aims to
discover the hyperplane characterized by a vector (𝑤) and bias (𝑏) that maximizes the margin between
the classes. The mathematical formula for the SVC's decision function is:
$f(x) = \operatorname{sign}(w^{T} x + b)$
where:
f(x) is the decision function used to predict the class of a given input data point x.
w is the weight vector perpendicular to the hyperplane; its direction determines the orientation of
the decision boundary.
x is the input feature vector.
b is the bias term that adjusts the position of the decision boundary.
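As a minimal illustration of the decision function above, the sketch below evaluates sign(w^T x + b) for a linear SVC. The weight vector, bias, and inputs are hypothetical values chosen for the example, not parameters learned in this work, and a score of exactly zero is assigned to the positive class by assumption.

```python
import math


def decision_value(w, x, b):
    """Raw score w^T x + b for one input vector x."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def predict(w, x, b):
    """Class label from the sign of the score (+1 or -1).
    Assumption: a score of exactly 0 is assigned to +1."""
    return 1 if decision_value(w, x, b) >= 0 else -1


def distance_to_hyperplane(w, x, b):
    """Geometric distance of x from the boundary: |w^T x + b| / ||w||."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(decision_value(w, x, b)) / norm


# Hypothetical parameters for a two-feature problem.
w = [2.0, -1.0]
b = -1.0

print(predict(w, [2.0, 1.0], b))   # score = 4 - 1 - 1 = 2
print(predict(w, [0.0, 1.0], b))   # score = 0 - 1 - 1 = -2
print(distance_to_hyperplane(w, [2.0, 1.0], b))
```

In the asthma setting, the two signs would correspond to the asthma-positive and asthma-negative labels; points with a small distance to the hyperplane are the ones closest to being support vectors.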
A hybrid machine learning strategy integrates multiple machine learning algorithms or approaches to
address a particular problem. Instead of relying on a single algorithm, a hybrid approach leverages the
strengths of different algorithms to improve overall performance, accuracy, or efficiency.
There are various ways to create a hybrid approach in machine learning. Common techniques include
ensemble methods, feature combination, and model stacking [8], [22], [35]. Computer
vision, natural language processing, recommendation systems, and anomaly detection are all examples
of fields where hybrid machine learning approaches may be applied. By combining different
algorithms or techniques, hybrid approaches can often achieve better performance, handle complex
problems, and provide more robust solutions. However, designing and implementing a hybrid
approach requires careful consideration of the specific problem, data, algorithms, and their
interactions [22].
The hybrid approach used in this work combines stacking and voting to aggregate the predictions of
different machine learning algorithms, including KNN, Decision Tree, XGBoost, and SVC.
Stacking is an ensemble technique in which multiple base models are trained on the same dataset. The
predictions made by these models are then used as input features for a final meta-model. In this case,
the base models are the KNN, Decision Tree, XGBoost, and SVC classifiers. Each of these models
learns from the input data and generates its own set of predictions.
After training the base models, their predictions are merged using a meta-model. In our scenario, the
meta-model can be a Voting Classifier. A Voting Classifier is an ensemble model that aggregates the
predictions of multiple models through voting to determine the final prediction. Various voting
methods can be employed, including majority (hard) voting, where the class with the highest number of
votes is chosen, and weighted voting, where each model's prediction is weighted based on its
performance or confidence level. The Voting Classifier integrates the predictions from the KNN,
Decision Tree, XGBoost, and SVC classifiers to generate a final prediction.
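The combination step described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the implementation used in this work: the function names, the example predictions, and the weights are hypothetical, and ties are broken in favour of the smaller class label.

```python
from collections import Counter


def stack_features(per_model_predictions):
    """Stacking input: turn one prediction list per base model into
    one meta-feature row per sample (a simple transpose)."""
    return [list(row) for row in zip(*per_model_predictions)]


def hard_vote(predictions):
    """Majority voting over the base-model labels for one sample.
    Assumption: ties go to the smallest class label."""
    counts = Counter(predictions)
    top = max(counts.values())
    return min(label for label, c in counts.items() if c == top)


def weighted_vote(predictions, weights):
    """Weighted voting: each model's vote counts with its weight."""
    scores = {}
    for label, weight in zip(predictions, weights):
        scores[label] = scores.get(label, 0.0) + weight
    top = max(scores.values())
    return min(label for label, s in scores.items() if s == top)


# Hypothetical predictions for three patients from four base models
# (in the order KNN, Decision Tree, XGBoost, SVC).
knn = [1, 0, 0]
tree = [0, 0, 1]
xgb = [1, 0, 1]
svc = [1, 1, 0]

meta_rows = stack_features([knn, tree, xgb, svc])
print(meta_rows)
print([hard_vote(row) for row in meta_rows])
# Hypothetical weights, e.g. proportional to validation accuracy.
print([weighted_vote(row, [0.2, 0.1, 0.4, 0.3]) for row in meta_rows])
```

Each row of `meta_rows` is what a meta-model would receive as input for one patient. Hard voting treats all base models equally, whereas the weights let a stronger model outvote two weaker ones.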
By combining the predictions of different classifiers, the stacking and voting approach leverages the
strengths of each model. For example, KNN is a lazy learner that makes predictions based on the
closest neighbors in the training data. Decision Tree is a non-parametric model that can capture
complex relationships between features. XGBoost is a gradient boosting algorithm that creates a strong
predictive model by combining weak models. SVC is a supervised learning method that separates
classes using hyperplanes in a high-dimensional space.
As illustrated in Fig. 10, the stacking and voting hybrid approach can improve overall prediction
accuracy and cope with diverse or complex datasets and different types of problems. However, it
is crucial to carefully select and tune the base models, meta-model, and voting strategy to ensure
compatibility and optimal performance. It is also important to handle potential issues such as model
complexity, overfitting, and class imbalances in the dataset. Overall, the stacking and voting approach
leverages the strengths of different machine learning algorithms to obtain robust
predictions for the problem at hand.
Chapter-4
IV. Results
We assessed the performance of the models using test data to ascertain their accuracies. This included
showing the training and testing accuracies for all models. We also assessed each model's AUC Score,
Specificity, and Sensitivity, which were obtained using the confusion matrix.
The experimental outcomes demonstrate the efficacy of different machine learning algorithms in
predicting asthma prevalence using the provided dataset. Four main algorithms were evaluated: KNN,
SVC, Decision Tree, and XGBoost. Additionally, ensemble methods such as Voting and Stacking were
utilized to harness the combined strengths of these base models.
Across all models, the accuracy scores were notably high, with SVC, XGBoost, and Voting achieving
accuracies above 98%. This indicates a robust predictive capability of the models in identifying asthma
cases. Moreover, the area under the ROC curve values ranged from approximately 0.86 to 0.95, further
affirming the models' effectiveness in distinguishing between asthma and non-asthma instances. In
terms of individual model performances, SVC and XGBoost emerged as top performers, consistently
exhibiting high accuracy and ROC AUC scores [36]. These models demonstrated precision and recall
rates exceeding 0.90 for both asthma and non-asthma classes, indicative of their ability to correctly
classify instances across various metrics. The Decision Tree model, while slightly lower in accuracy
compared to SVC and XGBoost, still provided a respectable performance, achieving an accuracy of
around 96% and an ROC AUC score of approximately 0.90. However, it displayed relatively lower
precision and recall rates for the minority class (asthma instances) compared to the majority class.
Ensemble methods, namely Voting and Stacking, showcased competitive performance, with accuracy
and ROC AUC scores similar to or slightly lower than those of individual base models. This suggests that
while ensemble methods can offer enhanced predictive capabilities through model aggregation, they
may not always outperform individual strong learners.
The confusion matrices for the algorithms are given in Table 4:
The following evaluation metrics are computed, and the resulting scores for each model and
algorithm are given in Table 5:
1. Accuracy: Accuracy is the fraction of correctly predicted instances among all instances in a
dataset.
Formula: Accuracy = (TP + TN) / (TP + TN + FP + FN)
Model      Accuracy
KNN        0.979944
SVC        0.982903
XGBoost    0.98241
Voting     0.982081
Stacking   0.960217
2. Precision: Precision measures the ratio of true positive (TP) predictions to all positive predictions
made by the model.
Formula: Precision = TP / (TP + FP)
[Figure: bar chart of precision by model. KNN 97.96%, SVC 98.27%, Decision Tree 96.20%, XGBoost 98.22%, Voting 98.18%, Stacking 96.33%]
3. Recall (Sensitivity): Recall measures the ratio of true positive (TP) predictions to all actual
positive cases in the dataset.
Formula: Recall = TP / (TP + FN)
[Figure: bar chart of recall by model. KNN 97.99%, SVC 98.29%, XGBoost 98.24%, Voting 98.21%, Stacking 96.28%]
4. F1 Score: The F1 score is the harmonic mean of precision and recall, offering a balanced assessment.
Formula: F1-score = 2 * (Precision * Recall) / (Precision + Recall)
[Figure: bar chart of F1-score by model (KNN, SVC, Decision Tree, XGBoost, Voting, Stacking); recovered data labels: 98.28%, 98.23%, 98.19%, 97.96%]
5. ROC AUC: The ROC curve plots the true positive rate against the false positive rate at different
classification thresholds, while the AUC quantifies the area under this curve.
Formula: AUC is computed by integrating the area under the ROC curve.
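The five metrics above can be computed directly from confusion-matrix counts and predicted scores. The sketch below is illustrative only: the function names and the example counts are assumptions, not results from this study, and the AUC is obtained through its rank-based equivalent (the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case) rather than by explicit integration.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, specificity and F1 from the four
    confusion-matrix counts. Undefined ratios are reported as 0.0."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }


def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of positive and
    negative scores; tied pairs count as one half."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical counts for an imbalanced test set (not results of this study).
print(classification_metrics(tp=8, tn=80, fp=2, fn=10))
# Hypothetical predicted probabilities for four patients.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

With imbalanced classes, as in the hypothetical counts above, accuracy can remain high while recall for the minority class is low, which is why the per-class metrics and AUC are reported alongside accuracy.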
Fig. 4.5. Learning Curve of DT
We trained and tested KNN, SVC, DT, XGBoost, and ensemble models (stacking and voting) on a
dataset containing patient information. Metrics such as accuracy, ROC AUC, precision, recall, F1-score,
mean squared error (MSE), and the confusion matrix were computed for every model and are
presented in the tables and figures. The
learning curves for the ensemble models demonstrated that both stacking and voting approaches
improved performance as the number of training samples increased. The voting ensemble model
showed a higher training score and a smaller gap between training and cross-validation scores than
the stacking ensemble, indicating better generalization.
Fig. 4.7. Learning Curve of SVC
Fig. 4.8. Learning Curve of Stacking
The ROC AUC curves indicated that the voting ensemble model surpassed both the stacking
ensemble and individual base models in terms of ROC AUC. Specifically, the voting
ensemble attained the highest ROC AUC score, followed by the stacking ensemble,
XGBoost, SVC, Decision Tree, and KNN.
4.3. Discussion:
Selecting the most suitable classification techniques for a health dataset necessitates
preprocessing and a deep understanding of the data. Numerous classifiers exist for analyzing
healthcare data, with the choice depending on the intended analysis. Cleaning large amounts
of healthcare data may take a long time and be expensive. The models employed in this study
are applicable to individual healthcare clinics or can be scaled up for broader use. There is a
scarcity of prior research utilizing machine learning to examine paediatric health data
concerning asthma development. Timely identification of asthma in children is imperative for
early implementation of interventions for this chronic respiratory condition.
Several research studies have explored the utilization of machine learning (ML) in asthma
identification. These studies assessed various techniques using metrics like accuracy, recall,
and AUCROC. The main aim is to identify the ML method that offers the most accurate
predictions and effectively detects genuine asthma cases (recall). AUCROC summarises
the model's performance across different classification thresholds.
Ref.  Model           Accuracy       Recall         AUCROC
      RF              Not available  0.84           0.88
      KNN             Not available  0.84           0.88
      ANN             Not available  0.87           0.90
14    CNN             0.98           Not available  Not available
17    SVM             0.94           Not available  Not available
      ANN             0.92           Not available  Not available
      RF              0.92           Not available  Not available
33    Neural Network  0.68           Not available  Not available
      XGB             0.728          Not available  Not available
      LGBM            0.721          Not available  Not available
      KNN             0.70           Not available  Not available
36    KNN             0.66           0.62           0.93
      DT              0.86           0.86           0.91
      XGB             0.93           0.92           0.98
      SVC             1              1              1
39    SVM             0.88           Not available  Not available
      GBM             0.90           Not available  Not available
      XGB             0.86           Not available  Not available
In Table 7, the models are evaluated using the evaluation metrics listed
below:
The table outlines the performance metrics of the various machine learning algorithms and
ensemble techniques on the dataset:
Overall, the findings indicate that SVC and XGB algorithms demonstrate notable accuracy
and AUCROC performance, while KNN excels in terms of F1-score and sensitivity.
Ensemble techniques like Voting and Stacking also showcase competitive performance, albeit
marginally lower than individual algorithms. These results furnish valuable insights for
selecting the most appropriate algorithm or ensemble approach tailored to the specific
classification task.
Chapter-5
V. Future Direction
Using machine learning techniques shows promise for predicting asthma exacerbations with
high accuracy, but there's still a need to explore more methods in the future. While several
models have been developed, only a few have been put into practical use. Hence, it's crucial
to enhance the adaptability of prediction models across various large datasets. Practicality is
key here. Simplifying models with just a handful of predictors, using easily accessible data,
could make them more practical. Additionally, integrating machine learning algorithms into
user-friendly software or systems would facilitate their transition from research to practical
applications. Furthermore, randomised controlled trials are needed to determine whether these
models truly assist asthma patients by preventing exacerbations. There's potential in creating
models capable of predicting asthma exacerbations during a child's clinical visit, offering
real-time insights for clinicians. It would be highly beneficial if such models could provide
predictions for children under the age of two, enabling early intervention with appropriate
medications. Moreover, exploring various classifiers like artificial neural networks,
multilayer perceptrons, and linear regression could provide valuable insights into prediction
probabilities.
The next generation of healthcare providers must acquire a solid understanding of the
fundamentals, potentials, and terminology related to machine learning (ML) due to its rapid
advancements and growing use in healthcare. Understanding ML algorithms and related
terms will empower them to comprehend and analyze relevant literature, as well as engage in
research utilizing ML techniques[6], [11]. It's essential to educate professionals across
various healthcare domains, including public health, epidemiology, clinical practice,
pathology, and radiology, about ML terms. Given the interconnectedness of data science and
epidemiology, it's equally crucial to train public health professionals who possess a strong
grasp of epidemiological concepts. Incorporating certain ML and data science concepts into
the medical curriculum in the long run is recommended to ensure that future healthcare
professionals are well-equipped to leverage the potential of ML in improving healthcare
outcomes[5].
5.2. Conclusion:
The early identification of asthma patients at risk of experiencing exacerbations is crucial for
providing timely intervention and close monitoring. This study highlights the effectiveness
of machine learning in predicting asthma exacerbations. However, it is important for future
research to focus on improving the applicability and generalizability of these models, making
them more suitable for integration into clinical practice. Asthma is a complex condition with
numerous risk factors, making it particularly challenging to diagnose in children under the
age of six. Machine learning presents a promising approach to developing predictive models
for childhood asthma by utilizing large datasets and outperforming traditional regression
methods. The reviewed studies differed in their definitions of asthma, target
populations, predictors considered, age of prediction, feature selection methods, and machine
learning algorithms employed, thus introducing a potential risk of bias. Although these studies
achieved high precision, there were indications of overfitting due to small sample sizes.
Additionally, none of the studies externally validated their models, further undermining their
reliability. These limitations highlight the necessity for future research to focus on enhancing
the accuracy and applicability of machine learning models for predicting childhood asthma.
By addressing these challenges, machine learning has the potential to significantly improve
the early detection and management of asthma in children. Compared to alternative
classifiers, the suggested model demonstrates significant enhancements in recall, precision,
and accuracy metrics, especially for the poorly-controlled class. Moreover, the findings
suggest that ensemble learning and similar machine learning approaches hold significant
promise for integrating prognostic systems in identifying asthma control levels, especially
when supplemented by medical expertise.
References:
[2] S. Ambekar and R. Phalnikar, “Disease risk prediction by using convolutional neural
network,” in 2018 Fourth International Conference on Computing Communication
Control and Automation (ICCUBEA), IEEE, Aug. 2018, pp. 1–5.
[4] A. Alanazi, “Using machine learning for healthcare challenges and opportunities,”
Informatics in Medicine Unlocked, vol. 30. Elsevier Ltd, Jan. 01, 2022. doi:
10.1016/j.imu.2022.100924.
[5] J. Finkelstein and I. cheol Jeong, “Machine learning approaches to personalize early
prediction of asthma exacerbations,” Ann N Y Acad Sci, vol. 1387, no. 1, pp. 153–165,
Jan. 2017, doi: 10.1111/nyas.13218.
[6] A. Yahyaoui and N. Yumusak, “Deep and Machine Learning towards Pneumonia and
Asthma Detection,” in 2021 International Conference on Innovation and Intelligence
for Informatics, Computing, and Technologies, 3ICT 2021, Institute of Electrical and
Electronics Engineers Inc., Sep. 2021, pp. 494–497. doi:
10.1109/3ICT53449.2021.9581963.
[8] M. S. Kim, J. H. Lee, Y. J. Jang, C. H. Lee, J. H. Choi, and T. E. Sung, “Hybrid deep
learning algorithm with open innovation perspective: A prediction model of asthmatic
occurrence,” Sustainability (Switzerland), vol. 12, no. 15, Aug. 2020, doi:
10.3390/su12156143.
[9] L. S. Becirovic, A. Deumic, L. G. Pokvic, and A. Badnjevic, “Aritificial Inteligence
Challenges in COPD management: A review,” in BIBE 2021 - 21st IEEE International
Conference on BioInformatics and BioEngineering, Proceedings, Institute of Electrical
and Electronics Engineers Inc., 2021. doi: 10.1109/BIBE52308.2021.9635374.
[11] W. Akbar, W. P. Wu, M. Faheem, M. A. Saleem, N. A. Golilarz, and A. U. Haq,
“Machine learning classifiers for asthma disease prediction: a practical illustration,”
in 2019 16th International Computer Conference on Wavelet Active Media
Technology and Information Processing, IEEE, Dec. 2019, pp. 143–148.
[12] J. L. Harvey and S. A. Kumar, “Machine learning for predicting development of
asthma in children,” in 2019 IEEE Symposium Series on Computational Intelligence
(SSCI), IEEE, Dec. 2019, pp. 596–603.
[17] P. D. Terry, R. E. Heidel, and R. Dhand, “Asthma in adult patients with covid-19
prevalence and risk of severe disease,” American Journal of Respiratory and Critical
Care Medicine, vol. 203, no. 7. American Thoracic Society, pp. 893–905, Apr. 01,
2021. doi: 10.1164/rccm.202008-3266OC.
[18] A. L. Yadav, K. Soni, and S. Khare, “Heart Diseases Prediction using Machine
Learning,” in 2023 14th International Conference on Computing Communication and
Networking Technologies, ICCCNT 2023, Institute of Electrical and Electronics
Engineers Inc., 2023. doi: 10.1109/ICCCNT56998.2023.10306469.
[19] S. Xiong, W. Chen, X. Jia, Y. Jia, and C. Liu, “Machine learning for prediction of
asthma exacerbations among asthmatic patients: a systematic review and meta-
analysis,” BMC Pulm Med, vol. 23, no. 1, Dec. 2023, doi: 10.1186/s12890-023-02570-
w.
[23] A. M. Pescatore et al., “A simple asthma prediction tool for preschool children with
wheeze or cough,” Journal of Allergy and Clinical Immunology, vol. 133, no. 1, 2014,
doi: 10.1016/j.jaci.2013.06.002.
[24] K. C. Tsang, H. Pinnock, A. M. Wilson, and S. A. Shah, “Application of machine
learning to support self-management of asthma with mHealth,” in 2020 42nd Annual
International Conference of the IEEE Engineering in Medicine & Biology Society
(EMBC), IEEE, Jul. 2020, pp. 5673–5677.
[25] M. S. Kim, J. H. Lee, Y. J. Jang, C. H. Lee, J. H. Choi, and T. E. Sung, “Hybrid deep
learning algorithm with open innovation perspective: A prediction model of asthmatic
occurrence,” Sustainability (Switzerland), vol. 12, no. 15, Aug. 2020, doi:
10.3390/su12156143.
[26] J. Finkelstein and I. cheol Jeong, “Machine learning approaches to personalize early
prediction of asthma exacerbations,” Ann N Y Acad Sci, vol. 1387, no. 1, pp. 153–165,
Jan. 2017, doi: 10.1111/nyas.13218.
[31] M. A. Awal et al., “An Early Detection of Asthma Using BOMLA Detector,” IEEE
Access, vol. 9, pp. 58403–58420, 2021, doi: 10.1109/ACCESS.2021.3073086.
[32] M. Payal, T. Ananth Kumar, and S. A. Ajagbe, “Support Vector Machines (SVMS)
Based Advanced Healthcare System Using Machine Learning Techniques,” International
Journal of Innovative Research in Computer and Communication Engineering, 2022,
doi: 10.15680/IJIRCCE.2022.1005020.
[33] K. Taunk, S. De, S. Verma, and A. Swetapadma, “A brief review of nearest neighbor
algorithm for learning and classification,” in 2019 International Conference on
Intelligent Computing and Control Systems (ICCS), IEEE, May 2019, pp. 1255–1260.
[34] M. Pyingkodi et al., “Asthma Disease Risk Prediction Using Machine Learning
Techniques,” in 2023 International Conference on Computer Communication and
Informatics, ICCCI 2023, Institute of Electrical and Electronics Engineers Inc., 2023.
doi: 10.1109/ICCCI56745.2023.10128635.
[35] S. Bharati, P. Podder, and M. R. H. Mondal, “Hybrid deep learning for detecting lung
diseases from X-ray images,” Inform Med Unlocked, vol. 20, Jan. 2020, doi:
10.1016/j.imu.2020.100391.
[36] R. Khasha, M. M. Sepehri, and S. A. Mahdaviani, “An ensemble learning method for
asthma control level detection with leveraging medical knowledge-based classifier and
supervised learning,” J Med Syst, vol. 43, no. 6, Jun. 2019, doi: 10.1007/s10916-019-
1259-8.
Publication Details
Plagiarism report
ORIGINALITY REPORT
Similarity index: 9%
Internet sources: 6%
Publications: 5%
Student papers: 3%
PRIMARY SOURCES
1. Submitted to Wollega University (Student Paper): 1%
2. www.coursehero.com (Internet Source): 1%
3. Submitted to University of Westminster (Student Paper): <1%
4. Justin T McDaniel, D L Albright, K Laha-Walsh, H Henson, S McIntosh. "Alcohol screening
and brief intervention among military service members and veterans: rural–urban
disparities", BMJ Military Health, 2020 (Publication): <1%
5. www.geeksforgeeks.org (Internet Source): <1%
6. www.v7labs.com (Internet Source): <1%
7. (<1%) Sudi Murindanyi, Margaret Nagwovuma, Barbara Nansamba, Ggaliwango Marvin.
"Explainable Ensemble Learning and Trustworthy Open AI for Customer