Final Project: Predicting the Stages of Chronic Kidney Disease
Using a Machine Learning Workflow
HCIN-620-01-SP21 – Machine Learning
Mojdeh Amini
University of San Diego
Dr. Reza Afra
May 17, 2021
Outline and Outcomes
Predicting the stages of Chronic Kidney Disease (CKD)
Glomerular Filtration Rate (GFR): Age, Weight (kg), Gender, Race
$$\mathrm{GFR}\ (\mathrm{mL/min/1.73\ m^2}) = 175 \times (\mathrm{Scr})^{-1.154} \times (\mathrm{Age})^{-0.203} \times (0.742 \text{ if female}) \times (1.212 \text{ if African American})$$
Normal serum creatinine: 0.7 - 1.2 milligrams per deciliter (mg/dL)
Consider: a serum creatinine level greater than 1.2 mg/dL for women or greater than 1.4 mg/dL for men may be an early sign that the kidneys are not working properly
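The GFR formula above can be sketched as a small Python function. This is a minimal illustration, not the code used in the project: the function name `egfr_mdrd` and its parameter names are assumptions, and serum creatinine is assumed to be in mg/dL and age in years.

```python
def egfr_mdrd(scr_mg_dl, age_years, female=False, african_american=False):
    """Estimate GFR in mL/min/1.73 m^2 from the formula quoted on the slide.

    scr_mg_dl: serum creatinine in mg/dL (assumed unit)
    age_years: age in years
    """
    if scr_mg_dl <= 0 or age_years <= 0:
        raise ValueError("serum creatinine and age must be positive")
    gfr = 175.0 * scr_mg_dl ** -1.154 * age_years ** -0.203
    if female:
        gfr *= 0.742  # sex adjustment from the formula
    if african_american:
        gfr *= 1.212  # race adjustment from the formula
    return gfr


# Illustrative call with made-up inputs: a 50-year-old with creatinine 1.0 mg/dL.
example = egfr_mdrd(1.0, 50)
```

Because both exponents are negative, the estimate falls as creatinine or age rises, which is the relationship the later scatterplot of GFR against serum creatinine is meant to show.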
Stages of CKD:
1- Normal? eGFR >=90
2- Mild eGFR 60 - 89
3- Moderate eGFR 30 - 59
4- Severe eGFR 15 - 29
5- Failure eGFR <15
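The stage thresholds listed above can be turned into a lookup function. This is a sketch under one assumption: the slide gives integer bands (for example 60 - 89), so non-integer values such as 89.5 are assigned here to the lower band. The name `ckd_stage` is hypothetical.

```python
def ckd_stage(egfr):
    """Map an eGFR value (mL/min/1.73 m^2) to a CKD stage from 1 to 5."""
    if egfr < 0:
        raise ValueError("eGFR cannot be negative")
    if egfr >= 90:
        return 1  # normal
    if egfr >= 60:
        return 2  # mild
    if egfr >= 30:
        return 3  # moderate
    if egfr >= 15:
        return 4  # severe
    return 5      # failure


# Made-up eGFR values, one per band, to show the mapping.
stages = [ckd_stage(value) for value in (95, 72, 45, 20, 9)]
```

Applying a function like this to every row is one way to build the CKD stages column used as the prediction target.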
Machine learning Workflow:
1: Environment: Importing Libraries
2: Data Cleaning: Uploading and Reading Data
3: Exploratory Data Analysis: Handling Missing Data
4: Build & Evaluate the Models: Identifying Outliers
Data source: http://archive.ics.uci.edu/ml//datasets/Chronic_Kidney_Disease
Step 1: Environment
Importing Libraries
Step 2: Data Cleaning
Uploading & Reading Data
Step 3: Exploratory Data Analysis (EDA)
Preprocessing and adding CKD Stages Column
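One preprocessing task named in the workflow is handling missing data. The sketch below shows median imputation using only the standard library. The inline sample values are made up, and the column names `age` and `sc` and the `?` missing-value marker are assumptions about how the file is laid out, not details taken from the slides.

```python
import csv
import io
from statistics import median

# Made-up sample rows; "?" marks a missing value (assumed convention).
RAW = """age,sc
48,1.2
7,0.8
62,?
?,3.8
51,1.4
"""


def read_rows(text, missing="?"):
    """Parse CSV text into dicts of floats, using None for missing cells."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        rows.append({
            key: (None if value.strip() == missing else float(value))
            for key, value in record.items()
        })
    return rows


def impute_median(rows, column):
    """Fill missing cells of one column with the median of observed values."""
    observed = [row[column] for row in rows if row[column] is not None]
    fill = median(observed)
    for row in rows:
        if row[column] is None:
            row[column] = fill
    return fill


rows = read_rows(RAW)
sc_fill = impute_median(rows, "sc")
age_fill = impute_median(rows, "age")
```

The median is used here rather than the mean because serum creatinine tends to be right-skewed; that choice is an illustration, not necessarily what the project did.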
Step 3: EDA –Cont.
The Distribution of Serum Creatinine Level
Histogram: Serum Creatinine
Scatterplot: GFR & Serum Creatinine
Step 3: EDA –Cont.
Isolate Features from Target
Step 3: EDA –Cont.: The Preprocessing
Transforming: Encode Variables
Encode: Convert categorical data into numbers before fitting and evaluating the models.
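A minimal sketch of label encoding follows. It assigns integer codes to the sorted unique categories, which mirrors the ordering scikit-learn's `LabelEncoder` uses, but it is written in plain Python. The helper names and the sample values are hypothetical.

```python
def fit_label_encoder(values):
    """Build a category -> integer mapping from the sorted unique values."""
    classes = sorted(set(values))
    return {label: code for code, label in enumerate(classes)}


def encode(values, mapping):
    """Replace each category with its integer code."""
    return [mapping[value] for value in values]


# Made-up categorical column with two levels.
column = ["normal", "abnormal", "normal", "normal"]
mapping = fit_label_encoder(column)
encoded = encode(column, mapping)
```

The mapping should be fitted on the training data and reused for the test data so that the same category always receives the same code.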
Step 3: EDA –Cont.: Heatmap & Feature Correlation
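A correlation heatmap is a colored view of a correlation matrix. The sketch below computes Pearson correlations in plain Python to show what the heatmap cells contain. The feature names and numbers are made up for illustration, and Pearson correlation itself is an assumption, since the slide does not say which coefficient was plotted.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Return a nested dict: matrix[a][b] is the correlation of a with b."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Made-up values: GFR drops as serum creatinine rises.
data = {
    "serum_creatinine": [0.8, 1.0, 1.5, 2.0, 4.0],
    "gfr": [105.0, 80.0, 50.0, 35.0, 15.0],
}
corr = correlation_matrix(data)
```

Cells close to -1 or +1 mark feature pairs that carry overlapping information, which is what the heatmap is used to spot.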
Step 3: EDA –Cont.
Splitting the Data into Training and Testing Sets
Splitting the splits
The train-test split: A technique for evaluating model
performance:
• Minimizes the effects of data discrepancies and
gives a better understanding of the characteristics
of the model.
• A split with more data in the training set will
most likely give better accuracy.
• Split size: Keep enough data in the training set
for an effective mapping of inputs to outputs.
https://www.kdnuggets.com/2020/05/dataset-splitting-best-practices-python.html
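The split described above can be sketched with a seeded shuffle. This is a dependency-free illustration; scikit-learn's `train_test_split` offers the same idea with more options. The function name, the 80/20 ratio, and the seed are assumptions, not values taken from the slides.

```python
import random


def train_test_split_rows(features, labels, test_size=0.2, seed=42):
    """Shuffle row indices reproducibly and split them into train and test."""
    if len(features) != len(labels):
        raise ValueError("features and labels must have the same length")
    indices = list(range(len(features)))
    random.Random(seed).shuffle(indices)  # local generator, fixed seed
    n_test = max(1, round(len(indices) * test_size))
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def pick(seq, idx):
        return [seq[i] for i in idx]

    return (pick(features, train_idx), pick(features, test_idx),
            pick(labels, train_idx), pick(labels, test_idx))


# Ten made-up rows with one feature each and a made-up stage label.
X = [[float(i)] for i in range(10)]
y = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
X_train, X_test, y_train, y_test = train_test_split_rows(X, y)
```

Fixing the seed makes the split repeatable, so accuracy differences between models come from the models and not from a different shuffle.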
Step 4: Building & Evaluating the Models
Logistic Regression
Used to classify the data; it is a predictive analysis algorithm based on the
concept of probability.
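To make the "concept of probability" concrete, here is a minimal binary logistic regression trained by gradient descent on one feature. It is a simplification: the project predicts five stages, while this sketch separates only two made-up classes, and the learning rate, epoch count, and data are assumptions chosen for illustration.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function, mapping any real to (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit weight w and bias b by batch gradient descent on the log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(w * x + b) - y  # predicted probability minus label
            grad_w += error * x
            grad_b += error
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def predict_proba(x, w, b):
    """Estimated probability that x belongs to class 1."""
    return sigmoid(w * x + b)


def predict(x, w, b, threshold=0.5):
    """Class label obtained by thresholding the probability."""
    return int(predict_proba(x, w, b) >= threshold)


# Made-up serum creatinine values; label 1 marks the higher-creatinine group.
xs = [0.7, 0.9, 1.0, 1.1, 1.6, 2.0, 2.8, 3.5]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
w, b = fit_logistic(xs, ys)
```

The model outputs a probability first and a class second; the 0.5 threshold is a convention that can be moved when missing a case is costlier than a false alarm.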
Step 4 Cont.: Logistic Regression & Confusion Matrix
Confusion Matrix: Predicted vs Labels
Confusion matrix:
• A tabular summary of the number of
correct and incorrect predictions made
by a classifier.
• Used to evaluate the performance of a
classification model through the
calculation of performance metrics
such as accuracy and F-score.
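The two points above can be sketched as follows: build the table of true versus predicted labels, then derive accuracy and a per-class F-score from it. Rows are true labels and columns are predicted labels, which is an assumed orientation (it matches scikit-learn's `confusion_matrix`). The labels and predictions are made up.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Count predictions: matrix[i][j] = rows with true label i, predicted j."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for truth, guess in zip(y_true, y_pred):
        matrix[index[truth]][index[guess]] += 1
    return matrix


def accuracy(matrix):
    """Share of all predictions that fall on the diagonal."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def f1_per_class(matrix):
    """F1 score for each class: 2*TP / (2*TP + FP + FN), or 0.0 if undefined."""
    size = len(matrix)
    scores = []
    for i in range(size):
        tp = matrix[i][i]
        fp = sum(matrix[r][i] for r in range(size)) - tp
        fn = sum(matrix[i]) - tp
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


# Made-up stage labels and predictions.
stage_labels = [1, 2, 3, 4, 5]
y_true = [1, 1, 2, 2, 3, 3, 3, 5]
y_pred = [1, 2, 2, 2, 3, 3, 4, 5]
cm = confusion_matrix(y_true, y_pred, stage_labels)
acc = accuracy(cm)
f1 = f1_per_class(cm)
```

Off-diagonal cells show which stages are confused with each other, detail that a single accuracy figure hides.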
Step 4 Cont.: K-Nearest Neighbors (KNN): K=3
The KNN algorithm assumes similarity between the new data and the available data,
and puts the new data into the category it most resembles among the available categories.
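The idea can be sketched as a majority vote among the K closest training points. This plain-Python version assumes Euclidean distance and uses made-up two-feature points (serum creatinine, eGFR). In practice the features would usually be scaled first so that one does not dominate the distance.

```python
from collections import Counter
from math import dist


def knn_predict(train_x, train_y, query, k=3):
    """Predict the label of `query` by majority vote of its k nearest points."""
    if not 1 <= k <= len(train_x):
        raise ValueError("k must be between 1 and the number of training points")
    # Sort training points by Euclidean distance to the query, keep the k nearest.
    nearest = sorted(zip(train_x, train_y), key=lambda pair: dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Made-up training points: (serum creatinine, eGFR) with a stage label.
train_x = [(0.8, 100.0), (0.9, 95.0), (1.0, 92.0),
           (1.8, 40.0), (2.0, 35.0), (2.2, 33.0)]
train_y = [1, 1, 1, 3, 3, 3]

high_risk = knn_predict(train_x, train_y, (1.9, 38.0), k=3)
low_risk = knn_predict(train_x, train_y, (0.85, 98.0), k=3)
```

An odd K such as 3 reduces the chance of a tied vote when there are two classes; with five stages ties remain possible.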
Step 4 Cont.: Finding the Best K
Using the error plot or accuracy plot to find the most favorable K value
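The search behind such an error plot can be sketched as a loop: compute the test error for each candidate K and keep the K with the lowest error. The data below are made up and include one deliberately noisy training label. The candidate list and helper names are assumptions.

```python
from collections import Counter
from math import dist


def knn_predict(train_x, train_y, query, k):
    """Majority vote among the k training points nearest to `query`."""
    nearest = sorted(zip(train_x, train_y), key=lambda pair: dist(pair[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def error_rate(train_x, train_y, test_x, test_y, k):
    """Fraction of test points that are misclassified for a given k."""
    wrong = sum(
        knn_predict(train_x, train_y, x, k) != y for x, y in zip(test_x, test_y)
    )
    return wrong / len(test_x)


# Made-up one-feature data; the point at 1.5 carries a noisy label.
train_x = [(1.0,), (1.2,), (1.4,), (1.6,), (1.5,), (3.0,), (3.2,), (3.4,), (3.6,)]
train_y = [0, 0, 0, 0, 1, 1, 1, 1, 1]
test_x = [(1.52,), (1.1,), (3.3,), (3.1,)]
test_y = [0, 0, 1, 1]

candidate_ks = [1, 3, 5, 7, 9]
errors = {k: error_rate(train_x, train_y, test_x, test_y, k) for k in candidate_ks}
best_k = min(candidate_ks, key=lambda k: errors[k])  # smallest k wins ties
```

Plotting `errors` against K gives the error curve: a very small K follows noise in the training labels, while a very large K drifts toward the majority class.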
References
• Krishnamurthy, S., KS, K., Dovgan, E., Luštrek, M., Gradišek Piletič, B., Srinivasan, K., ... & Syed-Abdul, S. (2021, May).
Machine Learning Prediction Models for Chronic Kidney Disease using National Health Insurance Claim Data in Taiwan. In
Healthcare (Vol. 9, No. 5, p. 546). Multidisciplinary Digital Publishing Institute.
• Medical Advisory Committee. (2021). Stages of Chronic Kidney Disease (CKD). American Kidney Fund (AKF).
https://www.kidneyfund.org/kidney-disease/chronic-kidney-disease-ckd/stages-of-chronic-kidney-disease/
• Raynaud, M., Aubert, O., Reese, P. P., Bouatou, Y., Naesens, M., Kamar, N., ... & Loupy, A. (2021). Trajectories of
glomerular filtration rate and progression to end stage kidney disease after kidney transplantation. Kidney international,
99(1), 186-197.
• Shlipak, M. G., Tummalapalli, S. L., Boulware, L. E., Grams, M. E., Ix, J. H., Jha, V., ... & Zomer, E. (2021). The case for
early identification and intervention of chronic kidney disease: conclusions from a Kidney Disease: Improving Global
Outcomes (KDIGO) Controversies Conference. Kidney international, 99(1), 34-47.
• Thongprayoon, C., Kaewput, W., Choudhury, A., Hansrivijit, P., Mao, M. A., & Cheungpasitporn, W. (2021). Is It Time for
Machine Learning Algorithms to Predict the Risk of Kidney Failure in Patients with Chronic Kidney Disease?.
Q&A
Thank You!
mamini@sandiego.edu