
Disease Detection Machine Learning Model

The document outlines the development of machine learning models to predict disease outcomes using patient symptoms and profiles, aiming for early and personalized healthcare. It details the methods of data preparation, feature engineering, model selection, and evaluation metrics, highlighting improvements in accuracy after feature engineering. The conclusion emphasizes the effectiveness of the Random Forest model and the potential for these models to aid in real-world diagnostic decisions.

Uploaded by

Atia Batool
Copyright © All Rights Reserved

Disease Detection Machine Learning Model

• Nouman Nazir (F2023393005)
• Komal Shehzadi (F2023436009)
Problem Statement and Background
Problem Statement
• Develop and evaluate ML models to predict disease outcomes
• Leverage patient symptoms and profile data for accurate diagnosis
• Enable early and personalized healthcare
Background
• Growing volumes of healthcare data make ML-based diagnostics feasible
• Aim to automate disease prediction to support doctors
• Dataset sourced from Kaggle: Comprehensive Disease Symptom and Patient Profile
• Focus: Relationship between patient traits and disease patterns
Methods – Data Preparation
Dataset
• Disease_symptom_and_patient_profile_dataset.csv (Kaggle)
Initial Renaming
• Difficulty Breathing → DB
• Blood Pressure → BP
• Cholesterol Level → CL
• Outcome → Results
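A minimal pandas sketch of the renaming step. The toy rows and the exact original column headers are assumptions based on this slide; with the real file, `df` would instead come from `pd.read_csv("Disease_symptom_and_patient_profile_dataset.csv")`, whose headers may differ slightly.

```python
import pandas as pd

# Toy rows standing in for the Kaggle CSV (values and headers are assumed)
df = pd.DataFrame({
    "Disease": ["Influenza", "Asthma"],
    "Fever": ["Yes", "No"],
    "Difficulty Breathing": ["No", "Yes"],
    "Blood Pressure": ["Normal", "High"],
    "Cholesterol Level": ["Normal", "High"],
    "Outcome": ["Positive", "Negative"],
})

# Short aliases used in the rest of the pipeline
RENAME_MAP = {
    "Difficulty Breathing": "DB",
    "Blood Pressure": "BP",
    "Cholesterol Level": "CL",
    "Outcome": "Results",
}
df = df.rename(columns=RENAME_MAP)
print(df.columns.tolist())
# ['Disease', 'Fever', 'DB', 'BP', 'CL', 'Results']
```

`rename` only touches the listed columns and keeps their order, so unlisted columns such as `Disease` pass through unchanged.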
Preprocessing Steps
• Label Encoding: Convert categorical features (Yes/No → 1/0)
• Train-Test Split: 80% training, 20% testing
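The two preprocessing steps could be sketched with scikit-learn as follows. The small frame is made up and already uses the renamed columns; `random_state=42` and `stratify=y` are assumptions added for reproducibility and balanced classes, not settings stated in the slides.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Synthetic rows using the renamed columns (values are illustrative only)
df = pd.DataFrame({
    "Fever":   ["Yes", "No", "Yes", "No", "Yes", "No", "Yes", "No", "Yes", "No"],
    "Cough":   ["No", "No", "Yes", "Yes", "No", "Yes", "Yes", "No", "No", "Yes"],
    "DB":      ["Yes", "No", "No", "Yes", "No", "No", "Yes", "No", "Yes", "No"],
    "Age":     [25, 40, 60, 35, 70, 12, 55, 30, 65, 45],
    "Results": ["Positive", "Negative", "Positive", "Negative", "Positive",
                "Negative", "Positive", "Negative", "Positive", "Negative"],
})

# LabelEncoder sorts class labels, so "No" -> 0 and "Yes" -> 1
encoders = {}
for col in ["Fever", "Cough", "DB", "Results"]:
    encoders[col] = LabelEncoder()
    df[col] = encoders[col].fit_transform(df[col])

X = df.drop(columns="Results")
y = df["Results"]

# 80% training / 20% testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
print(len(X_train), len(X_test))  # 8 2
```

Keeping one encoder per column makes it possible to call `inverse_transform` later, for example to turn predicted `Results` codes back into labels.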
Feature Engineering & Scaling
Feature Engineering
• Combined Symptoms:
• Fever_and_Cough, Fever_and_Fatigue, etc.
• Age Grouping: Child, Adult, Elderly
• Derived Features:
• Risk_Score = f(Age, CL)
• Age_Squared: Age² to capture non-linear age effects
• Disease_Frequency: number of records per disease
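A sketch of the engineered features on a tiny made-up frame. Several details are assumptions: the age cut points (17 and 59), the ordinal encoding of `CL`, and especially the risk formula, because the slides only state Risk_Score = f(Age, CL); the formula below is purely hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Disease": ["Influenza", "Asthma", "Influenza", "Migraine", "Asthma", "Influenza"],
    "Fever":   [1, 0, 1, 0, 1, 1],
    "Cough":   [1, 0, 0, 1, 1, 1],
    "Fatigue": [1, 1, 0, 0, 1, 0],
    "Age":     [8, 34, 71, 45, 66, 15],
    "CL":      [0, 1, 2, 1, 2, 0],  # assumed ordinal coding: Low=0, Normal=1, High=2
})

# Combined symptoms: 1 only when both symptoms are present
df["Fever_and_Cough"] = df["Fever"] & df["Cough"]
df["Fever_and_Fatigue"] = df["Fever"] & df["Fatigue"]

# Age groups (assumed cut points): <=17 Child, 18-59 Adult, 60+ Elderly
df["Age_Group"] = pd.cut(df["Age"], bins=[-np.inf, 17, 59, np.inf],
                         labels=["Child", "Adult", "Elderly"])

# HYPOTHETICAL risk score; the real f(Age, CL) is not given in the slides
df["Risk_Score"] = df["Age"] * (df["CL"] + 1)

# Squared age lets linear models capture a curved age effect
df["Age_Squared"] = df["Age"] ** 2

# Number of rows per disease, broadcast back to each row
df["Disease_Frequency"] = df["Disease"].map(df["Disease"].value_counts())
```

Because `Fever`, `Cough` and `Fatigue` are already 0/1 integers, the bitwise `&` behaves as a logical AND.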
Encoding & Scaling
• One-Hot Encoding: For Age_Group
• Min-Max Scaling: Age, Risk_Score, Disease_Frequency, Age_Squared
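One way to apply the encoding and scaling listed above, sketched on small made-up train and test frames. Fitting `MinMaxScaler` on the training split only, and reindexing the test dummies to the training columns, are choices assumed here to avoid leakage and column mismatches; the slides do not say how this was handled.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

train = pd.DataFrame({
    "Age": [8, 34, 71, 45],
    "Risk_Score": [8, 68, 213, 90],
    "Disease_Frequency": [3, 2, 3, 1],
    "Age_Squared": [64, 1156, 5041, 2025],
    "Age_Group": ["Child", "Adult", "Elderly", "Adult"],
})
test = pd.DataFrame({
    "Age": [66, 15],
    "Risk_Score": [198, 15],
    "Disease_Frequency": [2, 3],
    "Age_Squared": [4356, 225],
    "Age_Group": ["Elderly", "Child"],
})

# One-hot encode Age_Group; give the test set exactly the training columns
train = pd.get_dummies(train, columns=["Age_Group"], dtype=int)
test = pd.get_dummies(test, columns=["Age_Group"], dtype=int).reindex(
    columns=train.columns, fill_value=0)

SCALE_COLS = ["Age", "Risk_Score", "Disease_Frequency", "Age_Squared"]
scaler = MinMaxScaler()
# Learn min/max from training data only, then reuse them on the test set
train[SCALE_COLS] = scaler.fit_transform(train[SCALE_COLS])
test[SCALE_COLS] = scaler.transform(test[SCALE_COLS])
```

Test values can fall slightly outside [0, 1] when they exceed the training range; that is expected behaviour of a scaler fitted on training data.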
Model Selection & Training
Models Used
• Logistic Regression (LR)
• K-Nearest Neighbors (KNN)
• Decision Tree Classifier (CART)
• Random Forest (RF)
Training Phases
• Phase 1: Basic label-encoded data
• Phase 2: With feature-engineered data
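The two training phases could be organised around a single helper that fits all four models on an 80/20 split. This is a sketch: the data comes from `make_classification` as a stand-in for the encoded patient records, the column subset used for "Phase 1" is hypothetical, and the hyperparameters are assumed defaults rather than the project's settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier


def build_models():
    """The four classifiers from the slides with assumed hyperparameters."""
    return {
        "LR": LogisticRegression(max_iter=1000),
        "KNN": KNeighborsClassifier(n_neighbors=5),
        "CART": DecisionTreeClassifier(random_state=0),
        "RF": RandomForestClassifier(n_estimators=100, random_state=0),
    }


def evaluate_models(X, y, seed=42):
    """Fit each model on an 80/20 split and return its test accuracy."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed, stratify=y)
    scores = {}
    for name, model in build_models().items():
        model.fit(X_train, y_train)
        scores[name] = accuracy_score(y_test, model.predict(X_test))
    return scores


# Synthetic stand-in for the encoded dataset (NOT the Kaggle data)
X, y = make_classification(n_samples=300, n_features=8, n_informative=5,
                           random_state=0)

phase1 = evaluate_models(X[:, :4], y)  # hypothetical "basic" feature subset
phase2 = evaluate_models(X, y)         # hypothetical "engineered" feature set
for name in phase1:
    print(f"{name}: phase 1 = {phase1[name]:.2f}, phase 2 = {phase2[name]:.2f}")
```

Using the same split seed in both phases means any accuracy difference comes from the features, not from a different train/test partition.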
Tools and Techniques
Language & Environment
• Python with Jupyter / Google Colab
Libraries
• pandas, numpy, seaborn, matplotlib
• scikit-learn (sklearn):
• Preprocessing: LabelEncoder, MinMaxScaler
• Model selection: train_test_split
• Metrics: accuracy_score, classification_report
Techniques
• Supervised Learning
• Ensemble Learning (Random Forest)
• Feature Importance metrics
Evaluation Metrics
Metrics Used
• Accuracy Score
• Precision, Recall, F1-Score
• Support (number of true samples per class)
• Confusion Matrix (TP, TN, FP, FN breakdown)
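A small worked example of these metrics with scikit-learn, using made-up binary labels (1 = positive outcome). It shows how the four confusion-matrix cells are unpacked and how accuracy follows from them: here (TP + TN) / total = (3 + 3) / 8 = 0.75.

```python
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Illustrative labels only: 1 = positive outcome, 0 = negative
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

acc = accuracy_score(y_true, y_pred)
# For binary labels [0, 1], ravel() returns the cells as tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(f"accuracy={acc:.3f}  TP={tp} TN={tn} FP={fp} FN={fn}")
# accuracy=0.750  TP=3 TN=3 FP=1 FN=1

# Per-class precision, recall, F1-score and support
print(classification_report(y_true, y_pred, digits=3))
```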
Performance Comparison
Before Feature Engineering
• Initial accuracy range: e.g., 65%–75%
• Classification metrics showed limited precision
After Feature Engineering
• Improved accuracy: e.g., 80%–90%
• Random Forest saw the largest improvement
• Better F1-scores and recall values
Feature Importance & Conclusion
Key Features Identified
• Age, Risk_Score, Fever, Fatigue, DB
• Visualization via bar plots (Decision Tree & RF)
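Feature importances for the Decision Tree and Random Forest could be extracted as below. The feature names are borrowed from the slides, but the data is synthetic from `make_classification`, so the resulting ranking says nothing about the real dataset; `n_estimators=200` is an assumed setting.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

# Feature names from the slides; values are synthetic stand-ins
FEATURES = ["Age", "Risk_Score", "Fever", "Fatigue", "DB", "Cough"]
X, y = make_classification(n_samples=300, n_features=len(FEATURES),
                           n_informative=4, random_state=0)
X = pd.DataFrame(X, columns=FEATURES)

models = {
    "CART": DecisionTreeClassifier(random_state=0),
    "RF": RandomForestClassifier(n_estimators=200, random_state=0),
}
importances = {}
for name, model in models.items():
    model.fit(X, y)
    # Impurity-based importances; they sum to 1 for each fitted model
    importances[name] = (pd.Series(model.feature_importances_, index=FEATURES)
                         .sort_values(ascending=False))
    print(name, importances[name].round(3).to_dict())

# With matplotlib installed, importances["RF"].plot.bar() draws the bar plot
```

Impurity-based importances can favour features with many distinct values, such as `Age`; `sklearn.inspection.permutation_importance` is a common cross-check.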
Conclusion
• Feature engineering improves ML performance
• Random Forest shows robustness
• Models can support real-world diagnostic decision-making
Thank You
