Data Mining Theory and Python Project.pptx
Restaurant Data Analysis
Course Name: Software Development Project – II
Course Code : ICT-3112
3/3/2024 Presented by: Sadika, Noor & Rakib 2
Team members
No Name ID
01 Sadika Khatun Jhinu IT20029
02 Gazi Md. Noor Hossain IT20030
03 Rakibul Islam IT20031
Supervisor
Md. Tanvir Rahman
Assistant Professor
Dept. of ICT
MBSTU
 Dataset
 Data Mining
 Python Programming Language
 Binary & Discrete Classification
 Euclidean Distance
 Minkowski Distance
 Regression Analysis
 Linear Regression
 Covariance
 Deviation
 Prediction Using SVM
 ROC Curve
Contents
Our aim is to
 Collect a Dataset from Kaggle
 Implement the knowledge that we learned in the Data Mining course
 Implement it using the Python programming language
Project Proposal
Dataset
 We collected this restaurant dataset from Kaggle. Kaggle is a popular
online platform, founded in 2010, for data science competitions, machine
learning challenges, and datasets.
 It contains customer details, their personal ratings, and their payment method.
 It is a primarily numerical dataset.
 It contains 2,000 records for analysis.
 The dataset file is in .csv (Comma-Separated Values) format, which
stores data in a tabular form.
• The attributes of this file:
1. CustomerID
2. Height
3. Weight
4. Age
5. annual_income
6. ratings
7. Price
8. Payment
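As a sketch, a CSV with these attributes can be loaded with pandas. The rows below are copied from the sample table shown later in the slides; the actual Kaggle file name is not given here, so the snippet reads from an inline string instead.

```python
import io
import pandas as pd

# A few rows taken from the sample table shown later in the slides
csv_text = """CustomerID,height,weight,age,annual_income,rate,price,payment
1,65,112,19,15000,3.4,1325,cash
2,71,136,21,35000,3.9,1600,cash
3,69,153,20,86000,3.7,1850,VISA
"""

# In the project this would be pd.read_csv(...) on the downloaded Kaggle file
data = pd.read_csv(io.StringIO(csv_text))
print(data.columns.tolist())
```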
Data Mining
Data mining is a process of extracting meaningful patterns, trends, and insights
from large volumes of data. It involves the use of advanced algorithms and
statistical techniques to discover hidden relationships within datasets.
Key features of Data Mining:
 Classification and Clustering: Data mining allows for the categorization
of data into distinct groups through classification. Clustering involves
grouping similar data points together without predefined categories.
 Anomaly Detection: It can identify unusual or anomalous data points.
This feature is valuable for fraud detection, outlier identification, and
quality control.
 Regression Analysis: This involves the estimation of relationships
between variables.
 Association Rule Mining: It identifies relationships between different
items in a dataset.
 Predictive Modeling: Data mining enables the creation of predictive
models that can forecast future trends or outcomes based on historical data.
Python Programming Language
 Python is a high-level, versatile, and dynamically-typed programming language
known for its simplicity, readability, and extensive standard library.
 Python programming language is being used in web development, Machine Learning
applications, along with all cutting-edge technology in Software Industry.
 Python's simplicity, readability, extensive libraries, and versatility have made it a
favored language across a wide range of industries and applications, from web
development to scientific research and artificial intelligence.
Applications of Python
Python can be used on a server to create web applications.
Python can be used alongside software to create workflows.
Python can connect to database systems and can also read and modify files.
Python can be used to handle big data and perform complex mathematics.
Classification
Classification is a process of categorizing data or objects into predefined classes
or categories based on their features or attributes. In machine learning,
classification is a type of supervised learning technique where an algorithm is
trained on a labeled dataset to predict the class or category of new, unseen data.
Classification is of two types:
1. Binary Classification: In binary classification, the goal is to classify the input into one of two
classes or categories.
2. Multiclass Classification: In multi-class classification, the goal is to classify the input into one
of several classes or categories.
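A minimal sketch of the two settings, using scikit-learn's SVC on toy labels (the classifier choice and the toy data are assumptions for illustration; the slides introduce SVM later):

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y_binary = np.array([0, 0, 1, 1])   # binary: exactly two classes
y_multi = np.array([0, 1, 2, 2])    # multiclass: several classes

clf_bin = SVC().fit(X, y_binary)    # same API handles both settings
clf_multi = SVC().fit(X, y_multi)
```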
Binarization
• A simple technique to binarize a categorical attribute is the following: if
there are m categorical values, then uniquely assign each original value
to an integer in the interval [0, m−1].
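A small illustration of that mapping, using the dataset's payment column as a hypothetical example:

```python
# Assign each of the m distinct categorical values an integer in [0, m-1]
payments = ['cash', 'VISA', 'VISA', 'cash']
categories = sorted(set(payments))                   # m = 2 distinct values
mapping = {c: i for i, c in enumerate(categories)}   # {'VISA': 0, 'cash': 1}
encoded = [mapping[p] for p in payments]             # [1, 0, 0, 1]
```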
• Here, if we split the Weight attribute of the dataset by applying the
conditions below, the code is:
condition1 = data['weight'] < 30
condition2 = (data['weight'] >= 30) & (data['weight'] <= 60)
condition3 = data['weight'] > 60
data['Below_30'] = condition1.astype(int)
data['Between_30_and_60'] = condition2.astype(int)
data['Above_60'] = condition3.astype(int)
print(data)
Binarization (output screenshot of the code above)
Discretization
Discretization is typically applied to attributes that are used in classification or
association analysis. Transformation of a continuous attribute to a categorical attribute
involves two subtasks: deciding how many categories, n, to have and determining how
to map the values of the continuous attribute to these categories.
Here, for threshold = 3, we can split our Weight attribute into 3 specific categories.
num_bins = 3
bin_labels = ['Less', 'Medium', 'More']
data['New Weight'] = pd.cut(data['weight'], bins=num_bins, labels=bin_labels)
print(data)
Discretization (output screenshot of the code above)
Euclidean Distance
The Euclidean distance is a measure of the straight-line distance between two
points in Euclidean space. It is the most commonly used distance metric in
geometry and machine learning.
Properties:
1. It is always non-negative (d≥0).
2. It is symmetric, meaning the distance from point A to point B is the same as from point B
to point A.
3. It satisfies the triangle inequality, which means the shortest distance between two points
is a straight line.
Euclidean distance, d = √( Σᵢ₌₁ⁿ (xᵢ − yᵢ)² )
point1 = data['weight']
point2 = data['height']
distance = np.linalg.norm(point1 - point2)
Euclidean distance: 2698.051
Minkowski Distance
The Minkowski distance is a metric used to measure the distance between two points in
a multidimensional space. It is a generalization of other distance metrics like Euclidean
distance and Manhattan distance.
Minkowski distance, d = ( Σᵢ₌₁ⁿ |xᵢ − yᵢ|ᵖ )^(1/p)
Some properties of the Minkowski distance:
1. When p=1, it is called the Manhattan distance or L1 norm.
2. When p=2, it is called the Euclidean distance or L2 norm.
3. As p approaches infinity, the Minkowski distance approaches the Chebyshev
distance.
point1 = data['weight']
point2 = data['height']
p = 2
distance = np.power(np.sum(np.abs(point1 - point2) ** p), 1 / p)
Minkowski distance (p=2): 2698.0517
Regression Analysis
Regression analysis is a statistical method that shows the relationship between
two or more variables.
 Usually expressed in a graph, the method tests the relationship between a
dependent variable and one or more independent variables.
 Typically, the dependent variable changes with the independent variable(s),
and the regression analysis attempts to answer which factors matter most to
that change.
 Generally, regression analysis is used to:
 Try and explain a phenomenon
 Predict future events
 Optimize manufacturing and delivery processes
 Resolve errors
 Provide new insights
Linear Regression
• Linear regression is a type of supervised machine learning algorithm that
computes the linear relationship between a dependent variable and one or
more independent features.
• The equation for linear regression: y = a + bx
Here, x is the independent variable
y is the dependent variable
a = intercept of the regression line
b = slope of the regression line
Again,
b = ( Σxy − (Σx · Σy) / n ) / ( Σx² − (Σx)² / n )
and, a = ȳ − b · x̄
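As a sketch, the closed-form slope and intercept above can be computed directly in NumPy; the points below are made up so that they lie exactly on y = 1 + 2x:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 7.0, 9.0])   # exactly y = 1 + 2x
n = len(x)

# Slope b and intercept a from the closed-form expressions above
b = (np.sum(x * y) - np.sum(x) * np.sum(y) / n) / (np.sum(x ** 2) - np.sum(x) ** 2 / n)
a = np.mean(y) - b * np.mean(x)
# b -> 2.0 (slope), a -> 1.0 (intercept)
```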
Linear Regression
from sklearn.linear_model import LinearRegression

model = LinearRegression()
model.fit(X, Y)   # X, Y: feature and target columns from the dataset
slope = model.coef_[0]
intercept = model.intercept_

Slope (Coefficient): 2.889
Intercept: -68.252
Covariance
Covariance is a measure of the joint variability of two random variables:
it indicates the extent to which they change together. A positive covariance
means the two variables tend to increase or decrease together; a negative
covariance means one tends to increase when the other decreases.
X = data['weight']
Y = data['height']
mean_X = np.mean(X)
mean_Y = np.mean(Y)
covariance = np.sum((X - mean_X) * (Y - mean_Y)) / (len(X) - 1)
Covariance of Height and Weight: 11.17
Sample covariance formula:
Cov(x, y) = Σᵢ₌₁ⁿ (xᵢ − x̄)(yᵢ − ȳ) / (n − 1)
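As a sketch, the manual sample-covariance formula agrees with NumPy's built-in np.cov; the heights and weights below are taken from the first five rows of the sample table shown later in the slides:

```python
import numpy as np

x = np.array([65, 71, 69, 68, 67], dtype=float)       # height (sample rows)
y = np.array([112, 136, 153, 142, 144], dtype=float)  # weight (sample rows)

# Sample covariance, dividing by (n - 1)
cov_manual = np.sum((x - x.mean()) * (y - y.mean())) / (len(x) - 1)
cov_numpy = np.cov(x, y)[0, 1]   # np.cov also uses (n - 1) by default
```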
Standard Deviation
Standard deviation is a statistical measure that quantifies the amount
of variation or dispersion in a set of data points. It provides a way to
understand how spread out the values in a dataset are around the
mean.
Standard deviation, σ = √( Σᵢ₌₁ⁿ (xᵢ − x̄)² / n )
X = data['height']
Y = data['weight']
mean_X = np.mean(X)
std_dev_X = np.sqrt(np.mean((X - mean_X)**2))
Standard Deviation of Height: 1.97
Standard Deviation of Weight: 11.50
Prediction Algorithm
Prediction refers to the process of estimating or forecasting future events,
outcomes, or values based on existing data and patterns.
Key points:
 Methodology: Predictions are made using various techniques and models. These
may include statistical methods, machine learning algorithms, regression analysis,
time series analysis, and more.
 Training Data: To make accurate predictions, models are typically trained on
historical or existing data, so that the relationships or patterns they learn
carry over to new or unseen data.
 Accuracy and Performance: The accuracy of predictions is a critical metric.
Models are evaluated based on how well they can generalize to new data.
 Applications: Prediction is widely used across various domains. For instance, in
finance, predictions are made about stock prices; in healthcare, predictions are made
about disease progression; in weather forecasting, predictions are made about future
weather conditions.
Support Vector Machine (SVM)
 Support Vector Machine (SVM) is a powerful machine learning algorithm
used for linear or nonlinear classification, regression, and even outlier
detection tasks.
 SVMs can be used for a variety of tasks, such as text classification, image
classification, spam detection, handwriting identification, gene expression
analysis, face detection, and anomaly detection.
Support Vector Machine (SVM)
 Drop function: In pandas, drop is a DataFrame method (not a Python
built-in) used to remove rows or columns; here it is used to remove columns.
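A minimal sketch of drop on a toy DataFrame (the column names follow the dataset's attributes; the two rows are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'CustomerID': [1, 2],
                   'weight': [112, 136],
                   'payment': ['cash', 'cash']})

features = df.drop(columns=['CustomerID'])   # remove the ID column
# features now contains only 'weight' and 'payment'; df is unchanged
```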
CustomerID height weight age annual_income rate price payment
1 65 112 19 15000 3.4 1325 cash
2 71 136 21 35000 3.9 1600 cash
3 69 153 20 86000 3.7 1850 VISA
4 68 142 23 59000 2.7 2075 VISA
5 67 144 31 38000 2.8 1600 VISA
6 68 123 22 58000 3.4 2075 VISA
7 69 141 35 31000 4.1 1650 VISA
8 70 136 23 84000 2.8 2075 VISA
9 67 112 64 97000 3.2 1650 cash
Support Vector Machine (SVM)
 Label Encoder: Label encoding is a technique used to convert
categorical columns into numerical ones so that they can be fitted by machine
learning models that only accept numerical data. It is an important
pre-processing step in a machine learning project.
 fillna: fillna is a method used in Python for filling missing values in a pandas
DataFrame or Series. It's a common operation when working with data, as
missing values can cause issues when performing calculations or visualizing
data.
 mean: mean refers to the average of a set of numbers.
mean = sum(numbers) / len(numbers)
mean = np.mean(numbers)
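A hedged sketch combining the three steps on toy data (which columns the project actually encodes or fills is not shown in the slides, so the column names and values below are illustrative):

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame({'payment': ['cash', 'VISA', 'VISA'],
                   'rate': [3.4, None, 3.8]})

# fillna: replace a missing rating with the column mean
df['rate'] = df['rate'].fillna(df['rate'].mean())

# Label encoding: categorical payment values -> integers
df['payment'] = LabelEncoder().fit_transform(df['payment'])
```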
Support Vector Machine (SVM)
Test Data & Training data:
In machine learning and statistical modeling, datasets are typically divided into
two main subsets: training data and test data. These subsets serve distinct
purposes in developing and evaluating predictive models:
• Training Data: The training data is used to train or build the predictive
model; it teaches the model how to make predictions or classifications.
• Test Data: The test data is used to evaluate the model's performance and
assess how well it generalizes to new, unseen data.
Here, 20% of the data is used for testing, with random_state = 57.
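The split and fit described above can be sketched with scikit-learn; the features and labels below are synthetic stand-ins, but the test_size and random_state match the values stated in the slides:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))              # stand-in features
y = (X[:, 0] + X[:, 1] > 0).astype(int)    # stand-in binary label

# 20% held out for testing, random_state = 57 as in the slides
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=57)

clf = SVC().fit(X_train, y_train)
y_pred = clf.predict(X_test)
```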
Support Vector Machine (SVM)
 Accuracy: This is the ratio of correctly predicted instances (both true positives and
true negatives) to the total instances in the dataset.
Accuracy : 0.615
 Precision: Also known as Positive Predictive Value, it is the ratio of true positives to
the sum of true positives and false positives. It measures the accuracy of the positive
predictions.
Precision : 1.0
 Recall: Also known as Sensitivity, Hit Rate, or True Positive Rate, it is the ratio of
true positives to the sum of true positives and false negatives. It measures the
sensitivity to detect the positive class.
Recall : 0.615
 F1-measure: The harmonic mean of precision and recall. It provides a balance
between precision and recall and is particularly useful when dealing with imbalanced
datasets.
F1-measure : 0.761
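These four metrics can be computed with scikit-learn; the toy labels below are illustrative, not the project's actual predictions:

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_true = [1, 1, 0, 1]
y_pred = [1, 0, 0, 1]   # 2 true positives, 1 false negative, 1 true negative

acc = accuracy_score(y_true, y_pred)    # 3/4 = 0.75
prec = precision_score(y_true, y_pred)  # 2/2 = 1.0
rec = recall_score(y_true, y_pred)      # 2/3
f1 = f1_score(y_true, y_pred)           # harmonic mean of prec and rec = 0.8
```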
ROC Curve
The Receiver Operating Characteristic (ROC) curve is a graphical representation
that illustrates the diagnostic ability of a binary classification model. It plots the
True Positive Rate against the False Positive Rate for different classification
thresholds.
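A minimal sketch of computing the curve's points and its area with scikit-learn, on illustrative scores (not the project's model outputs):

```python
from sklearn.metrics import roc_curve, roc_auc_score

y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]   # illustrative predicted probabilities

# fpr/tpr give one (False Positive Rate, True Positive Rate) point per threshold
fpr, tpr, thresholds = roc_curve(y_true, scores)
auc = roc_auc_score(y_true, scores)   # area under the curve; 0.75 here
```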