Chapter 4 Deep Learning & CNN

Chapters 4 and 5 cover deep learning, specifically focusing on Convolutional Neural Networks (CNNs), their architectures, and applications. Key topics include the relationship between deep learning, AI, and ML, types of neural networks, and performance evaluation metrics. The document also discusses the advantages of deep learning over traditional machine learning techniques, emphasizing its effectiveness in handling unstructured data and complex pattern recognition.

Uploaded by

mersha abdisa
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
35 views54 pages

Chapter 4 Deep Learning & CNN

Chapters 4 and 5 cover deep learning, specifically focusing on Convolutional Neural Networks (CNNs), their architectures, and applications. Key topics include the relationship between deep learning, AI, and ML, types of neural networks, and performance evaluation metrics. The document also discusses the advantages of deep learning over traditional machine learning techniques, emphasizing its effectiveness in handling unstructured data and complex pattern recognition.

Uploaded by

mersha abdisa
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPTX, PDF, TXT or read online on Scribd
You are on page 1/ 54

Chapter-4 & Chapter 5

Deep Learning and CNN


• Deep learning (introduction, relations with AI/ML, applications, driving factors)
• Introduction to deep convolutional neural network algorithms
o Layers (convolution, pooling, fully connected layer)
o Activation functions (ReLU, Softmax)
o Building blocks of CNN (kernel/filter, stride, padding, parameters, …)
o Functions and challenges in specific fields
• Performance evaluation metrics
o Accuracy, precision, recall, F1-score
o Confusion matrix, ROC curve and AUC (area under the curve)
Introduction
• Deep learning is a specialized subset of machine learning, which itself is a
subset of artificial intelligence.
• Deep learning utilizes artificial neural networks with multiple layers to
analyze data, often requiring large datasets and significant computational
resources.
• Machine learning, more broadly, encompasses various algorithms that
enable systems to learn from data and make predictions or decisions
without being explicitly programmed.
o Artificial Intelligence (AI): The main concept of machines exhibiting human-like
intelligence.
o Machine Learning (ML): A field within AI that focuses on developing algorithms
that allow systems to learn from data and improve their performance over time
without explicit programming.
o Deep Learning (DL): A more advanced subfield of machine learning that uses
artificial neural networks with multiple layers (hence "deep") to analyze
complex data. What makes deep learning "deep" is the depth of the neural
network architecture, specifically the number of layers between the input and
the output.
Introduction
• What is Machine Learning?
• Machine learning is a subfield of artificial intelligence that focuses on the
development of algorithms and statistical models that enable computers to learn and
make predictions or decisions without being explicitly programmed.
• It involves training algorithms on large datasets to identify patterns and
relationships and then using these patterns to make predictions or decisions about
new data.
• Types of Machine Learning
• Machine learning is further divided into categories based on the data on which we
are training our model.
o Supervised Learning - This method is used when we have Training data along
with the labels for the correct answer.
o Unsupervised Learning - In this task our main objective is to find the patterns or
groups in the dataset at hand because we don't have any particular labels in
this dataset.
o Reinforcement learning (RL) is a machine learning paradigm where an agent
learns to make decisions in an environment to maximize a reward signal,
essentially learning through trial and error.
Introduction
• What is Deep Learning?
• Deep learning is a subset of machine learning that uses neural networks with
multiple layers to analyze complex patterns and relationships in data. It is inspired
by the structure and function of the human brain and has been successful in a
variety of tasks, such as computer vision, natural language processing, and speech
recognition.
• Deep learning models are trained using large amounts of data and algorithms that
are able to learn and improve over time, becoming more accurate as they process
more data. This makes them well-suited to complex, real-world problems and
enables them to learn and adapt to new situations.
• Is deep learning supervised learning, unsupervised learning, or RL?
o Deep learning can utilize both supervised and unsupervised learning
techniques. While it often employs supervised learning with labeled data for
tasks like image classification, it also excels at unsupervised learning, such as
clustering or feature extraction from unlabeled data.
o Therefore, deep learning is not inherently tied to either supervised or
unsupervised learning; it can leverage both approaches depending on the
specific task and available data. Deep reinforcement learning combines deep
learning with reinforcement learning techniques.
Introduction
• How many hidden layers make a network "deep"? There is no strict threshold,
but a network with more than one hidden layer between input and output is
commonly considered deep.
Types of Deep Learning
• Deep learning encompasses various architectures; each suited to
different types of tasks:
o Convolutional Neural Networks (CNNs): Primarily used for image processing
tasks, CNNs are designed to automatically and adaptively learn spatial
hierarchies of features through convolutional layers.
o Recurrent Neural Networks (RNNs): Ideal for sequential data, such as time
series or natural language, RNNs have loops that allow information to persist,
making them effective for tasks like speech recognition and language modeling.
o Long Short-Term Memory Networks (LSTMs): A type of RNN that addresses the
vanishing gradient problem, LSTMs are used for complex sequences, including
text and speech.
o Generative Adversarial Networks (GANs): These consist of two neural networks
(generator and discriminator) that compete against each other, leading to the
creation of high-quality synthetic data, such as images.
o Transformers: A more recent architecture designed for handling long-range
dependencies in data, transformers are the backbone of models like GPT and
BERT, used extensively in natural language processing.

Types of Deep Learning
• Deep learning encompasses various architectures; each suited to
different types of tasks:
o Transformers:-GPT and BERT are both powerful language models, but they
differ in their architecture and intended use cases. BERT (Bidirectional Encoder
Representations from Transformers) is designed for understanding language
context and excels in tasks like question answering and text classification. GPT
(Generative Pre-trained Transformer) is geared towards generating human-like
text and is well-suited for tasks like text completion and content creation.
o Deep Belief Networks (DBNs):- DBNs are a type of deep neural network that
are typically constructed by stacking Restricted Boltzmann Machines (RBMs).
They are known for their ability to learn hierarchical feature representations
from data. DBNs are often used for unsupervised pre-training of other deep
learning models or for tasks like dimensionality reduction, feature learning, and
classification.
o Stacked Autoencoders: Autoencoders are neural networks that are trained to
reconstruct their input, often with a bottleneck layer in the middle that forces
the network to learn a compressed representation of the data. Stacked
autoencoders involve stacking multiple autoencoders, each learning a layer of
features. Like DBNs, stacked autoencoders are also used for feature learning
and dimensionality reduction.
Deep Learning (DL)
What Problems Can Deep Learning Solve? Why we choose
deep learning over the other ML techniques such as SVM,
DT, Regression…?
• The power of deep learning models comes from their ability to
classify or predict nonlinear data using a modest number of
parallel nonlinear steps.
• A deep learning model learns the input data features hierarchy all
the way from raw data input to the actual classification of the
data.
• Each layer extracts features from the output of the previous layer.
• They are good at identifying complex patterns in data
o used to improve things like computer vision and natural language processing, and
to solve unstructured-data challenges
What Problems Can Deep Learning Solve?
• Use Machine Learning for structured data, smaller datasets, and when
interpretability is important.
• Use Deep Learning for unstructured data, large-scale datasets, and complex
pattern recognition.

Why Choose Deep Learning?
o Handles Unstructured Data: best suited for images, audio, video, and natural language (text).
o Automatic Feature Extraction: learns features from data directly; no need for manual selection.
o High Accuracy: often achieves state-of-the-art performance in complex tasks (e.g., object detection, translation).
o Scalability: performs better as data size increases.
o Modern Applications: essential in AI fields like autonomous driving, chatbots, facial recognition, etc.
• Why Choose Traditional Machine Learning?
o Works Well on Small Datasets: effective when data is limited (hundreds to thousands of samples).
o Faster Training & Simpler Models: less computationally expensive than deep learning.
o More Interpretable: algorithms like decision trees or logistic regression are easier to explain.
o Structured Data: performs well on tabular data like spreadsheets or database tables.
o Quick Prototyping: easier to implement and tune for quick insights.

Why is deep learning important? Deep learning technology drives many artificial
intelligence applications used in everyday products, such as the following:
• Chatbots and code generators
• Digital assistants
• Voice-activated television remotes
• Fraud detection
• Automatic facial recognition
Businesses use deep learning models to analyze data and make predictions in various
applications.
Who Uses Deep Learning?
• Deep learning is used by a wide range of individuals and
organizations across various industries. Professionals like data
scientists, data engineers, and software developers utilize it for tasks
like automation, improving efficiency, and developing intelligent
systems. Industries that extensively use deep learning include
healthcare (for diagnostics and drug discovery), entertainment (for
recommendations and content creation), finance (for fraud detection
and algorithmic trading), and automotive (for self-driving cars).
• Top Applications of Deep Learning Across Industries
• Self Driving Cars
• News Aggregation and Fraud News Detection
• Natural Language Processing
• Virtual Assistants
• Entertainment
• Visual Recognition
• Fraud Detection
Who Uses Deep Learning?
• Top Applications of Deep Learning Across Industries
• Healthcare
• Automatic Handwriting Generation
• Personalisations
• Detecting Developmental Delay in Children
• Colourisation of Black and White images
• Adding sounds to silent movies
• Automatic Machine Translation
• Automatic Game Playing
• Language Translations
• Pixel Restoration
• Photo Descriptions
• Demographic and Election Predictions
• Deep Dreaming
• What can we do with deep learning?
• Deep learning is usually applied in three primary research areas:
o Image recognition
o Speech recognition, and
o Natural language processing.
• Training a neural network has three major steps.
o First, it does a forward pass and makes a prediction.
o Second, it compares the prediction to the ground truth using a loss function.
The loss function outputs an error value which is an estimate of how poorly the
network is performing.
o Last, it uses that error value to do back propagation which calculates the
gradients for each node in the network.
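The three training steps above can be sketched for the simplest possible network, a single linear neuron trained with mean-squared error; all names and values here are illustrative, not from the slides:

```python
import numpy as np

# Illustrative sketch: one neuron y = w*x + b trained to fit y = 2x.
x = np.array([1.0, 2.0, 3.0])
y_true = np.array([2.0, 4.0, 6.0])

w, b, lr = 0.0, 0.0, 0.05
for step in range(500):
    y_pred = w * x + b                           # 1) forward pass: prediction
    loss = np.mean((y_pred - y_true) ** 2)       # 2) loss function: error value
    grad_w = np.mean(2 * (y_pred - y_true) * x)  # 3) backpropagation: gradients
    grad_b = np.mean(2 * (y_pred - y_true))
    w -= lr * grad_w                             # gradient-descent weight update
    b -= lr * grad_b

print(round(w, 2), round(b, 2))  # w ≈ 2.0, b ≈ 0.0
```

In a deep network the same loop applies, except the gradients are computed for every layer via the chain rule.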
• Training a deep neural network with the backpropagation algorithm faces three primary
difficulties:
o Vanishing gradient: gradients (used to update weights) become very small as they
propagate backward through the layers.
o Overfitting: the model performs very well on training data, but poorly on
unseen/test data.
o Computational load: training requires huge computational power (especially with large
datasets and many layers), can take hours or days, and has high energy and hardware costs.
Why deep learning? Why now? Driving Factors
• Major technical forces are driving advances in machine learning (especially
deep learning). These forces work together to improve performance,
scalability, and real-world impact.
o The limitations of traditional (shallow) machine learning were a major
driving factor in the development and rise of deep learning.
o Datasets --- "Data is the fuel of machine learning." More and better-quality
data leads to better models; more data helps prevent overfitting
and enables deeper models to generalize better.
o Hardware---Faster, cheaper, and more specialized hardware enables
deeper and larger models. Examples such as GPUs, TPUs, NPUs for
training massive models; Parallel and distributed computing; Cloud
computing democratizes access to power.
o Algorithmic advances---Development of new architectures and training
techniques.
ML's limitations pushed us toward Deep Learning
• ML's limitations pushed us toward Deep Learning:
o Feature Engineering is Manual in ML-----Traditional ML (like SVM, logistic
regression) requires hand-crafted features. But Deep learning can
automatically learn features from raw data (images, text, audio).
o Limited Performance on Complex Data---ML struggles with high-
dimensional, nonlinear, or unstructured data. But Deep learning can
model complex functions using many nonlinear layers.
o Scalability Issues--Many ML algorithms don’t scale well with massive
datasets (big data). But Deep learning thrives on large datasets and
improves as data grows.
o Limited Representation Power---Shallow ML models are limited in what
they can represent. But Deep learning uses many layers to build
hierarchical representations.
o Less Flexibility for End-to-End Learning--Traditional ML often needs
separate stages: feature extraction → modeling → classification. Deep
learning enables end-to-end training, from raw input to final prediction.
What makes deep learning different
• It is an experimental finding, rather than a theoretical result, that
algorithmic advances only become possible when appropriate data
and hardware are available to try new ideas (or scale up old ideas,
as is often the case).
• Machine learning isn’t mathematics or physics, where major
advances can be done with a pen and a piece of paper. It’s an
engineering science.
What makes deep learning different?
• It offered better performance on many problems
• Deep learning makes problem-solving much easier
o since it completely automates what used to be the most crucial
step in a machine-learning workflow: feature engineering
• Deep learning learns from data in an incremental, layer-by-layer way, in
which increasingly complex representations are developed.
Convolutional Neural Networks
• What is a CNN? A convolutional neural network has an input layer, an output layer,
many hidden layers, and millions of parameters, giving it the ability to learn
complex objects and patterns.
• It sub-samples the given input through convolution and pooling operations, each
followed by an activation function; these form the hidden layers, which are
partially (locally) connected. At the end is the fully connected layer, which
produces the output layer. With "same" padding, a convolution's output retains
the spatial dimensions of its input.
• Convolution:-Convolution is the process of sliding a small matrix (called a
filter or kernel) across the input image to detect features like edges,
textures, or patterns. Convolution + ReLU in CNN = Feature extraction
(Convolution) + Non-linear activation (ReLU) to highlight useful patterns and
enable deep learning.
• Filters / Kernels:-A small grid of numbers (weights) that slides over the input
data (image or feature map) to detect patterns.
o Small size compared to the input (e.g., 3×3, 5×5, 7×7).
o Shared across the input, meaning the same filter scans the whole image.
o Learnable, meaning its values are updated during training via backpropagation.
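The sliding-window operation described above can be sketched in plain NumPy (stride 1, no padding; the image and kernel values are invented for illustration):

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide a kernel over the image (stride 1, no padding) and
    return the resulting feature map ("valid" convolution)."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # element-wise multiply the patch with the kernel and sum
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

# A vertical-edge detector applied to a tiny image with a bright right half
image = np.array([[0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1]], dtype=float)
kernel = np.array([[-1, 1],
                   [-1, 1]], dtype=float)  # responds to left-to-right increases
feature_map = conv2d_valid(image, kernel)
print(feature_map)  # strong response (2.0) exactly at the vertical edge
```

The same weights are applied at every position (parameter sharing), so one small kernel detects its pattern anywhere in the image.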
Convolutional Neural Networks
• Batch Normalization: batch normalization is generally performed between the
convolution and activation (ReLU) layers. It normalizes the inputs at each
layer, reduces internal covariate shift (change in the distribution of network
activations), and acts as a regularizer for a convolutional network.
• Batch normalization allows higher learning rates, which can reduce training time,
and gives better performance. It lets each layer learn more independently of the
other layers. Concretely, batch normalization standardizes the outputs
(activations) of a layer for each mini-batch before passing them to the next layer.
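The standardize-then-rescale step can be sketched as follows, using per-feature statistics over the mini-batch; `gamma` and `beta` stand in for the learnable scale and shift parameters (the batch values are invented):

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Standardize a mini-batch of activations (rows = samples), then
    apply the learnable scale and shift: y = gamma * x_hat + beta."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)  # zero mean, unit variance per feature
    return gamma * x_hat + beta

batch = np.array([[1.0, 100.0],
                  [3.0, 300.0],
                  [5.0, 500.0]])
out = batch_norm(batch)
print(out.mean(axis=0))  # ≈ [0, 0]
print(out.std(axis=0))   # ≈ [1, 1]
```

Both features end up on the same scale regardless of their original ranges, which is what allows the higher learning rates mentioned above.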
• Padding and Stride: padding and stride influence how the convolution
operation is performed. They can be used to alter the dimensions (height and
width) of the input/output volumes, either increasing or decreasing them.
Padding is the border-problem solver.
• Padding makes the output dimensions equal to the input's by adding zeros
around the border of the input matrix. It gives the kernel more room to cover the
image and makes analysis of the image more accurate: thanks to padding, information
on the borders of an image is preserved just as well as information at the center.
Convolutional Neural Networks
• Stride controls how the filter convolves over the input, i.e., the number of pixels
it shifts over the input matrix. If the stride is set to 1, the filter moves across
1 pixel at a time; if the stride is 2, the filter moves 2 pixels at a time. The
larger the stride, the smaller the resulting output, and vice versa.
Convolutional Neural Networks
• ReLU (Rectified Linear Unit) is a commonly used activation function in
neural networks, particularly in deep learning models. It's defined as a
function that outputs the input directly if it's positive, and zero otherwise.
This simple function helps networks learn complex patterns by introducing
non-linearity while remaining computationally efficient.
• ReLU helps to mitigate the vanishing gradient problem, which can occur in
deeper networks when using activation functions like sigmoid or tanh. The
vanishing gradient problem slows down or prevents learning in deeper
layers of the network.
• ReLU is computationally inexpensive because it only involves a simple
thresholding operation. This simplicity contributes to faster training times
compared to other activation functions like sigmoid or tanh.

How it works:
• Mathematically, ReLU(x) = max(0, x): for any input value x, if x is positive
the ReLU function outputs x directly; if x is zero or negative, it outputs zero.
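A one-line sketch of this definition:

```python
import numpy as np

def relu(x):
    # ReLU(x) = max(0, x): pass positives through, clamp negatives to zero
    return np.maximum(0, x)

print(relu(np.array([-2.0, -0.5, 0.0, 1.5, 3.0])))  # zeros for non-positive inputs
```

Its gradient is 1 for positive inputs and 0 otherwise, which is why it avoids the shrinking gradients that sigmoid and tanh suffer from in deep stacks.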
Convolutional Neural Networks
• Pooling / Sub-sampling Layer: the pooling layer operates on each feature map
independently. It reduces the resolution of the feature map by reducing its
height and width, but retains the features of the map required for
classification. This is called down-sampling.
• Pooling can be done in the following ways:
o Max pooling: selects the maximum element from each patch of the feature map.
The resulting max-pooled layer holds the most important features of the feature
map. It is the most common approach, as it generally gives better results.
o Average pooling: computes the average of each patch of the feature map.
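Both pooling modes can be sketched over a small feature map (the values are invented for illustration; window size 2 with non-overlapping stride, a common default):

```python
import numpy as np

def pool2d(feature_map, size=2, mode="max"):
    """Non-overlapping pooling with a size×size window (stride = size)."""
    h, w = feature_map.shape
    out = np.zeros((h // size, w // size))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = feature_map[i*size:(i+1)*size, j*size:(j+1)*size]
            out[i, j] = patch.max() if mode == "max" else patch.mean()
    return out

fm = np.array([[1, 3, 2, 0],
               [4, 6, 1, 1],
               [0, 2, 5, 7],
               [1, 1, 8, 6]], dtype=float)
print(pool2d(fm, mode="max"))  # [[6. 2.] [2. 8.]]
print(pool2d(fm, mode="avg"))  # [[3.5 1. ] [1.  6.5]]
```

Each 2×2 patch collapses to one number, halving height and width while keeping the strongest (or average) response.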

Why pooling is important ?
• Pooling progressively reduces the spatial size of the representation, which reduces
the number of parameters and computation in the network and also helps control
overfitting. Without pooling, the output keeps the same resolution as the input.
• There can be any number of convolution, ReLU, and pooling layers. The initial
convolution layers learn generic information, while the last layers learn more
specific/complex features. After the final convolution, ReLU, and pooling layers,
the output feature map (matrix) is converted into a vector (one-dimensional
array). This is called the flatten layer; it serves as a crucial bridge between
the convolutional/pooling layers and the fully connected (dense) layers.
• Fully Connected Layer:-Fully connected layer looks like a regular neural
network connecting all neurons and forms the last few layers in the
network. The output from flatten layer is fed to this fully-connected layer.
• The feature vector from the fully connected layer is used, after training, to
classify images into different categories. All the inputs to this layer are
connected to every activation unit of the next layer. Because most of the
network's parameters are concentrated in the fully connected layers, these layers
are prone to overfitting. Dropout is one of the techniques that reduces overfitting.
Convolutional Neural Networks
• Dropout is an approach used for regularization in neural networks: at each
training stage, randomly chosen nodes are temporarily ignored (dropped out).
• The dropout rate is commonly 0.5 and can be tuned to produce the best results;
dropout also improves training speed. This method of regularization reduces
node-to-node co-adaptation in the network, which encourages learning of
important features and also helps the model generalize better to new data.
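This can be sketched as "inverted" dropout, a common variant that rescales surviving activations by 1/(1 − rate) so their expected magnitude is unchanged (the rescaling detail is an addition, not from the slides):

```python
import numpy as np

def dropout(activations, rate=0.5, rng=None):
    """Inverted dropout: zero each unit with probability `rate` during
    training and scale survivors by 1/(1 - rate) so the expected
    activation magnitude is unchanged."""
    if rng is None:
        rng = np.random.default_rng(0)
    mask = rng.random(activations.shape) >= rate  # True = node is kept
    return activations * mask / (1.0 - rate)

out = dropout(np.ones(10), rate=0.5)
print(out)  # a mix of 0.0 (dropped nodes) and 2.0 (kept, rescaled nodes)
```

At inference time dropout is simply turned off; thanks to the rescaling, no further adjustment is needed.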
• Softmax Layer: softmax is an activation normally applied to the last layer of
the network, which acts as the classifier: classification of the given input into
distinct classes takes place at this layer. The softmax function maps the
non-normalized outputs (logits) of the network to a probability distribution.
• The output from the last fully connected layer is directed to the softmax
layer, which converts it into probabilities.
• Softmax assigns a decimal probability to each class in a multi-class problem;
these probabilities sum to 1.0.
• This allows the output to be interpreted directly as a probability.
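A minimal sketch of the mapping from raw scores (logits) to probabilities; subtracting the maximum logit is a standard numerical-stability detail not mentioned in the slides:

```python
import numpy as np

def softmax(logits):
    # subtract the max for numerical stability, then normalize exponentials
    exps = np.exp(logits - np.max(logits))
    return exps / exps.sum()

probs = softmax(np.array([2.0, 1.0, 0.1]))
print(probs)        # ≈ [0.659 0.242 0.099]
print(probs.sum())  # 1.0
```

The largest logit gets the largest probability, and the outputs always sum to 1, so the predicted class is simply the argmax of `probs`.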
CNN: Motivation
• Convolution leverages three important ideas that can help
improve a machine learning system:
o Sparse Interactions (or Sparse Connectivity/Local Connectivity)
o parameter sharing or weight sharing
o equivariant representation

• Moreover, convolution provides a means for working with inputs of variable size.
• Traditional neural network layers use matrix multiplication by a matrix of
parameters: every output unit interacts with every input unit.
CNN: Motivation
• Sparse Interactions (or Sparse Connectivity/Local Connectivity)
• Motivation: In traditional fully connected neural networks, every input unit
is connected to every output unit in the next layer. For high-dimensional
inputs like images, this leads to an explosion of parameters, making training
computationally expensive and prone to overfitting.
• How CNNs address it: CNNs leverage the understanding that features in
images are often localized. A neuron detecting an edge in one corner of an
image doesn't need to "see" pixels from the opposite corner to do its job.
Instead, each neuron (in a feature map) is only connected to a small, local
region of the input from the previous layer, known as its receptive field. This
drastically reduces the number of connections and, consequently, the
number of parameters.
• Benefit: This sparsity makes the model more efficient (fewer computations)
and helps in learning local, relevant features without being overwhelmed by
irrelevant global information.

CNN: Motivation
• Parameter Sharing (or Weight Sharing):
• Motivation: If a feature (like a vertical edge) is important and appears in one
part of an image, it's likely that the same feature could be important if it
appears in another part of the image. In a traditional network, a separate
set of weights would be needed to detect that same feature at every
different location, again leading to an excessive number of parameters.
• How CNNs address it: In a convolutional layer, the same set of weights (the
"filter" or "kernel") is applied across the entire input feature map. This
means that a single learned filter can detect its specific pattern (e.g., a
diagonal line, a specific texture) regardless of where it appears in the input
image.
• Benefit: This drastically reduces the number of unique parameters the
network needs to learn. Instead of learning thousands of different edge
detectors for different locations, it learns one and applies it everywhere.
This leads to much more efficient learning, less memory usage, and better
generalization.

CNN: Motivation
• Equivariant Representation (specifically, Translation Equivariance):
• Motivation: a key challenge in image recognition is that the position of an
object or feature can vary within an image. We want our model to recognize
a cat whether it's in the top-left or bottom-right. Traditional networks
struggle with this without extensive data augmentation.
• How CNNs address it: Parameter sharing is the fundamental mechanism
that leads to translation equivariance. If a filter detects a feature at a certain
location, and that feature shifts in the input, the same filter will detect it at
the new, shifted location, resulting in a shifted activation in the output
feature map. The output changes in a predictable way as the input shifts.
• Pooling layers (especially max pooling) then build upon this to provide a
degree of translation invariance. While convolution provides equivariance
(output shifts with input), pooling makes the final representation somewhat
invariant to small shifts, meaning the presence of a feature is registered
regardless of its precise sub-region location within the pooling window.

CNN: Motivation
• Benefit: This property makes CNNs robust to the exact positioning of
features within an image, making them much more effective for object
recognition tasks where objects can appear anywhere in the frame.
• Inputs of Variable Size:
Motivation: While not a core motivation for the existence of CNNs themselves,
the ability to handle variable-sized inputs is a significant practical advantage of
convolutional and pooling operations.
• How CNNs handle it: Convolutional layers, by their nature of sliding filters,
can be applied to inputs of varying spatial dimensions. The output feature
map size will simply scale proportionally. Pooling layers also operate locally
and can adapt to different input sizes.
• Why it's a motivation/benefit: This is a huge practical advantage over fully connected layers
which require fixed-size inputs. It means you don't always have to resize all your images to a
uniform dimension before feeding them into the network's convolutional base, offering
more flexibility in dataset preparation and network design. However, it's important to note
that if you append fully connected layers after the convolutional base, you'll still need a
fixed-size input to those layers (often achieved via global pooling or by resizing before the
final flatten/dense layers).
A CNN basically has two main parts: a feature-extraction (learning) stage and a
classification stage.
Max pooling is the most common and widely used type of pooling layer.
CNN: Motivation
• Densely or Fully connected neural network

The whole CNN
• A typical CNN pipeline: input image → Convolution → Max Pooling → Convolution →
Max Pooling (these stages can repeat many times) → Flatten → Fully Connected
Feedforward Network → output class scores (e.g., cat, dog, …).
Parameters that influence model training
• Several distinct parameters influence the training and performance of a
convolutional neural network (CNN).
Number of Epoch for CNN Training:-An epoch represents one complete pass
through the entire training dataset during the training process. In each epoch,
the model processes all training examples, performs forward and backward
passes, and updates its weights. The number of epochs determines how many
times the model sees and learns from the entire dataset.
Number of Hidden Layers: The number of hidden layers, and the number of
neurons or filters within them, determines the model's capacity to learn
complex features and representations from the data.
Number of Layers and Neurons: The architecture of the CNN (number of layers
and neurons per layer) significantly impacts performance. Deeper networks
can capture more complex patterns, but they are also more prone to
overfitting.
Number of Filters:- The number of filters affects the depth of the output. For
example, three distinct filters would yield three different feature maps,
creating a depth of three.
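The relationship between filters and learnable parameters can be made concrete: each filter spans the full input depth and carries one bias term, so the count per layer is a simple product (a sketch, with illustrative sizes):

```python
def conv_layer_params(kernel_h, kernel_w, in_channels, num_filters):
    """Learnable parameters in one conv layer: each filter has
    kernel_h * kernel_w * in_channels weights plus one bias."""
    return (kernel_h * kernel_w * in_channels + 1) * num_filters

# e.g. 3x3 filters over an RGB image (3 channels), 32 filters
print(conv_layer_params(3, 3, 3, 32))  # (3*3*3 + 1) * 32 = 896
```

Note that the count does not depend on the image size at all, thanks to parameter sharing; a fully connected layer over the same image would need orders of magnitude more weights.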
Parameters that influence model training
• Stride is the distance, or number of pixels, that the kernel moves over the
input matrix. While stride values of two or greater are rare, a larger stride
yields a smaller output.
• Zero-padding is usually used when the filters do not fit the input image. This
sets all elements that fall outside of the input matrix to zero, producing a
larger or equally sized output. There are three types of padding:
o Valid padding: This is also known as no padding. In this case, the last convolution is
dropped if dimensions do not align.
o Same padding: This padding ensures that the output layer has the same size as the
input layer.
o Full padding: This type of padding increases the size of the output by adding zeros to
the border of the input.

Parameters that influence model training
• The learning rate in CNN training determines the step size during weight
updates. Finding the right learning rate is crucial for efficient training. A
learning rate that's too high can lead to instability and prevent convergence,
while a learning rate that's too low can result in slow training and
potentially get stuck in local minima.
• Batch Size: Determines the number of training samples used in each
iteration. Larger batch sizes can lead to faster training and more stable
updates, but may require more memory. Smaller batch sizes can capture
finer details but may result in more noisy updates.
• Optimizer: The optimization algorithm used to update the model's weights.
Stochastic Gradient Descent (SGD), Adam, RMSprop are common choices,
each with its own characteristics and performance.
• Regularization (e.g., Dropout): Techniques like dropout help prevent
overfitting by randomly dropping out neurons during training. The optimal
dropout rate is another hyperparameter that needs to be tuned.
• Data Augmentation: Techniques like rotations, flips, and crops can artificially
increase the size and diversity of the training data, improving generalization.
• Summary
• Image data is input to the convolutional neural network and processed via the
pixel values of the image in the convolution layer.
• Filters perform convolutions over the entire image and train the network to
identify and learn features from the image, which are represented as matrices.
• Batch normalization of input vectors is performed at each layer, so as to ensure all
input vectors are normalized and hence regularization in network is attained.
• Convolutions are performed until good accuracy is attained and maximum
feature extraction is done.
• Convolution results in sub-sampling of the image, and the dimensions of the input
change according to the padding and stride chosen.
• Each convolution is followed by an activation layer (ReLU) and a pooling layer,
which bring in non-linearity and help in sub-sampling, respectively.
• After the final convolution, the input matrix is converted to feature vector. This feature
vector is the flattened layer.
• Feature vector serves as input to next layer(fully connected layer), where all features are
collectively transferred into this network. Dropout of random nodes occurs during training to
reduce overfitting in this layer.
• Finally, the raw values predicted by the network are converted to probabilistic
scores by the softmax layer.
Performance evaluation metrics
• Performance evaluation metrics are crucial for assessing the
effectiveness of classification models.
• Accuracy, precision, recall, and F1-score are common metrics, while
the confusion matrix, ROC curve, and AUC provide more detailed
insights into model performance, especially in imbalanced datasets.
• Accuracy is a fundamental metric for evaluating the performance of a
classification model. It tells us the proportion of correct predictions made by
the model out of all predictions:

Accuracy = (TP + TN) / (TP + TN + FP + FN)
• Precision
• It measures how many of the positive predictions made by the model are actually
correct. It is useful when the cost of false positives is high, such as in medical
diagnosis, where predicting a disease that is not present can have serious
consequences:

Precision = TP / (TP + FP)
Performance evaluation metrics
• Recall (or Sensitivity) measures how many of the actual positive cases were
correctly identified by the model. It is important when missing a positive case
(a false negative) is more costly than a false positive:

Recall = TP / (TP + FN)
• F1 Score
• The F1 score is the harmonic mean of precision and recall. It is useful when we
need a balance between precision and recall, as it combines both into a single
number. A high F1 score means the model performs well on both metrics. Its range
is [0, 1].
• A model with high precision but low recall may appear accurate, yet it misses a
large number of positive instances. The higher the F1 score, the better the
performance:

F1 = 2 × (Precision × Recall) / (Precision + Recall)
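All four metrics can be computed together from the four confusion-matrix counts; the counts below are hypothetical, chosen only for illustration:

```python
# Hypothetical counts from a binary classifier: true positives, false
# positives, false negatives, true negatives.
tp, fp, fn, tn = 90, 10, 5, 60

accuracy  = (tp + tn) / (tp + fp + fn + tn)
precision = tp / (tp + fp)
recall    = tp / (tp + fn)           # a.k.a. sensitivity / TPR
f1        = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")
```

Note how accuracy alone can mislead on imbalanced data: a classifier that always predicts the majority class scores high accuracy but zero recall on the minority class, which is exactly what precision, recall, and F1 expose.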
Performance evaluation metrics
Area Under Curve (AUC) and ROC Curve
• It is useful for binary classification tasks. The AUC value represents
the probability that the model will rank a randomly chosen positive
example higher than a randomly chosen negative example. AUC
ranges from 0 to 1 with higher values showing better model
performance.
• True Positive Rate(TPR)
• Also known as sensitivity or recall, the True Positive Rate measures
how many actual positive instances were correctly identified by the
model. It answers the question: "Out of all the actual positive cases,
how many did the model correctly identify?"
• Formula: TPR = TP / (TP + FN)
Performance evaluation metrics
• ROC Curve
• It is a graphical representation of the True Positive Rate (TPR) vs the
False Positive Rate (FPR) at different classification thresholds. The
curve helps us visualize the trade-offs between sensitivity (TPR) and
specificity (1 - FPR) across various thresholds. Area Under Curve
(AUC) quantifies the overall ability of the model to distinguish
between positive and negative classes.
• AUC = 1: Perfect model (always correctly classifies positives and
negatives).
• AUC = 0.5: Model performs no better than random guessing.
• AUC < 0.5: Model performs worse than random guessing (showing
that the model is inverted).
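The probabilistic interpretation of AUC above can be computed directly by comparing every positive/negative score pair (the Mann-Whitney rank formulation); the labels and scores below are invented for illustration:

```python
def auc_rank(y_true, scores):
    """AUC as the probability that a randomly chosen positive example is
    scored above a randomly chosen negative one (ties count as 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y = [1, 1, 0, 1, 0, 0]
s = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(auc_rank(y, s))  # 8 of 9 positive/negative pairs ranked correctly
```

Here one negative (score 0.7) outranks one positive (score 0.6), so the AUC is 8/9 ≈ 0.89 rather than a perfect 1.0.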

Performance evaluation metrics
• ROC Curve (figure: ROC curve for evaluation of classification models)
Performance evaluation metrics
• A confusion matrix is an N × N matrix, where N is the number of classes or
categories to be predicted. Here we have N = 2, so we get a 2 × 2 matrix.
Suppose we have a binary classification problem: each sample belongs to either
Yes or No. We build a classifier that predicts the class for each new input
sample, then test the model on 165 samples and summarize the results in a
confusion matrix.
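Building the 2 × 2 matrix itself is a simple tally of (actual, predicted) pairs; the labels and predictions below are invented for illustration:

```python
import numpy as np

# Hypothetical labels/predictions for a binary (Yes=1 / No=0) classifier
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

cm = np.zeros((2, 2), dtype=int)  # rows: actual class, columns: predicted class
for t, p in zip(y_true, y_pred):
    cm[t, p] += 1

print(cm)
# [[TN FP]
#  [FN TP]]
```

Every metric in this section (accuracy, precision, recall, F1, TPR) can be read off the four cells of this matrix.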
End of Chapters 4 & 5
