
Module 02: Introduction to Deep Learning

Contents:
Feedforward Neural Networks, Backpropagation, EBPTA, Convergence and local minima,
Gradient Descent (GD), Momentum Based GD, Nesterov Accelerated GD, Stochastic GD,
AdaGrad, RMSProp

Feedforward Neural Networks (FNNs)

Definition:

A Feedforward Neural Network (FNN) is a type of artificial neural network where the flow
of information is strictly unidirectional—from input to output. It is one of the simplest and
most widely used architectures in deep learning. There are no loops or cycles in the network,
and each layer processes and passes information to the next.

Architectures of Feedforward Neural Networks

1. Single-Layer Perceptron (SLP):

o Consists of only one input layer and one output layer.

o Can solve linearly separable problems using an activation function (e.g., step
or sigmoid).

o Limitation: Cannot solve non-linear problems (e.g., XOR problem).


2. Multi-Layer Perceptron (MLP):

o Has an input layer, one or more hidden layers, and an output layer.

o Hidden layers use activation functions like ReLU, Sigmoid, Tanh to model
non-linearity.

o MLPs can solve both linear and non-linear problems.

3. Deep Neural Network (DNN):

o An extension of MLP with a large number of hidden layers.

o Typically used for complex tasks like image recognition, speech processing,
and text analysis.

Key Components of FNNs

1. Input Layer:

o Takes features (data points) as input.

o Example: For an image, the pixel values serve as input features.

2. Hidden Layers:
o Perform computations using weights, biases, and activation functions.

o Example: Transform input features into higher-dimensional spaces.

3. Output Layer:

o Produces the final predictions or decisions (e.g., classification or regression).

4. Weights and Biases:

o Weights determine the strength of the connection between neurons.

o Biases help shift the activation function, enabling the model to learn better.

5. Activation Functions:

o Introduce non-linearity to the network.

o Examples: Sigmoid, Tanh, ReLU, Softmax.
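
For illustration (this snippet is not part of the original notes), the activation functions named above can be written in a few lines of NumPy:

    import numpy as np

    def sigmoid(x):
        # Squashes inputs to the range (0, 1)
        return 1.0 / (1.0 + np.exp(-x))

    def tanh(x):
        # Squashes inputs to the range (-1, 1)
        return np.tanh(x)

    def relu(x):
        # Returns the input if it is positive, zero otherwise
        return np.maximum(0.0, x)

    def softmax(x):
        # Converts a vector of scores into probabilities that sum to 1
        e = np.exp(x - np.max(x))   # subtract the max for numerical stability
        return e / e.sum()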

Working of Feedforward Neural Networks

1. Forward Propagation:

o The input passes through each layer.

o Each layer applies weights, biases, and activation functions to transform the
data.

2. Output Calculation:

o Final predictions are generated based on the output layer's computations.


3. Error Calculation:

o Compares the predicted output to the actual output using a loss function (e.g.,
Mean Squared Error or Cross-Entropy Loss).

4. Training (using Backpropagation):

o Weights and biases are updated iteratively to minimize the error using
optimization algorithms (e.g., Gradient Descent).
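
A minimal sketch of steps 1-3 in NumPy follows. The network size, random weights, and data are assumptions made only for illustration; this is not code from the notes:

    import numpy as np

    def relu(x):
        return np.maximum(0.0, x)

    # Assumed toy network: 3 inputs -> 4 hidden units -> 1 output
    rng = np.random.default_rng(0)
    W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)   # hidden-layer weights and biases
    W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)   # output-layer weights and biases

    x = np.array([0.5, -1.2, 3.0])     # input features
    y_true = np.array([1.0])           # actual (target) output

    # 1-2. Forward propagation and output calculation
    h = relu(W1 @ x + b1)              # hidden layer: weights, bias, activation
    y_pred = W2 @ h + b2               # output layer (linear output for regression)

    # 3. Error calculation with Mean Squared Error
    loss = np.mean((y_pred - y_true) ** 2)
    print(loss)

    # 4. Training would now use backpropagation to update W1, b1, W2, b2.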

Diagram of Feedforward Neural Networks

Structure:

• Input Layer → Hidden Layers → Output Layer.

• Arrows show data flow from one layer to another.

You can visualize:

• Circles for neurons (nodes).

• Arrows for connections between neurons with weights.


• Different layers separated vertically.

Applications of Feedforward Neural Networks

1. Image Recognition:

o Used in computer vision tasks like facial recognition and object detection.
o Example: Classifying handwritten digits in the MNIST dataset.

2. Speech Recognition:

o Convert audio signals into text.

o Example: Virtual assistants like Siri or Alexa.

3. Natural Language Processing (NLP):

o Sentiment analysis, machine translation, and text classification.

o Example: Spam detection in emails.

4. Regression Tasks:

o Predict continuous values like stock prices or weather conditions.

o Example: House price prediction based on features like area and location.
5. Pattern Recognition:

o Identify patterns in medical images (e.g., detecting tumors).

o Example: MRI and X-ray analysis.

6. Recommendation Systems:

o Suggest products or content based on user preferences.


o Example: Netflix recommending shows.

Examples of Feedforward Neural Networks

1. Single-Layer Perceptron:

o Task: Predict whether an email is spam or not.


o Input: Email features like word count, sender domain, etc.

o Output: Spam (1) or Not Spam (0).

2. Multi-Layer Perceptron:

o Task: Classify handwritten digits (0-9).

o Input: Pixel values of images.

o Hidden Layers: Extract features like edges, shapes, etc.

o Output: Predicted digit.

3. Deep Neural Network:

o Task: Detect faces in an image.

o Input: Pixel matrix of the image.

o Hidden Layers: Learn features like edges, facial structures, and patterns.
o Output: Location and identity of faces in the image.

Backpropagation

Backpropagation is a powerful algorithm in deep learning, primarily used to train artificial
neural networks, particularly feed-forward networks. It works iteratively, minimizing the
cost function by adjusting weights and biases.

In each epoch, the model adapts these parameters, reducing loss by following the error
gradient. Backpropagation often utilizes optimization algorithms like gradient descent or
stochastic gradient descent. The algorithm computes the gradient using the chain rule from
calculus, allowing it to effectively navigate complex layers in the neural network to minimize
the cost function.
Why is Backpropagation Important?
Backpropagation plays a critical role in how neural networks improve over time. Here's why:

1. Efficient Weight Update: It computes the gradient of the loss function with respect
to each weight using the chain rule, making it possible to update weights efficiently.

2. Scalability: The backpropagation algorithm scales well to networks with multiple
layers and complex architectures, making deep learning feasible.

3. Automated Learning: With backpropagation, the learning process becomes automated,
and the model can adjust itself to optimize its performance.

Working of Backpropagation Algorithm


The Backpropagation algorithm involves two main steps: the Forward Pass and
the Backward Pass.
How Does the Forward Pass Work?

In the forward pass, the input data is fed into the input layer. These inputs, combined with
their respective weights, are passed to hidden layers.

For example, in a network with two hidden layers (h1 and h2 as shown in Fig. (a)), the output
from h1 serves as the input to h2. Before applying an activation function, a bias is added to
the weighted inputs.

Each hidden layer applies an activation function like ReLU (Rectified Linear Unit), which
returns the input if it’s positive and zero otherwise. This adds non-linearity, allowing the
model to learn complex relationships in the data. Finally, the outputs from the last hidden
layer are passed to the output layer, where an activation function, such as softmax, converts
the weighted outputs into probabilities for classification.
(Fig. (a): The forward pass using weights and biases.)

How Does the Backward Pass Work?

In the backward pass, the error (the difference between the predicted and actual output) is
propagated back through the network to adjust the weights and biases. One common method
for error calculation is the Mean Squared Error (MSE), given by:

MSE = (Predicted Output − Actual Output)²

Once the error is calculated, the network adjusts weights using gradients, which are
computed with the chain rule. These gradients indicate how much each weight and bias
should be adjusted to minimize the error in the next iteration. The backward pass continues
layer by layer, ensuring that the network learns and improves its performance. The activation
function, through its derivative, plays a crucial role in computing these gradients during
backpropagation.
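
A rough sketch of this backward pass for a single training example is shown below, assuming a one-hidden-layer network with ReLU and a squared-error loss; the shapes, variable names, and learning rate are illustrative assumptions, not material from the notes:

    import numpy as np

    def relu(x):
        return np.maximum(0.0, x)

    # Forward pass on an assumed toy network (3 inputs -> 4 hidden -> 1 output)
    rng = np.random.default_rng(1)
    W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
    W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)
    x, y_true = np.array([0.5, -1.2, 3.0]), np.array([1.0])

    z1 = W1 @ x + b1          # pre-activation of the hidden layer
    h = relu(z1)              # hidden activations
    y_pred = W2 @ h + b2      # network output
    error = y_pred - y_true   # predicted minus actual

    # Backward pass: apply the chain rule layer by layer
    dL_dy = 2 * error                         # derivative of (y_pred - y_true)^2
    dL_dW2 = np.outer(dL_dy, h)               # gradient for output weights
    dL_db2 = dL_dy
    dL_dh = W2.T @ dL_dy                      # propagate the error to the hidden layer
    dL_dz1 = dL_dh * (z1 > 0)                 # ReLU derivative: 1 if z1 > 0, else 0
    dL_dW1 = np.outer(dL_dz1, x)              # gradient for hidden weights
    dL_db1 = dL_dz1

    # Gradient-descent update with an assumed learning rate
    lr = 0.01
    W2 -= lr * dL_dW2; b2 -= lr * dL_db2
    W1 -= lr * dL_dW1; b1 -= lr * dL_db1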
Error Backpropagation Through Activation (EBPTA)

Introduction

Error Backpropagation Through Activation (EBPTA) is an essential extension of the standard
Backpropagation Algorithm used to train artificial neural networks. It is specifically
designed to enhance how errors are propagated through different activation functions in
multi-layer perceptrons (MLPs).

In traditional Backpropagation, the error is propagated backward through the network using
partial derivatives of the loss function with respect to weights and biases. However, in
EBPTA, additional focus is given to how activation functions influence the error
propagation, ensuring an optimized weight update process.

Steps of EBPTA
Differences Between Standard Backpropagation and EBPTA
Feature               | Standard Backpropagation       | EBPTA
Error Propagation     | Uses simple derivatives        | Adjusts based on activation functions
Activation Functions  | May cause vanishing gradients  | Handles activation functions more effectively
Weight Updates        | Direct gradient-based          | Activation-aware
Speed                 | Slower for deep networks       | Faster due to better error flow


Applications of EBPTA

• Speech Recognition (e.g., Siri, Google Assistant)

• Image Classification (e.g., Face Recognition)

• Autonomous Vehicles (e.g., Tesla’s AI models)

• Medical Diagnosis (e.g., Disease Prediction)


Convergence and Local Minima in Deep Learning

In deep learning, convergence refers to the process where the optimization algorithm (e.g.,
Gradient Descent) minimizes the loss function, improving the model’s ability to make
accurate predictions. Local minima are points where the loss function has a lower value than
its immediate surroundings but is not necessarily the absolute lowest (global minimum).

Understanding the Loss Function and Optimization

• A loss function quantifies the error between predicted and actual values.

• Optimization algorithms aim to minimize this loss function by adjusting the model’s
parameters (weights and biases).

• The optimization process involves moving in the direction where the loss decreases,
ideally reaching the global minimum (the lowest possible loss).

Local Minima vs. Global Minima

• Local Minimum: A point where the loss function is lower than nearby points but not
necessarily the lowest across the entire function.
• Global Minimum: The absolute lowest value of the loss function across the entire
search space.

• Saddle Points: Points where the gradient is zero, but they are neither local minima
nor maxima.

Challenges in Convergence Due to Local Minima

• Deep learning models often have highly complex loss landscapes with multiple local
minima and saddle points.

• If an optimization algorithm gets stuck in a local minimum, it might not find the best
possible model parameters.

• Some models might suffer from slow convergence, requiring advanced optimization
techniques.

Factors Affecting Convergence

• Learning Rate: If too high, the model may oscillate and never converge; if too low, it
may take too long to reach the minimum.

• Initialization of Weights: Poor initialization may cause slow convergence or getting
stuck in local minima.

• Choice of Activation Function: Some activation functions (e.g., Sigmoid) can cause
vanishing gradients, leading to slow learning.

• Optimization Algorithm Used: Different optimizers (SGD, Adam, RMSProp, etc.)
influence the rate and stability of convergence.
Techniques to Overcome Local Minima

1. Momentum-Based Gradient Descent: Uses past gradients to accelerate learning.

2. Nesterov Accelerated Gradient Descent (NAG): A variant of momentum that better
anticipates updates.

3. Adaptive Learning Rate Methods:

o AdaGrad: Adjusts learning rate based on past gradients.

o RMSProp: Divides learning rate by a moving average of past gradients.

o Adam (Adaptive Moment Estimation): Combines momentum and RMSProp
for faster convergence.

4. Batch Normalization: Normalizes input layers to speed up training.

5. Dropout Regularization: Helps escape local minima by randomly dropping neurons
during training.

6. Early Stopping: Prevents overfitting and stops training when loss no longer
decreases significantly.

(A graph of a loss function with multiple dips, showing global and local minima, and
saddle points.)
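
The adaptive learning-rate methods listed above (AdaGrad, RMSProp, Adam) differ only in how they scale each parameter's step. The sketch below shows the standard update rules; the hyperparameter defaults and the toy usage are assumptions for illustration, not part of the original notes:

    import numpy as np

    def adagrad_step(w, grad, state, lr=0.01, eps=1e-8):
        # Accumulate squared gradients; the effective learning rate shrinks over time
        state["G"] = state.get("G", 0.0) + grad ** 2
        return w - lr * grad / (np.sqrt(state["G"]) + eps)

    def rmsprop_step(w, grad, state, lr=0.001, beta=0.9, eps=1e-8):
        # Divide the learning rate by a moving average of squared gradients
        state["E"] = beta * state.get("E", 0.0) + (1 - beta) * grad ** 2
        return w - lr * grad / (np.sqrt(state["E"]) + eps)

    def adam_step(w, grad, state, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
        # Combine momentum (first moment) with an RMSProp-style second moment
        state["t"] = state.get("t", 0) + 1
        state["m"] = b1 * state.get("m", 0.0) + (1 - b1) * grad
        state["v"] = b2 * state.get("v", 0.0) + (1 - b2) * grad ** 2
        m_hat = state["m"] / (1 - b1 ** state["t"])   # bias correction
        v_hat = state["v"] / (1 - b2 ** state["t"])
        return w - lr * m_hat / (np.sqrt(v_hat) + eps)

    # Toy usage on the quadratic loss L(w) = (w - 3)^2, whose gradient is 2(w - 3)
    w, state = 0.0, {}
    for _ in range(500):
        w = adam_step(w, 2 * (w - 3), state, lr=0.1)
    print(w)   # w moves toward the minimum at w = 3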
Gradient Descent (GD) and Its Types

Gradient Descent (GD) is an optimization algorithm used to minimize the loss function in
machine learning and deep learning models by adjusting the model’s parameters (weights and
biases). It works by computing the gradient (derivative) of the loss function and updating the
parameters in the direction that reduces the error.
Using the gradient update rules, we compute new values for the parameters iteratively, for
example the slope m and intercept c of a simple linear model, until the loss stops decreasing.
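
As a hedged example (the data, learning rate, and iteration count below are made up for illustration and are not from these notes), gradient descent for fitting a simple line y = m·x + c with an MSE loss looks like this:

    import numpy as np

    # Made-up data roughly following y = 2x + 1
    x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
    y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

    m, c = 0.0, 0.0          # initial parameters
    lr = 0.01                # assumed learning rate

    for epoch in range(1000):
        y_pred = m * x + c
        error = y_pred - y
        # Gradients of the MSE loss with respect to m and c
        dm = 2 * np.mean(error * x)
        dc = 2 * np.mean(error)
        # Move the parameters in the direction that reduces the error
        m -= lr * dm
        c -= lr * dc

    print(m, c)   # should approach roughly 2 and 1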
Real-World Applications of Gradient Descent

1. Stock Price Prediction

o Companies like Bloomberg and Goldman Sachs use GD to predict stock
market trends.

2. Self-Driving Cars

o Tesla and Waymo use GD in Neural Networks for lane detection and obstacle
recognition.

3. Medical Diagnosis
o AI models in hospitals use GD to detect cancer and diseases from MRI and CT
scans.

4. Chatbots & NLP

o Siri and Google Assistant use GD in training NLP models for voice
recognition.
Types of Gradient Descent

The main types are Batch Gradient Descent (computes the gradient over the entire training
set for each update), Stochastic Gradient Descent (SGD) (updates the parameters after each
individual training example), and Mini-Batch Gradient Descent (updates on small batches,
trading the stability of batch GD against the speed of SGD).
Advanced Variants of Gradient Descent / Accelerated GD

The two most widely used accelerated gradient descent methods are:

1. Momentum-Based GD

2. Nesterov Accelerated Gradient (NAG)
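
A minimal sketch of these two update rules follows; the momentum coefficient, learning rate, and toy loss are assumptions for illustration, not part of the original notes:

    import numpy as np

    def momentum_step(w, grad_fn, state, lr=0.01, gamma=0.9):
        # Momentum-Based GD: accumulate past gradients to accelerate learning
        v = gamma * state.get("v", 0.0) + lr * grad_fn(w)
        state["v"] = v
        return w - v

    def nesterov_step(w, grad_fn, state, lr=0.01, gamma=0.9):
        # NAG: evaluate the gradient at the "look-ahead" point w - gamma * v
        v_prev = state.get("v", 0.0)
        v = gamma * v_prev + lr * grad_fn(w - gamma * v_prev)
        state["v"] = v
        return w - v

    # Toy usage on the quadratic loss L(w) = w^2, whose gradient is 2w
    grad_fn = lambda w: 2 * w
    w, state = 5.0, {}
    for _ in range(100):
        w = nesterov_step(w, grad_fn, state)
    print(w)   # approaches the minimum at 0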
