Deep Learning Notes
Machine Learning
[Figure: the classical machine learning pipeline: Input → Feature extraction → Classification → Output]
What is Deep Learning?
Deep learning is a machine learning technique that learns features and tasks directly from data, where the data may be images, text, or sound.
Deep learning is a method in artificial intelligence (AI) that teaches computers to process data in a way that is inspired by the human brain.
What are the uses of deep learning?
Deep learning is used in automotive, aerospace, manufacturing, electronics, medical research, and other fields.
• Self-driving cars use deep learning models to automatically detect road signs and pedestrians.
• Defense systems use deep learning.
• Medical image analysis uses deep learning to automatically detect cancer cells for medical diagnosis.
• Factories use deep learning applications to automatically detect when people or objects are within an unsafe distance of machines.
How does deep learning work?
Deep learning algorithms are neural networks that are modeled after the human brain. For
example, a human brain contains billions of interconnected neurons that work together to
learn and process information. Similarly, deep learning neural networks, or artificial neural
networks, are made of many layers of artificial neurons that work together inside the computer.
Artificial neurons are software modules called nodes, which use mathematical calculations to
process data. Artificial neural networks are deep learning algorithms that use these nodes to
solve complex problems.
What are the components of a deep learning network :-
• Input layer :- An artificial neural network has several nodes that input data into it.
• Hidden layers :- The input layer processes and passes the data to layers further in the neural network. These hidden layers process information at different levels.
• Output layer :- The output layer consists of the nodes that output the data.
[Figure: a network with an input layer, Hidden Layer #1, Hidden Layer #2, and an output layer]
What is deep learning in the context of machine learning?
Traditional machine learning methods require significant human effort to train the software.
For example, in animal image recognition you would:
1. Manually label hundreds of thousands of animal images.
2. Make the machine learning algorithms process those images.
3. Test those algorithms on a set of unknown images.
4. Identify why some results are inaccurate.
5. Improve the dataset by labeling new images to improve result accuracy.
This process is called supervised learning.
What are the benefits of deep learning over machine learning?
Machine learning methods find unstructured data, such as text documents, challenging to
process because the training dataset can have infinite variations. On the other hand, deep
learning models can comprehend unstructured data and make general observations without
manual feature extraction.
For example, a neural network can recognize that these two different input sentences have the same meaning:
• Can you tell me how to make the payment?
• How do I transfer money?
A Brief History of Deep Learning :-
Deep Learning is a more evolved branch of machine learning that uses layers of algorithms to process data, imitate the thinking process, and develop abstractions.
It is often used to visually recognize objects and understand human speech. Information is
passed through each layer, with the output of the previous layer providing input for the next
layer. The first layer in a network is called the input layer, while the last is called an output layer.
All the layers between input and output are referred to as hidden layers. Each layer is typically a
simple, uniform algorithm containing one kind of activation function.
Feature extraction is another aspect of deep learning. It is used for pattern recognition and
image processing. Feature extraction uses an algorithm to automatically construct meaningful
“features” of the data for purposes of training, learning, and understanding. In traditional machine learning, a data scientist or a programmer is responsible for feature extraction; deep learning automates this step.
The history of deep learning can be traced back to 1943, when Walter Pitts and Warren
McCulloch created a computer model based on the neural networks of the human brain.
They used a combination of algorithms and mathematics they called “threshold logic” to mimic
the thought process. Since that time, Deep Learning has evolved steadily, with only two
significant breaks in its development.
1. 1943: Warren McCulloch and Walter Pitts created a computer model based on the neural networks of the human brain, laying the groundwork for neural network theory.
2. 1958: Frank Rosenblatt developed the Perceptron, an early type of artificial neural
network that could learn to recognize patterns.
3. 1960s: The basics of backpropagation, a method for training neural networks, were
developed by Henry J. Kelley and later refined by others.
4. 1970s: Kunihiko Fukushima introduced the first convolutional neural network
(CNN), which is now widely used in image recognition.
5. 1980s: The term “deep learning” was introduced by Rina Dechter in 1986, and
significant advancements were made in neural network training techniques .
6. 2006: Geoffrey Hinton and his team made breakthroughs in training deep neural networks, leading to a resurgence in interest and research in deep learning.
7. 2012: The success of deep learning was highlighted when a deep neural network won the ImageNet competition, significantly outperforming other methods in image recognition.
8. 2014-Present: Deep learning has continued to evolve, with advancements in
architectures like Generative Adversarial Networks (GANs), transformers, and
applications across various fields such as natural language processing, medical
imaging, and autonomous driving .
Application of Deep Learning
1. Healthcare: Deep learning is used for disease diagnosis, medical imaging, and
personalized treatment plans. For example, it helps in detecting anomalies in X-rays
and MRIs.
2. Automotive: Self-driving cars rely heavily on deep learning to process data from
sensors and cameras to navigate and make decisions .
3. Finance: It is used for fraud detection, risk management, and algorithmic trading by
analyzing large datasets to identify patterns and anomalies.
4. Retail: Deep learning enhances customer experience through personalized
recommendations, inventory management, and demand forecasting.
5. Natural Language Processing (NLP): Applications include virtual assistants like Siri and Alexa, language translation, and sentiment analysis.
6. Entertainment: It powers recommendation systems for streaming services like Netflix and Spotify, and is used in creating realistic animations and special effects.
7. Cybersecurity: Deep learning helps in detecting malware, phishing attacks, and other cybersecurity threats by analyzing network traffic and user behavior.
8. Manufacturing: It is used for predictive maintenance, quality control, and optimizing
supply chains.
9. Robotics: Deep learning enables robots to perform complex tasks such as object
recognition, path planning, and autonomous navigation.
10. Agriculture: Applications include crop monitoring, soil analysis, and predicting yields
to improve farming practices.
11. Weather Forecasting :- Colorful Clouds uses GPU computing and AI to process, predict, and communicate weather and air-quality conditions quickly, enabling a new age of forecast generation and reporting.
McCulloch Pitts Neuron :-
It is very well known that the most fundamental unit of deep neural networks is called an
artificial neuron/perceptron. But the very first step towards the perceptron we use today was
taken in 1943 by Warren McCulloch and Walter Pitts , by mimicking the functionality of a
biological neuron.
The McCulloch-Pitts neuron is one of the earliest mathematical models of a biological neuron, proposed by Warren McCulloch and Walter Pitts in 1943.
Biological Neurons: An Overly Simplified Illustration
[Figure: a biological neuron with its dendrites, soma, axon, and synapse]
• Dendrite: Receives signals from other neurons
• Soma: Processes the information
• Axon: Transmits the output of this neuron
• Synapse: Point of connection to other neurons
Basically, a neuron takes an input signal (dendrite), processes it like the CPU (soma), and passes the output through a cable-like structure to other connected neurons (axon to synapse to other neuron's dendrite). Now, this might be biologically inaccurate, as there is a lot more going on out there, but on a higher level this is what a neuron in our brain does: it takes an input, processes it, and throws out an output.
Our sense organs interact with the outer world and send the visual and sound
information to the neurons. Let's say you are watching Friends. Now the
information your brain receives is taken in by the “laugh or not” set of neurons
that will help you make a decision on whether to laugh or not. Each neuron gets
fired/activated only when its respective criteria (more on this later) are met, as shown below.
[Figure: the "laugh or not" neurons fire only when their criteria are met]
1. Basic Structure:
The McCulloch-Pitts neuron model is a simplified version of a biological neuron and consists of:
• Inputs (dendrites): The model receives multiple binary inputs, each representing a signal.
• Weights: Each input has a corresponding weight, which determines the strength or importance of that input.
• Summation Function: The inputs are summed together, and the result is compared to a threshold.
• Activation (or Output): If the sum exceeds a certain threshold, the neuron "fires" and produces an output of 1 (active); otherwise, it produces an output of 0 (inactive).
2. Mathematical Model:
The behavior of the McCulloch-Pitts neuron can be described by the following equation:

y = f(Σᵢ wᵢxᵢ)

Where:
• y is the output (1 or 0),
• wᵢ are the weights associated with each input xᵢ,
• θ is the threshold value,
• f(z) is the activation function, which is typically a step function:
  • If the weighted sum Σᵢ wᵢxᵢ exceeds the threshold θ, the output y = 1,
  • If the weighted sum is below the threshold, the output y = 0.
3. Threshold and Activation:
• Threshold: A predefined value that the sum of the weighted inputs is compared against.
• Step Function: The activation function is usually a simple step function that outputs either 0 or 1, depending on whether the summed input exceeds the threshold.
4. Properties:
• Binary Inputs and Outputs: The McCulloch-Pitts neuron works with binary values (either 0 or 1).
• Linear Decision Boundary: The neuron makes decisions based on a linear combination of inputs and outputs a binary value. This means it can only solve problems that are linearly separable, like the logical operations AND, OR, and NOT.
5. Logical Operations:
McCulloch and Pitts demonstrated that this simple model could perform logical operations. For example:
• AND Operation: The neuron will only fire (output 1) if all inputs are 1.
• OR Operation: The neuron will fire if at least one input is 1.
• NOT Operation: The output is the inverse of the input.
The McCulloch-Pitts neural model, which was the earliest ANN model, has only two types of inputs
— Excitatory and Inhibitory. The excitatory inputs have weights of positive magnitude and the inhibitory inputs have weights of negative magnitude. The inputs of the McCulloch-Pitts neuron can be either 0 or 1. It has a threshold function as an activation function: the output signal y_out is 1 if the input y_sum is greater than or equal to a given threshold value, else 0. The
diagrammatic representation of the model is as follows:
[Figure: the McCulloch-Pitts model: input signals x₁ … xₙ feed a summation junction, followed by a threshold activation that produces y_out]
For a better understanding, let me consider an example:
John carries an umbrella if it is sunny or raining. There are four given situations, and I need to decide when John will carry the umbrella. The situations are as follows:
• First scenario: It is not raining, nor is it sunny
• Second scenario: It is not raining, but it is sunny
• Third scenario: It is raining, and it is not sunny
• Fourth scenario: It is raining as well as it is sunny
To analyse the situations using the McCulloch-Pitts neural model, I can consider the input signals as follows:
• X1: Is it raining?
• X2: Is it sunny?
So, the value of each input can be either 0 or 1. I use the value of both weights (for X1 and X2) as 1 and a threshold of 1. So, the neural network model will look like:
[Figure: two inputs X1, X2 with unit weights feeding a threshold function]
Truth Table for this case will be:

Situation | x1 | x2 | y_sum | y_out
1         | 0  | 0  | 0     | 0
2         | 0  | 1  | 1     | 1
3         | 1  | 0  | 1     | 1
4         | 1  | 1  | 2     | 1
So, I can say that:

y_sum = Σᵢ wᵢxᵢ
y_out = f(y_sum) = 1 if y_sum ≥ 1, else 0
The truth table built with respect to the problem is depicted above. From the truth table, | can
conclude that in the situations where the value of yout is 1, John needs to carry an umbrella.
Hence, he will need to carry an umbrella in scenarios 2, 3 and 4.
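A minimal Python sketch of this McCulloch-Pitts umbrella example (the function name and structure are illustrative, not from the notes):

```python
def mcculloch_pitts(inputs, weights, threshold):
    """Fire (output 1) if the weighted sum of binary inputs meets the threshold."""
    y_sum = sum(w * x for w, x in zip(weights, inputs))
    return 1 if y_sum >= threshold else 0

# Umbrella example: x1 = "is it raining?", x2 = "is it sunny?"; both weights 1, threshold 1.
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", mcculloch_pitts([x1, x2], [1, 1], threshold=1))
# Only (0, 0) gives y_out = 0, matching the truth table above.
```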
Perceptron:
A Perceptron is an Artificial Neuron.
It is the simplest possible Neural Network.
Neural Networks are the building blocks of Deep Learning.
Frank Rosenblatt
Frank Rosenblatt (1928 - 1971) was an American psychologist notable in the field
of Artificial Intelligence.
In 1957 he started something really big. He "invented" a Perceptron program on an IBM 704 computer at Cornell Aeronautical Laboratory.
Scientists had discovered that brain cells (Neurons) receive input from our senses by
electrical signals.
The Neurons, then again, use electrical signals to store information, and to make
decisions based on previous input.
Frank had the idea that Perceptrons could simulate brain principles, with the ability to
learn and make decisions.
The Perceptron
The Perceptron is one of the simplest types of artificial neural networks and serves as the
foundation for more complex neural network architectures. It was introduced by Frank
Rosenblatt in 1958 and is primarily used for binary classification tasks.
Structure of the Perceptron
[Figure: a perceptron: inputs, weights and bias, summation Σwᵢxᵢ + b, activation, output]
A perceptron consists of:
1. Inputs (x₁, x₂, …, xₙ): Features of the data.
2. Weights (w₁, w₂, …, wₙ): Parameters that determine the importance of each input.
3. Bias (b): An offset that allows the decision boundary to shift.
4. Activation Function: Determines the output of the perceptron, typically a step function in the original formulation.
The perceptron computes the weighted sum of inputs and applies the activation function:

z = Σᵢ wᵢxᵢ + b,   y = f(z)
Working of the Perceptron
1. Input Data: The perceptron takes input values (x₁, x₂, …, xₙ).
2. Weighted Sum: It calculates a weighted sum using the formula z = Σᵢ wᵢxᵢ + b.
3. Activation Function: The step function outputs:
   • 1 if z ≥ 0
   • 0 if z < 0
4. Output: The result is a binary classification (e.g., true/false, positive/negative).
Learning in the Perceptron
The perceptron learns by adjusting its weights and bias using a learning algorithm:
1. Start with random weights and bias.
2. For each training sample:
   • Compute the perceptron's output.
   • Compare it to the true label (target output).
   • Update the weights and bias using the perceptron learning rule:
     wᵢ ← wᵢ + Δwᵢ, where Δwᵢ = η(y_true − y_pred)xᵢ
     Here:
     • η: Learning rate (a small positive constant).
     • y_true: True label.
     • y_pred: Predicted label.
3. Repeat until the weights converge or a stopping criterion is met.
Perceptron Learning Algorithm
Repeat for each training sample until convergence:

wⱼ(t+1) = wⱼ(t) + η (d − y) xⱼ

where d is the desired (target) output and y is the perceptron's actual output.
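A runnable sketch of this learning rule, here training on the OR gate, which is linearly separable (the dataset, learning rate, and epoch count are illustrative choices):

```python
import random

def step(z):
    return 1 if z >= 0 else 0

def train_perceptron(samples, lr=0.1, epochs=50):
    """Perceptron learning rule: w_i <- w_i + lr * (y_true - y_pred) * x_i."""
    n = len(samples[0][0])
    w = [random.uniform(-0.5, 0.5) for _ in range(n)]
    b = random.uniform(-0.5, 0.5)
    for _ in range(epochs):
        for x, y_true in samples:
            y_pred = step(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = y_true - y_pred               # zero when the prediction is right
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

# OR gate: linearly separable, so the rule converges.
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
w, b = train_perceptron(data)
print([step(sum(wi * xi for wi, xi in zip(w, x)) + b) for x, _ in data])  # [0, 1, 1, 1]
```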
To get started, I'll explain a type of artificial neuron called a perceptron.
[Figure: schematic for a neuron in a neural net: inputs, weighted summation, activation function, output]
Applications
It can be used to implement Logic Gates (e.g., AND, OR). For example, the OR gate truth table realized by a perceptron:

X1 | X2 | Y
0  | 0  | 0
0  | 1  | 1
1  | 0  | 1
1  | 1  | 1

It is also used to classify any linearly separable set of inputs.
[Figure: two classes of points separated by a linear decision boundary]
Perceptron Example
Imagine a perceptron (in your brain).
The perceptron tries to decide if you should go to a concert.
Is the artist good? Is the weather good?
What weights should these facts have?
Criteria         | Input        | Weight
Artist is Good   | x1 = 0 or 1  | w1 = 0.7
Weather is Good  | x2 = 0 or 1  | w2 = 0.6
Friend will Come | x3 = 0 or 1  | w3 = 0.5
Food is Served   | x4 = 0 or 1  | w4 = 0.3
Water is Served  | x5 = 0 or 1  | w5 = 0.4

The Perceptron Algorithm :-
Frank Rosenblatt suggested this algorithm:
1. Set a threshold value
2. Multiply all inputs with their weights
3. Sum all the results
4. Activate the output

1. Set a threshold value:
   • Threshold = 1.5
2. Multiply all inputs with their weights:
   • x1 * w1 = 1 * 0.7 = 0.7
   • x2 * w2 = 0 * 0.6 = 0
   • x3 * w3 = 1 * 0.5 = 0.5
   • x4 * w4 = 0 * 0.3 = 0
   • x5 * w5 = 1 * 0.4 = 0.4
3. Sum all the results:
   • 0.7 + 0 + 0.5 + 0 + 0.4 = 1.6 (the Weighted Sum)
4. Activate the Output:
   • Return true if the sum > 1.5 ("Yes, I will go to the Concert")
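The same concert computation as a short Python sketch, using the inputs and weights from the example above:

```python
inputs = [1, 0, 1, 0, 1]              # artist, weather, friend, food, water
weights = [0.7, 0.6, 0.5, 0.3, 0.4]
threshold = 1.5

weighted_sum = sum(x * w for x, w in zip(inputs, weights))
print(weighted_sum)                   # 1.6
print(weighted_sum > threshold)       # True: "Yes I will go to the Concert"
```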
Multi-layer Perceptron :
History:- Deep Learning deals with training multi-layer artificial neural networks,
also called Deep Neural Networks. After the Rosenblatt perceptron was developed in the 1950s, there was a lack of interest in neural networks until 1986, when Dr. Hinton and his colleagues developed the backpropagation algorithm to train a multilayer neural network. Today it is a topic in which many leading firms like Google, Facebook, and Microsoft invest heavily, building applications on deep neural networks.
The multi-layer perceptron is also known as the MLP. It consists of fully connected dense layers, which transform any input dimension to the desired dimension. A multi-layer perceptron is a neural network that has multiple layers. To create a neural network we combine neurons together so that the outputs of some neurons are inputs of other neurons.
A multi-layer perceptron has one input layer with one neuron (or node) for each input, one output layer with a single node for each output, and it can have any number of hidden layers, where each hidden layer can have any number of nodes. A schematic diagram of a Multi-Layer Perceptron (MLP) is depicted below.
[Figure: an MLP with an input layer, a hidden layer, and an output layer]
Architecture of an MLP
1. Input Layer:
   • This is the first layer of the network.
   • Each neuron corresponds to one feature of the input data.
   • No computations are performed here; it merely passes the input to the next layer.
2. Hidden Layers:
   • Located between the input and output layers.
   • Each neuron in a hidden layer computes a weighted sum of its inputs, adds a bias, and passes the result through an activation function (e.g., ReLU, Sigmoid, Tanh).
   • These layers enable the network to learn complex, non-linear relationships in the data.
3. Output Layer:
   • Provides the final output of the network.
   • The number of neurons depends on the task:
     - Regression: Single neuron or multiple neurons for vector outputs.
     - Classification: One neuron (binary classification) or one neuron per class (multi-class classification).
Every node in the multi-layer perceptron uses a sigmoid activation function. The sigmoid activation function takes real values as input and converts them to numbers between 0 and 1 using the sigmoid formula:

σ(x) = 1 / (1 + e^(−x))
Sigmoid Neurons: In the perceptron model, the limitation is that there is a very harsh change in the output function (a binary output), which requires linearly separable data. However, in most real-life cases we need a continuous output, so we propose the Sigmoid Neuron model:

y = 1 / (1 + e^(−(Σᵢ wᵢxᵢ + b)))

The function shown above is a sigmoid function: it takes a linear input and produces a smooth, continuous output (the red line in the original figure).
Here the red line is the output of the sigmoid model and the blue line is the output of the perceptron model. The output value lies in [0, 1] irrespective of the number of inputs; as the sum keeps changing, the output moves smoothly along the red line.
The sigmoid model can be used both for regression and classification problems. In the case of regression, the predicted y value is the output of the sigmoid function itself, whereas in a classification problem we first predict using the sigmoid function and then choose a threshold value that assigns the predicted y to a class. The threshold can be 0.5, the mean of the predicted y, or anything else depending on the problem.
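A small Python sketch contrasting the perceptron's harsh step output with the smooth sigmoid output (the sample values are chosen only for illustration):

```python
import math

def step(z):
    return 1 if z >= 0 else 0

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

for z in (-2.0, -0.1, 0.0, 0.1, 2.0):
    print(f"z={z:+.1f}  step={step(z)}  sigmoid={sigmoid(z):.3f}")
# The step output jumps from 0 to 1 at z = 0; the sigmoid changes smoothly within (0, 1).
```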
Feed Forward Process in Deep Neural Network
“Feedforward propagation, input data is passed through the network layer by
layer, with each layer performing a computation based on the inputs it receives
and passing the result to the next layer”.
"The process of receiving an input to produce some kind of output to make some
kind of prediction is known as Feed Forward".
The information flows through the layers from input to output without feedback loops.
Feedforward networks are used for classification and regression.
Architecture of Feedforward Neural Networks
The architecture of a feedforward neural network consists of three types of layers:
the input layer, hidden layers, and the output layer. Each layer is made up of units
known as neurons, and the layers are interconnected by weights.
Input Layer: This layer consists of neurons that receive inputs and pass them on
to the next layer. The number of neurons in the input layer is determined by the
dimensions of the input data.
Hidden Layers: These layers are not exposed to the input or output and can be
considered as the computational engine of the neural network. Each hidden layer's
neurons take the weighted sum of the outputs from the previous layer, apply an
activation function, and pass the result to the next layer. The network can have
zero or more hidden layers.
Output Layer: The final layer that produces the output for the given inputs. The
number of neurons in the output layer depends on the number of possible
outputs the network is designed to produce.
Each neuron in one layer is connected to every neuron in the next layer, making
this a fully connected network. The strength of the connection between neurons is
represented by weights, and learning in a neural network involves updating these
weights based on the error of the output.
How Feedforward Neural Networks Work
The working of a feedforward neural network involves two phases: the
feedforward phase and the backpropagation phase.
Feedforward Phase: In this phase, the input data is fed into the
network, and it propagates forward through the network. At each
hidden layer, the weighted sum of the inputs is calculated and passed
through an activation function, which introduces non-linearity into the
model. This process continues until the output layer is reached, and a
prediction is made.
Backpropagation Phase: Once a prediction is made, the error (the difference between the predicted output and the actual output) is calculated. This error is then propagated back through the network, and the weights are adjusted to minimize it. The process of adjusting weights is typically done using a gradient descent optimization algorithm.
Activation Functions
Neurons in hidden layers use activation functions to introduce non-linearity into the model. This helps the network learn from complex data. Common activation functions include:
• ReLU (Rectified Linear Unit): Outputs the input directly if it is positive; otherwise, it outputs zero.
• Sigmoid: Converts the input into a value between 0 and 1, useful for binary classification.
• Tanh: Similar to Sigmoid but outputs values between -1 and 1, often used in tasks where the input data is centered around zero.
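A quick NumPy sketch of these three activation functions (assuming NumPy is available; the sample values are arbitrary):

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)           # identity for positive z, zero otherwise

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))     # squashes into (0, 1)

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print("relu:   ", relu(z))
print("sigmoid:", sigmoid(z))
print("tanh:   ", np.tanh(z))           # squashes into (-1, 1), zero-centered
```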
Training Feedforward Neural Networks
Training a feedforward neural network involves using a dataset to adjust the
weights of the connections between neurons. This is done through an iterative
process where the dataset is passed through the network multiple times, and each
time, the weights are updated to reduce the error in prediction. This process is
known as gradient descent, and it continues until the network performs
satisfactorily on the training data.
Applications of Feedforward Neural Networks
Feedforward neural networks are used in a variety of machine learning tasks
including:
• Pattern recognition
• Classification tasks
• Regression analysis
• Image recognition
• Time series prediction
To gain a better understanding of the feed-forward process, let's see it mathematically.
1) The first input is fed to the network, represented as the row vector [x₁ x₂ 1], where the final 1 is the bias value:

   [x₁ x₂ 1]

2) Each input is multiplied by a weight with respect to the first and second model to obtain its probability of being in the positive region in each model. So, we multiply our inputs by a matrix of weights using matrix multiplication:

   [x₁ x₂ 1] × [w₁₁ w₁₂; w₂₁ w₂₂; w₃₁ w₃₂] = [score₁ score₂]

3) After that, we take the sigmoid of our scores, which gives the probability of the point being in the positive region in both models:

   probability = σ(score) = 1 / (1 + e^(−score))

4) We multiply the probabilities obtained from the previous step by the second set of weights, always including a bias of 1 whenever taking a combination of inputs:

   [probability₁ probability₂ 1] × [w₁; w₂; w₃] = [score]

And, as we know, to obtain the probability of the point being in the positive region of this model, we take the sigmoid, thus producing our final output in the feed-forward process:

   σ(score) = [probability]
Let's take the neural network which we had previously, with the linear models in the hidden layer combined to form the non-linear model in the output layer. We will use this non-linear model to produce an output that describes the probability of the point being in the positive region. The point is (2, 2); along with the bias, we represent the input as:

   [2 2 1]

Recall that the first linear model in the hidden layer is defined by the equation:

   −4x₁ − x₂ + 12

which means that in the first model the inputs are multiplied by −4 and −1, and the bias value is multiplied by 12. In the second model the inputs are multiplied by −1/5 and −1, and the bias is multiplied by 3, to obtain the linear combination of that same point:

   [2 2 1] × [−4 −1/5; −1 −1; 12 3]
   = [2(−4) + 2(−1) + 1(12)   2(−1/5) + 2(−1) + 1(3)]
   = [2  0.6]
Now, to obtain the probability that the point is in the positive region relative to both models, we apply the sigmoid to both scores:

   [σ(2)  σ(0.6)] = [0.88  0.64]

The second layer contains the weights which dictate the combination of the linear models in the first layer to obtain the non-linear model in the second layer. The weights are 1.5 and 1, with a bias value of 0.5. Now, we multiply our probabilities from the first layer by the second set of weights:

   [0.88  0.64  1] × [1.5; 1; 0.5] = 0.88(1.5) + 0.64(1) + 1(0.5) = 2.46

Now, we take the sigmoid of our final score:

   σ(2.46) = 0.92
This is the complete math behind the feed-forward process, where the inputs traverse the entire depth of the neural network. In this example there is only one hidden layer, but whether there is one hidden layer or twenty, the computational process is the same for all hidden layers.
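A NumPy sketch of this exact feed-forward computation, using the weights from the worked example above:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([2.0, 2.0, 1.0])                  # point (2, 2) plus the bias input 1

# Hidden layer: columns hold the two linear models -4*x1 - x2 + 12 and -x1/5 - x2 + 3.
W1 = np.array([[-4.0, -0.2],
               [-1.0, -1.0],
               [12.0,  3.0]])
hidden = sigmoid(x @ W1)                       # approx. [0.88, 0.64]

# Output layer: combine the two models with weights 1.5 and 1 and a bias of 0.5.
W2 = np.array([1.5, 1.0, 0.5])
score = np.append(hidden, 1.0) @ W2            # approx. 2.46
print(hidden, score, sigmoid(score))           # final output approx. 0.92
```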
Backpropagation Process in Deep Neural Network:-
Backpropagation is an algorithm used to train neural networks by iteratively adjusting the
network's weights and biases in order to minimize the loss function. A loss function (also
known as a cost function or objective function) is a measure of how well the model's
predictions match the true target values in the training data. The loss function quantifies
the difference between the predicted output of the model and the actual output,
providing a signal that guides the optimization process during training.
Input values
x1 = 0.05, x2 = 0.10
Weight values
W1 = 0.15, W2 = 0.20, W3 = 0.25, W4 = 0.30
W5 = 0.40, W6 = 0.45, W7 = 0.50, W8 = 0.55
Bias values
b1 = 0.35, b2 = 0.60
Target values
T1 = 0.01, T2 = 0.99
[Figure: a 2-2-2 network: inputs x1, x2 feed hidden units H1, H2 through w1-w4 with bias b1; H1, H2 feed outputs y1, y2 through w5-w8 with bias b2]
Now, we first calculate the values of H1 and H2 by a forward pass.

Forward Pass
To find the value of H1, we first multiply the input values by the weights:

H1 = x1 × w1 + x2 × w2 + b1
H1 = 0.05 × 0.15 + 0.10 × 0.20 + 0.35
H1 = 0.3775

To calculate the final result of H1, we apply the sigmoid function:

H1_final = 1 / (1 + e^(−H1)) = 1 / (1 + e^(−0.3775)) = 0.593269992

We calculate the value of H2 in the same way as H1:

H2 = x1 × w3 + x2 × w4 + b1
H2 = 0.05 × 0.25 + 0.10 × 0.30 + 0.35
H2 = 0.3925
H2_final = 1 / (1 + e^(−0.3925)) = 0.596884378

Now, we calculate the values of y1 and y2 in the same way as we calculated H1 and H2. To find the value of y1, we multiply the outcomes of H1 and H2 by the weights:

y1 = H1_final × w5 + H2_final × w6 + b2
y1 = 0.593269992 × 0.40 + 0.596884378 × 0.45 + 0.60
y1 = 1.10590597
y1_final = 1 / (1 + e^(−1.10590597)) = 0.75136507

We calculate the value of y2 in the same way as y1:

y2 = H1_final × w7 + H2_final × w8 + b2
y2 = 0.593269992 × 0.50 + 0.596884378 × 0.55 + 0.60
y2 = 1.2249214
y2_final = 1 / (1 + e^(−1.2249214)) = 0.772928465

Our target values are 0.01 and 0.99, and our y1 and y2 values do not match the targets T1 and T2. Now, we find the total error, which is simply the squared difference between the outputs and the target outputs:

E_total = Σ ½ (target − output)²

So the total error is:

E_total = ½ (0.01 − 0.75136507)² + ½ (0.99 − 0.772928465)²
        = 0.274811084 + 0.023560026
        = 0.298371110
Now, we will backpropagate this error to update the weights using a backward pass.
Backward pass at the output layer
To update a weight, we calculate the error corresponding to that weight by differentiating the total error with respect to it. We perform the backward process, so first consider the last weight w5:

Error_w5 = ∂E_total/∂w5 ......... (1)

E_total = ½(T1 − y1_final)² + ½(T2 − y2_final)² ......... (2)

From equation (2), it is clear that we cannot partially differentiate it with respect to w5 directly, because w5 does not appear in it. We split equation (1) into multiple terms (the chain rule) so that we can easily differentiate with respect to w5:

∂E_total/∂w5 = ∂E_total/∂y1_final × ∂y1_final/∂y1 × ∂y1/∂w5 ......... (3)

Now, we calculate each term one by one.

∂E_total/∂y1_final = ∂[½(T1 − y1_final)² + ½(T2 − y2_final)²]/∂y1_final
                   = 2 × ½ × (T1 − y1_final) × (−1) + 0
                   = −(T1 − y1_final) = −(0.01 − 0.75136507)
                   = 0.74136507 ......... (4)

Since y1_final = 1/(1 + e^(−y1)), the derivative of the sigmoid gives:

∂y1_final/∂y1 = y1_final × (1 − y1_final)
              = 0.75136507 × (1 − 0.75136507) = 0.186815602 ......... (5)

Since y1 = H1_final × w5 + H2_final × w6 + b2:

∂y1/∂w5 = H1_final = 0.593269992 ......... (6)

Putting the values of equations (4), (5), and (6) into equation (3) gives the final result:

∂E_total/∂w5 = 0.74136507 × 0.186815602 × 0.593269992
Error_w5 = 0.082167041 ......... (7)

Now, we calculate the updated weight w5_new with the help of the following formula:

w5_new = w5 − η × ∂E_total/∂w5      (here, η = learning rate = 0.5)
       = 0.4 − 0.5 × 0.082167041
       = 0.35891648 ......... (8)

In the same way, we calculate w6_new, w7_new, and w8_new, and this gives us the following values:

w5_new = 0.35891648
w6_new = 0.408666186
w7_new = 0.511301270
w8_new = 0.561370121
Backward pass at the hidden layer
Now we backpropagate to our hidden layer and update the weights w1, w2, w3, and w4 as we have done with w5, w6, w7, and w8. We calculate the error at w1 as:

Error_w1 = ∂E_total/∂w1

As before, w1 does not appear directly in E_total, so we split the derivative with the chain rule:

∂E_total/∂w1 = ∂E_total/∂H1_final × ∂H1_final/∂H1 × ∂H1/∂w1 ......... (9)

Because H1_final feeds both output neurons, the first term splits across the two output errors E1 = ½(T1 − y1_final)² and E2 = ½(T2 − y2_final)²:

∂E_total/∂H1_final = ∂E1/∂H1_final + ∂E2/∂H1_final ......... (10)

and each of these splits again, because E1 and E2 contain no H1_final term directly:

∂E1/∂H1_final = ∂E1/∂y1 × ∂y1/∂H1_final
∂E2/∂H1_final = ∂E2/∂y2 × ∂y2/∂H1_final ......... (11)

For the first output:

∂E1/∂y1 = ∂E1/∂y1_final × ∂y1_final/∂y1
        = −(T1 − y1_final) × y1_final(1 − y1_final)
        = 0.74136507 × 0.186815602 = 0.138498562 ......... (12)

Since y1 = H1_final × w5 + H2_final × w6 + b2, we have ∂y1/∂H1_final = w5 = 0.40, so:

∂E1/∂H1_final = 0.138498562 × 0.40 = 0.055399425 ......... (13)

For the second output:

∂E2/∂y2 = −(T2 − y2_final) × y2_final(1 − y2_final)
        = −(0.99 − 0.772928465) × 0.772928465 × (1 − 0.772928465)
        = −0.217071535 × 0.175510053 = −0.038098237 ......... (14)

Since y2 = H1_final × w7 + H2_final × w8 + b2, we have ∂y2/∂H1_final = w7 = 0.50, so:

∂E2/∂H1_final = −0.038098237 × 0.50 = −0.019049118 ......... (15)

Putting (13) and (15) into (10):

∂E_total/∂H1_final = 0.055399425 + (−0.019049118) = 0.036350307 ......... (16)

Next, since H1_final = 1/(1 + e^(−H1)):

∂H1_final/∂H1 = H1_final × (1 − H1_final)
              = 0.593269992 × (1 − 0.593269992) = 0.241300709 ......... (17)

We calculate the partial derivative of the total net input to H1 with respect to w1 the same way as we did for the output neuron. Since H1 = x1 × w1 + x2 × w2 + b1:

∂H1/∂w1 = x1 = 0.05 ......... (18)

So, putting the values of (16), (17), and (18) into equation (9) gives the final result:

∂E_total/∂w1 = 0.036350307 × 0.241300709 × 0.05
Error_w1 = 0.000438568 ......... (19)

Now, we calculate the updated weight w1_new with the help of the same update formula:

w1_new = w1 − η × ∂E_total/∂w1      (η = learning rate = 0.5)
       = 0.15 − 0.5 × 0.000438568
       = 0.149780716 ......... (20)

In the same way, we calculate w2_new, w3_new, and w4_new, and this gives us the following values:

w1_new = 0.149780716
w2_new = 0.199561432
w3_new = 0.249751144
w4_new = 0.299502287
We have now updated all the weights. The network had an error of 0.298371109 when we fed forward the inputs 0.05 and 0.1. After the first round of backpropagation, the total error is down to 0.291027924. After repeating this process 10,000 times, the total error is down to 0.0000351085. At this point, the output neurons generate 0.015912196 and 0.984065734, i.e., values near our targets, when we feed forward 0.05 and 0.1.
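A compact NumPy sketch of this worked example: it reproduces the forward pass (outputs ≈ 0.7514 and 0.7729, E_total ≈ 0.2984) and then repeats the gradient updates. The weights, biases, targets, and learning rate 0.5 come from the example above; the vectorized implementation itself is illustrative, and the biases are kept fixed, as in the worked example:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.05, 0.10])            # inputs x1, x2
t = np.array([0.01, 0.99])            # targets T1, T2
W1 = np.array([[0.15, 0.25],          # columns: H1 (w1, w2) and H2 (w3, w4)
               [0.20, 0.30]])
W2 = np.array([[0.40, 0.50],          # columns: y1 (w5, w6) and y2 (w7, w8)
               [0.45, 0.55]])
b1, b2, lr = 0.35, 0.60, 0.5

for step in range(10000):
    h = sigmoid(x @ W1 + b1)          # forward pass: [H1_final, H2_final]
    y = sigmoid(h @ W2 + b2)          # [y1_final, y2_final]
    if step == 0:
        print("outputs:", y, "E_total:", np.sum(0.5 * (t - y) ** 2))
    # Backward pass (the chain rule derived above).
    delta_out = -(t - y) * y * (1 - y)            # dE/d(pre-activation) at outputs
    delta_hid = (delta_out @ W2.T) * h * (1 - h)  # backpropagated to the hidden layer
    W2 -= lr * np.outer(h, delta_out)             # e.g., w5 becomes 0.35891648
    W1 -= lr * np.outer(x, delta_hid)             # e.g., w1 becomes 0.149780716

print("after training:", sigmoid(sigmoid(x @ W1 + b1) @ W2 + b2))
```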
Convolutional Neural Networks (CNNs) :-
A Convolutional Neural Network (CNN) is a type of Deep Learning neural network
architecture commonly used in Computer Vision. Computer vision is a field of
Artificial Intelligence that enables a computer to understand and interpret the
image or visual data.
When it comes to Machine Learning, Artificial Neural Networks perform really
well.
Convolutional Neural Network (CNN) is the extended version of artificial
neural networks (ANN) which is predominantly used to extract the feature from
the grid-like matrix dataset. For example visual datasets like images or videos
where data patterns play an extensive role.
CNN Architecture :-
Convolutional Neural Network consists of multiple layers like the input layer,
Convolutional layer, Pooling layer, and fully connected layers.
[Figure: CNN architecture: input layer → convolutional layer → max pooling layer → dense layer → output layer]
A complete Convolutional Neural Network architecture is also known as covnets. A covnets is a sequence of layers, and every layer transforms one volume to another through a differentiable function.
Key components of a Convolutional Neural Network include:
1. Convolutional Layers: These layers apply convolutional operations to input
images, using filters (also known as kernels) to detect features such as edges,
textures, and more complex patterns. Convolutional operations help preserve the
spatial relationships between pixels.
2. Pooling Layers: Pooling layers downsample the spatial dimensions of the
input, reducing the computational complexity and the number of parameters in
the network. Max pooling is a common pooling operation, selecting the maximum
value from a group of neighboring pixels.
3. Activation Functions: Non-linear activation functions, such as Rectified
Linear Unit (ReLU), introduce non-linearity to the model, allowing it to learn more
complex relationships in the data.
4. Fully Connected Layers: These layers are responsible for making predictions
based on the high-level features learned by the previous layers. They connect
every neuron in one layer to every neuron in the next layer.
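A minimal PyTorch sketch wiring these components together (PyTorch is one of the frameworks mentioned later in these notes; the layer sizes and class name are illustrative):

```python
import torch
import torch.nn as nn

class SimpleCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),  # convolutional layer
            nn.ReLU(),                                   # activation function
            nn.MaxPool2d(2),                             # pooling layer
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 8 * 8, num_classes)  # fully connected layer

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x.flatten(1))

model = SimpleCNN()
print(model(torch.randn(1, 3, 32, 32)).shape)  # torch.Size([1, 10]) for a 32x32 RGB image
```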
Applications of CNN?
Convolutional Neural Networks (CNNs) are a specialized type of neural network that are
specifically designed for image processing and computer vision tasks. Here are some applications of CNNs in more detail:
• Image Classification: One of the most common applications of CNNs is image classification, where the task is to assign a label or category to an input image. CNNs can learn to recognize patterns and features in the input image and use them to make accurate predictions.
• Object Detection: Object detection is the task of identifying and localizing objects in an image. CNNs can be used to detect the presence and location of objects in an image, which is useful in applications such as self-driving cars, surveillance systems, and robotics.
• Facial Recognition: CNNs can be used for facial recognition, which involves identifying and verifying the identity of a person from a digital image or video. This application is used in security systems, law enforcement, and social media.
• Medical Image Analysis: CNNs can be used to analyze medical images such as X-rays, CT scans, and MRI scans. This can aid in the diagnosis of diseases and conditions, as well as the development of personalized treatment plans.
• Natural Language Processing: While CNNs are primarily used in image processing, they can also be used in natural language processing tasks such as text classification and sentiment analysis. CNNs can learn patterns and features in text data, which can be used to classify or analyze text.
• Video Analysis: CNNs can also be used for video analysis, such as detecting and tracking objects in video streams or recognizing actions in videos.
CNNs are widely used in applications related to image processing, computer vision, and natural language processing. They are especially useful in tasks that involve recognizing patterns and features in complex data such as images and videos.
There are many popular tools and frameworks for developing CNNs, including:
• TensorFlow: An open-source software library for deep learning developed by Google.
• PyTorch: An open-source deep learning framework developed by Facebook.
• MXNet: An open-source deep learning framework from the Apache MXNet project.
• Keras: A high-level deep learning API for Python that can be used with TensorFlow, PyTorch, or MXNet.
[Figure: the digit 9 drawn in different handwriting and at shifted locations. To handle such variety in digits we can use a simple artificial neural network (ANN), but the network becomes enormous:]

Image size = 1920 × 1080 × 3
First layer neurons = 1920 × 1080 × 3 ≈ 6 million
Hidden layer neurons = let's say you keep it ≈ 4 million
Weights between input and hidden layer = 6 million × 4 million = 24 × 10¹²

Disadvantages of using ANN for image classification
1. Too much computation
2. Treats local pixels the same as pixels far apart
3. Sensitive to the location of an object in an image
L.M.KuwarLoopy pattern
filter
Bs Diagonal line
Verti¢al line
pests filter
-14141-1-1-1-14141 = -1 > -1/9 = -0.
aS
1 | a4
aja
jeg *
a] i)]4]a] +t
4 055 0.11 -0.33
a -0.33 0.33 -0.33
a -0.22 -0.11 -0.22
a -0.33 -0.33 -0.33
1
Feature MapLoopy pattern
Filters are nothing
but the feature
detectorsL.M.KuwarL.M.KuwarIsthis
[Figure: a CNN classifying a handwritten digit: convolution + ReLU layers perform feature extraction, followed by a classification stage.]
Pooling layer is used to reduce the size of the feature map.
[Figure: max pooling with a 2×2 filter and stride = 2: each 2×2 block of the input feature map is replaced by its maximum value, halving each dimension.]
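A small NumPy sketch of 2×2 max pooling with stride 2 (an illustrative helper, not from the notes):

```python
import numpy as np

def max_pool_2x2(x):
    """Downsample a 2D feature map with a 2x2 window and stride 2."""
    h, w = x.shape
    x = x[:h - h % 2, :w - w % 2]                 # trim odd edges if needed
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

fmap = np.array([[1, 3, 2, 0],
                 [4, 6, 5, 1],
                 [7, 2, 8, 3],
                 [0, 9, 4, 2]])
print(max_pool_2x2(fmap))
# [[6 5]
#  [9 8]]
```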
[Figure: a 9 shifted to a different position still yields similar pooled features.]

Benefits of pooling:
• Reduces dimensions and computation
• Reduces overfitting, as there are fewer parameters
• Makes the model tolerant towards variations and distortions
[Figure: the full CNN pipeline: early filters detect simple patterns and deeper layers detect higher-level features such as eyes, nose, and ears; convolution + ReLU + pooling form the feature-extraction stage, followed by a classification stage. ReLU introduces non-linearity and sparsity, which helps reduce overfitting, while pooling reduces dimensions and makes the model tolerant to variations and distortions.]
L.M.KuwarRotation Thickness
CNN by itself doesn’t take care of
rotation and scale
+ You need to have rotated, scaled samples in training
dataset
= If you don’t have such samples than use data
augmentation methods to generate new
rotated/scaled samples from existing training samples
Different Types of CNN Models
1. LeNet
2. AlexNet
3. ResNet
4. GoogleNet
5. MobileNet
6. VGG
1. LeNet
LeNet is one of the earliest convolutional neural network (CNN) architectures, introduced by
Yann LeCun in 1989. It was primarily designed for handwritten digit recognition (e.g., the MNIST
dataset) and played a foundational role in the development of modern deep learning
techniques.
LeNet-5 Architecture
LeNet-5 is the most well-known version of LeNet. Its architecture consists of 7 layers, including
convolutional, pooling, and fully connected layers, excluding the input layer. Here's a
breakdown:
1. Input Layer:
   • Accepts grayscale images of size 32×32.
   • If the input image is smaller (e.g., MNIST's 28×28), padding is applied to resize it.
2. Convolutional Layer (C1):
   • Applies 6 filters (kernels) of size 5×5 with a stride of 1.
   • Output feature maps: 6×28×28.
   • Activation function: Sigmoid or Tanh (used historically).
3. Pooling Layer (S2):
   • Sub-sampling through average pooling with a window size of 2×2 and a stride of 2.
   • Output feature maps: 6×14×14.
4. Convolutional Layer (C3):
   • Applies 16 filters of size 5×5.
   • Output feature maps: 16×10×10.
   • The filters are connected to subsets of the previous layer's feature maps, introducing sparsity.
5. Pooling Layer (S4):
   • Average pooling with a 2×2 window and stride of 2.
   • Output feature maps: 16×5×5.
6. Fully Connected Layer (C5):
   • Fully connected to all neurons in the previous layer.
   • Output size: 120 neurons.
7. Fully Connected Layer (F6):
   • Fully connected to the previous layer.
   • Output size: 84 neurons.
8. Output Layer:
   • Fully connected layer with 10 neurons (for the 10 classes in MNIST).
   • Activation: Softmax to output probabilities for classification.
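A PyTorch sketch of this LeNet-5 layer stack (a modern re-creation of the description above, using Tanh activations and dense C3 connections rather than the original sparse ones):

```python
import torch
import torch.nn as nn

lenet5 = nn.Sequential(
    nn.Conv2d(1, 6, kernel_size=5), nn.Tanh(),   # C1: 6 x 28 x 28
    nn.AvgPool2d(2, stride=2),                   # S2: 6 x 14 x 14
    nn.Conv2d(6, 16, kernel_size=5), nn.Tanh(),  # C3: 16 x 10 x 10
    nn.AvgPool2d(2, stride=2),                   # S4: 16 x 5 x 5
    nn.Flatten(),
    nn.Linear(16 * 5 * 5, 120), nn.Tanh(),       # C5: 120 neurons
    nn.Linear(120, 84), nn.Tanh(),               # F6: 84 neurons
    nn.Linear(84, 10),                           # output layer: 10 classes
)

x = torch.randn(1, 1, 32, 32)   # one 32x32 grayscale image
print(lenet5(x).shape)          # torch.Size([1, 10])
```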
Key Features of LeNet
1. Convolutional Layers:
   • Extract spatial features from input images by applying learnable filters.
2. Pooling Layers:
   • Reduce spatial dimensions, which helps in computational efficiency and robustness to small translations.
3. Fully Connected Layers:
   • Perform high-level reasoning for classification based on extracted features.
4. Activation Functions:
   • Originally used Sigmoid or Tanh, but modern implementations often use ReLU for better gradient flow.
5. Simple Connections:
   • Filters are not fully connected to all input channels, which reduces parameters and computational complexity.
Advantages of LeNet
1. Efficient Feature Extraction:
   • Convolution and pooling layers enable efficient learning of spatial hierarchies.
2. Reduced Parameters:
   • By using shared weights in convolutional layers, LeNet significantly reduces the number of learnable parameters.
3. Scalability:
   • The architecture inspired modern CNNs like AlexNet, VGG, and ResNet.
Limitations of LeNet
1. Small Input Sizes:
   • Designed for small 32×32 inputs, limiting its application to larger, more complex datasets.
2. Limited Depth:
   • Shallow architecture compared to modern CNNs, which limits its ability to capture highly complex patterns.
3. Outdated Activation Functions:
   • Sigmoid and Tanh can lead to vanishing gradients, making training deeper networks challenging.
Applications
1. Handwritten Digit Recognition:
   • Initially used for recognizing digits in postal codes (e.g., the MNIST dataset).
2. Document Processing:
   • Applied in OCR (Optical Character Recognition) systems.

2. AlexNet :-
AlexNet is a deep convolutional neural network architecture that revolutionized computer
vision by winning the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) in 2012. It was
proposed by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton and demonstrated the power
of deep learning on large datasets using GPUs.
Key Contributions of AlexNet
1. Deep Convolutional Architecture:
   • Introduced depth and complexity, with 8 layers (5 convolutional layers and 3 fully connected layers).
2. ReLU Activation:
   • Replaced Sigmoid/Tanh with ReLU (Rectified Linear Unit), enabling faster training.
3. Dropout Regularization:
   • Used to prevent overfitting in the fully connected layers.
4. GPU Utilization:
   • Leveraged GPUs for parallel processing, accelerating training on large datasets.
5. Data Augmentation:
   • Applied techniques like random cropping, flipping, and image jittering to increase dataset size and reduce overfitting.
6. Overlapping Pooling:
   • Used overlapping max-pooling instead of average pooling to improve feature representation.
AlexNet Architecture
AlexNet processes images of size 227×227×3 and outputs predictions for 1000 classes. Here's the detailed breakdown:
1. Input Layer:
   • Accepts RGB images of size 227×227×3.
2. Convolutional Layer 1 (Conv1):
   • Filters: 96
   • Kernel size: 11×11
   • Stride: 4
   • Activation: ReLU
   • Output: 55×55×96
3. Max Pooling 1 (Pool1):
   • Pool size: 3×3
   • Stride: 2
   • Output: 27×27×96
4. Convolutional Layer 2 (Conv2):
   • Filters: 256
   • Kernel size: 5×5
   • Stride: 1
   • Activation: ReLU
   • Output: 27×27×256
5. Max Pooling 2 (Pool2):
   • Pool size: 3×3
   • Stride: 2
   • Output: 13×13×256
6. Convolutional Layer 3 (Conv3):
   • Filters: 384
   • Kernel size: 3×3
   • Stride: 1
   • Activation: ReLU
   • Output: 13×13×384
7. Convolutional Layer 4 (Conv4):
   • Filters: 384
   • Kernel size: 3×3
   • Stride: 1
   • Activation: ReLU
   • Output: 13×13×384
8. Convolutional Layer 5 (Conv5):
   • Filters: 256
   • Kernel size: 3×3
   • Stride: 1
   • Activation: ReLU
   • Output: 13×13×256
9. Max Pooling 3 (Pool3):
   • Pool size: 3×3
   • Stride: 2
   • Output: 6×6×256
10. Fully Connected Layer 1 (FC6):
   • Neurons: 4096
   • Activation: ReLU
   • Dropout applied.
11. Fully Connected Layer 2 (FC7):
   • Neurons: 4096
   • Activation: ReLU
   • Dropout applied.
12. Fully Connected Layer 3 (FC8):
   • Neurons: 1000 (number of classes in ImageNet)
   • Activation: Softmax.
Key Innovations
1. Local Response Normalization (LRN):
   • Applied after ReLU activations to enhance generalization.
2. Overlapping Pooling:
   • Reduced dimensions while maintaining more spatial information.
3. Dropout:
   • Regularized the model by randomly setting a fraction of neurons to zero during training.
4. GPU Training:
   • Trained using two GPUs, with layers split across them.
Advantages of AlexNet
1. Breakthrough Accuracy:
   • Achieved a top-5 error rate of 15.3% on ImageNet, significantly better than previous models.
2. Scalable to Large Datasets:
   • Handled millions of images effectively.
3. Inspired Modern Architectures:
   • Paved the way for deeper networks like VGG, ResNet, and Inception.
Limitations of AlexNet
1. Computationally Intensive:
   • Requires significant hardware resources.
2. Fixed Input Size:
   • Only accepts 227×227 inputs.
3. Manual Design:
   • Lacks the automation seen in later architectures like NASNet.
3. ResNet (Residual Networks): Revolutionizing Deep Learning
ResNet, introduced in 2015 by Kaiming He et al., is a groundbreaking deep convolutional neural
network architecture that addressed the vanishing gradient problem and enabled the training
of extremely deep networks. It won the ILSVRC 2015 competition, achieving remarkable
accuracy on the ImageNet dataset.
Key Idea: Residual Learning
The core innovation of ResNet is the introduction of residual connections, which create shortcut
paths to allow gradients to flow directly through layers. This addresses the degradation
problem, where adding more layers leads to increased training error due to difficulties in
optimizing very deep models.
Residual Block:
A residual block is the building block of ResNet. Instead of learning a full mapping H(x), it learns a residual mapping F(x), where:

H(x) = F(x) + x

Here:
• F(x): Residual function (the output of the convolutional layers),
• x: Input (shortcut connection).
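A minimal PyTorch sketch of a residual block with an identity shortcut (an illustrative simplification of the published design):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.bn2(self.conv2(self.relu(self.bn1(self.conv1(x)))))  # F(x)
        return self.relu(out + x)  # H(x) = F(x) + x via the shortcut connection

block = ResidualBlock(64)
print(block(torch.randn(1, 64, 56, 56)).shape)  # torch.Size([1, 64, 56, 56])
```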
ResNet Architecture Variants
ResNet comes in several versions based on depth:
1. ResNet-18: 18 layers
2. ResNet-34: 34 layers
3. ResNet-50: 50 layers
4. ResNet-101: 101 layers
5. ResNet-152: 152 layers
Structure of ResNet
1. Convolutional Layer:
   • 7×7 convolution with stride 2.
   • Followed by a 3×3 max-pooling layer.
2. Residual Blocks:
   • Each block consists of stacked convolutional layers with shortcut connections.
3. Fully Connected Layer:
   • After global average pooling, a fully connected layer outputs class probabilities.
ResNet-50 Detailed Architecture
ResNet-50 is a deeper version that uses bottleneck blocks to reduce computational complexity. Each bottleneck block has three convolutional layers:
1. 1×1: Reduces dimensions.
2. 3×3: Extracts features.
3. 1×1: Restores dimensions.
Layer Configuration:

Layer Name         | Output Size | Layers
Conv1              | 112×112     | 7×7, 64, stride 2
MaxPool            | 56×56       | 3×3, stride 2
Conv2_x            | 56×56       | [1×1, 64; 3×3, 64; 1×1, 256] × 3
Conv3_x            | 28×28       | [1×1, 128; 3×3, 128; 1×1, 512] × 4
Conv4_x            | 14×14       | [1×1, 256; 3×3, 256; 1×1, 1024] × 6
Conv5_x            | 7×7         | [1×1, 512; 3×3, 512; 1×1, 2048] × 3
Global Avg Pooling | 1×1         | -
FC                 | 1000        | Fully Connected
Advantages of ResNet
1. Mitigates Vanishing Gradients:
   • Shortcut connections ensure gradients flow through the network without diminishing.
2. Supports Very Deep Networks:
   • Enables training of networks with 100+ layers.
3. Improved Accuracy:
   • Achieves state-of-the-art performance on image classification tasks.
4. Modularity:
   • Residual blocks can be stacked to create deeper architectures.
5. MobileNet
MobileNet, introduced by Andrew G. Howard et al. in 2017, is a convolutional neural network designed for mobile and embedded vision applications. It focuses on efficiency, enabling deployment on devices with limited computational power.
Key Innovations in MobileNet
1. Depthwise Separable Convolutions (see the sketch after this list):
   • Replaces a standard convolution with two steps:
     - Depthwise Convolution: Applies a single filter to each input channel.
     - Pointwise Convolution: Applies a 1×1 convolution to combine features across channels.
   • Reduces computation by a factor of roughly 1/N + 1/D², where N is the number of output channels and D is the kernel size.
2. Width Multiplier (α):
   • Scales the number of channels in each layer to trade off accuracy for efficiency.
   • α ∈ (0, 1].
3. Resolution Multiplier (ρ):
   • Scales the input image resolution to reduce computational cost.
   • ρ ∈ (0, 1].
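A PyTorch sketch of a depthwise separable convolution, expressing the depthwise step via the groups argument of Conv2d (the channel counts are illustrative):

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size=3):
        super().__init__()
        # Depthwise: groups=in_ch applies one filter per input channel.
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size,
                                   padding=kernel_size // 2, groups=in_ch)
        # Pointwise: a 1x1 convolution combines features across channels.
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

conv = DepthwiseSeparableConv(32, 64)
print(conv(torch.randn(1, 32, 112, 112)).shape)  # torch.Size([1, 64, 112, 112])
```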
MobileNet Architecture
MobileNet processes images of size 224×224 by default. Here's the structure:

Type                                | Output Size | Filters | Kernel Size | Stride
Input                               | 224×224     | -       | -           | -
Conv + BatchNorm + ReLU6            | 112×112     | 32      | 3×3         | 2
Depthwise Conv + BN + ReLU6         | 112×112     | 32      | 3×3         | 1
Pointwise Conv + BN + ReLU6         | 112×112     | 64      | 1×1         | 1
Depthwise Conv + BN + ReLU6         | 56×56       | 64      | 3×3         | 2
Pointwise Conv + BN + ReLU6         | 56×56       | 128     | 1×1         | 1
Depthwise + Pointwise Conv (repeat) | -           | -       | -           | -
Avg Pooling                         | 1×1         | -       | -           | -
Fully Connected                     | 1×1         | 1000    | -           | -
Advantages of MobileNet
1. Efficiency:
   • Lower computational cost due to depthwise separable convolutions.
2. Flexibility:
   • Adjustable width and resolution multipliers allow customization for specific hardware constraints.
3. Performance:
   • Competitive accuracy on benchmarks like ImageNet with significantly fewer parameters.
Variants of MobileNet
1. MobileNetV1:
   • Initial version with depthwise separable convolutions.
2. MobileNetV2:
   • Adds inverted residuals with linear bottlenecks for better performance.
3. MobileNetV3:
   • Combines techniques like squeeze-and-excitation (SE) blocks and NAS (Neural Architecture Search) for further optimization.
6. VGG
The VGG network, introduced by Karen Simonyan and Andrew Zisserman in 2014, is a deep convolutional neural network architecture known for its simplicity and effectiveness. It was a top performer in the ILSVRC 2014 competition, achieving high accuracy on the ImageNet dataset.
Key Features of VGG
1. Simplicity:
   • Uses only 3×3 convolutional layers stacked in increasing depth.
   • Avoids complex designs like inception modules or skip connections.
2. Deep Architecture:
   • Depth varies across VGG variants (e.g., VGG-11, VGG-16, VGG-19) with up to 19 layers.
3. Uniform Design:
   • Convolutional layers use the same kernel size, stride, and padding throughout the network.
4. Fully Connected Layers:
   • Concludes with one or more fully connected layers, followed by a softmax layer for classification.
VGG processes images of size 224x224 and increases the number of filters as the depth
increases.
VGG-16 Detailed Architecture:
Layer Type Output Size Configuration
Input 224x224 -
Conv Block 1 224x224 64: 3x3, ReLU (x2)
Max Pooling 112112 2x2, stride 2
Conv Block 2 112x112, 128: 3x3, ReLU (x2)
Max Pooling 56x56 2x2, stride 2
Conv Block 3 56x56 256: 3x3, ReLU (x3)
Max Pooling 28x28 2x2, stride 2
Conv Block 4 28x28 512 :3x3, ReLU (x3)
Max Pooling 14x14 2x2, stride 2
Conv Block 5 14x14 512: 3x3, ReLU (x3)
Max Pooling 77 2x2, stride 2
Fully Connected | 4096 ReLU
1
Fully Connected | 4096 ReLU
2
Fully Connected | 1000 Softmax
3
Variants of VGG
1. VGG-11:
   • 11 layers (8 convolutional + 3 fully connected).
2. VGG-16:
   • 16 layers (13 convolutional + 3 fully connected).
   • Most widely used variant.
3. VGG-19:
   • 19 layers (16 convolutional + 3 fully connected).

Recurrent Neural Network (RNN) :-
A recurrent neural network or RNN is a deep neural network trained on sequential
or time series data to create a machine learning (ML) model that can make
sequential predictions or conclusions based on sequential inputs.
RNNs can also be used to solve ordinal or temporal problems such as language
translation, natural language processing (NLP), sentiment analysis, speech
recognition and image captioning.
How Recurrent Neural Networks Work :- A Recurrent Neural Network (RNN) is a type of Neural Network where the output from the previous step is fed as input to the current step. In traditional neural networks, all the inputs and outputs are independent of each other. But in cases where it is required to predict the next word of a sentence, the previous words are needed, and hence there is a need to remember the previous words. Thus RNN came into existence, which solved this issue with the help of a Hidden Layer. The main and most important feature of an RNN is its Hidden state, which remembers some information about a sequence. The state is also referred to as the Memory State, since it remembers the previous input to the network. An RNN uses the same parameters for each input, as it performs the same task on all the inputs or hidden layers to produce the output. This reduces the complexity of parameters, unlike other neural networks.
Recurrent Neural Network
1. Input Layer:
   • Takes in the sequence data at each time step. This could be a series of words in a sentence or values in a time series.
2. Hidden Layer(s):
   • Contains neurons with recurrent connections, allowing information from previous time steps to influence the current state.
   • The hidden state is updated using the current input and the previous hidden state:

     h_t = f(U x_t + W h_(t−1) + b)

     where U, W, and b are the weights and bias, x_t is the input at time t, and h_(t−1) is the previous hidden state.
3. Output Layer:
   • Produces the final output for each time step, which could be used immediately or fed into the next stage of processing.
   • The output is typically calculated as:

     o_t = g(V h_t + c)

     where V and c are the weights and bias for the output layer.
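A NumPy sketch of these two update equations over a short random sequence (tanh stands in for f and the identity for g; all dimensions are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
input_dim, hidden_dim, output_dim = 3, 4, 2

U = rng.normal(size=(hidden_dim, input_dim))    # input-to-hidden weights
W = rng.normal(size=(hidden_dim, hidden_dim))   # hidden-to-hidden (recurrent) weights
V = rng.normal(size=(output_dim, hidden_dim))   # hidden-to-output weights
b, c = np.zeros(hidden_dim), np.zeros(output_dim)

h = np.zeros(hidden_dim)                        # initial hidden state
for x_t in rng.normal(size=(5, input_dim)):     # a sequence of 5 time steps
    h = np.tanh(U @ x_t + W @ h + b)            # h_t = f(U x_t + W h_(t-1) + b)
    o_t = V @ h + c                             # o_t = g(V h_t + c), with g = identity
    print(o_t)
```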
Recurrent Nature
The key feature of RNNs is the recurrence within the hidden layers.
Information is not only passed from the input to the output layer but also
looped back within the network, allowing it to maintain a form of memory.
Variants
1. LSTM (Long Short-Term Memory):
   • Contains special units called memory cells that can maintain information over long periods. Includes gates (input, forget, and output gates) to control the flow of information.
2. GRU (Gated Recurrent Unit):
   • A simpler alternative to LSTM that combines the forget and input gates into a single update gate.
How does an RNN differ from a Feedforward Neural Network? Artificial neural networks that do not have looping nodes are called feedforward neural networks. Because all information is only passed forward, this kind of neural network is also referred to as a multi-layer neural network.
Information moves from the input layer to the output layer (through any hidden layers that are present) unidirectionally in a feedforward neural network. These networks are appropriate for image classification tasks, for example, where input and output are independent. Nevertheless, their inability to retain previous inputs automatically renders them less useful for sequential data analysis.
L.M.Kuwarag &
(a) Recurrent Neural Network —_(b) Feed-Forward Neural Network
Recurrent Vs Feedforward networks
Recurrent Neuron and RNN Unfolding
The fundamental processing unit in a Recurrent Neural Network (RNN) is a Recurrent Unit, which is not explicitly called a "Recurrent Neuron." This unit has the unique ability to maintain a hidden state, allowing the network to capture sequential dependencies by remembering previous inputs while processing. Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) versions improve the RNN's ability to handle long-term dependencies.
[Figure: a recurrent neuron and its unfolding through time: the same weights W are applied at every time step]
There are four types of RNNs based on the number of inputs and outputs in the
network.
‘One to One
One to Many
Many to One
Many to Many
ere
One to One
This type of RNN behaves the same as any simple Neural network it is also known
as Vanilla Neural Network. In this Neural network, there is only one input and one
output.
‘one to one
Single Output
Single input
One to One RNN
One to Many
In this type of RNN, there is one input and many outputs associated with it. One of the most used examples of this network is image captioning, where, given an image, we predict a sentence having multiple words.
[Figure: One to Many RNN: single input, multiple outputs]
In this type of network, Many inputs are fed to the network at several states of the
network generating only one output. This type of network is used in the problems
like sentimental analysis. Where give multiple words as input and predict only the
sentiment of the sentence as output.
anyone
I sigh out
Many to One RNN
Many to Many
In this type of neural network, there are multiple inputs and multiple outputs corresponding to a problem. One example of this problem is language translation. In language translation, we provide multiple words from one language as input and predict multiple words from the second language as output.
[Figure: Many to Many RNN: multiple inputs, multiple outputs]
Key Differences Between CNN and RNN
• CNN is applicable to sparse data like images. RNN is applicable to time series and sequential data.
• While training the model, CNN uses simple backpropagation, while RNN uses backpropagation through time to calculate the loss.
• RNN can have no restriction on the length of inputs and outputs, but CNN has finite inputs and finite outputs.
• CNN has a feedforward network, while RNN works with loops to handle sequential data.
• CNN can also be used for video and image processing. RNN is primarily used for speech and text analysis.