CCS355 - NEURAL NETWORK AND DEEP LEARNING
UNIT - I
Processing of an ANN depends upon the following three building blocks −
Network Topology
Adjustments of Weights or Learning
Activation Functions
In this chapter, we will discuss these three building blocks of ANN in detail.
Network Topology
A network topology is the arrangement of a network along with its nodes and
connecting lines. According to the topology, ANN can be classified as the following
kinds −
Feedforward Network
It is a non-recurrent network in which the processing units/nodes are arranged in layers, and all the nodes in a layer are connected to the nodes of the previous layer. The connections carry different weights. There is no feedback loop, which means the signal can flow in only one direction, from input to output. It may be divided into the following two types −
Single layer feedforward network − The concept is of feedforward ANN
having only one weighted layer. In other words, we can say the input layer is
fully connected to the output layer.
Multilayer feedforward network − The concept is of feedforward ANN having more than one weighted layer. As this network has one or more layers between the input and the output layer, these are called hidden layers.
Feedback Network
As the name suggests, a feedback network has feedback paths, which means
the signal can flow in both directions using loops. This makes it a non-linear
dynamic system, which changes continuously until it reaches a state of
equilibrium. It may be divided into the following types −
Recurrent networks − They are feedback networks with closed loops.
Following are the two types of recurrent networks.
Fully recurrent network − It is the simplest neural network architecture
because all nodes are connected to all other nodes and each node works as
both input and output.
Jordan network − It is a closed loop network in which the output will go to
the input again as feedback as shown in the following diagram.
Adjustments of Weights or Learning
Learning, in an artificial neural network, is the method of modifying the weights of the connections between the neurons of a specified network. Learning in ANN can be classified into three categories, namely supervised learning, unsupervised learning, and reinforcement learning.
Supervised Learning
As the name suggests, this type of learning is done under the supervision of
a teacher. This learning process is dependent.
During the training of ANN under supervised learning, the input vector is
presented to the network, which will give an output vector. This output vector is
compared with the desired output vector. An error signal is generated if there is a difference between the actual output and the desired output vector. On the basis of this error signal, the weights are adjusted until the actual output matches the desired output.
Unsupervised Learning
As the name suggests, this type of learning is done without the supervision
of a teacher. This learning process is independent.
During the training of ANN under unsupervised learning, the input vectors
of similar type are combined to form clusters. When a new input pattern is applied,
then the neural network gives an output response indicating the class to which the
input pattern belongs.
There is no feedback from the environment as to what the desired output should be
and whether it is correct or incorrect. Hence, in this type of learning, the network itself must discover the patterns and features in the input data, and the relationship between the input data and the output.
Reinforcement Learning
As the name suggests, this type of learning is used to reinforce or strengthen the network over some critic information. This learning process is similar to supervised learning; however, we might have much less information.
During the training of the network under reinforcement learning, the network receives some feedback from the environment. This makes it somewhat similar to supervised learning. However, the feedback obtained here is evaluative, not instructive, which means there is no teacher as in supervised learning. After receiving the feedback, the network performs adjustments of the weights to get better critic information in the future.
Activation Functions
An activation function may be defined as the extra force or effort applied over the input to obtain an exact output. In ANN, we apply activation functions over the net input to obtain the exact output. Following are some activation functions of interest −
Linear Activation Function
It is also called the identity function as it performs no input editing. It can be defined as −
F(x) = x
Sigmoid Activation Function
It is of two types, as follows −
Binary sigmoidal function − This activation function performs input editing between 0 and 1. It is positive in nature. It is always bounded, which means its output cannot be less than 0 or more than 1. It is also strictly increasing in nature, which means the higher the input, the higher the output. It can be defined as −
F(x) = sigm(x) = 1 / (1 + e^-x)
Bipolar sigmoidal function − This activation function performs input editing between -1 and 1. It can be positive or negative in nature. It is always bounded, which means its output cannot be less than -1 or more than 1. It is also strictly increasing in nature, like the binary sigmoid function. It can be defined as −
F(x) = sigm(x) = 2 / (1 + e^-x) − 1 = (1 − e^-x) / (1 + e^-x)
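To make these definitions concrete, here is a small Python sketch of the three activation functions (NumPy and the function names are our illustrative choices, not part of the original material):

import numpy as np

def linear(x):
    # Identity function: performs no input editing
    return x

def binary_sigmoid(x):
    # Bounded in (0, 1) and strictly increasing
    return 1.0 / (1.0 + np.exp(-x))

def bipolar_sigmoid(x):
    # Bounded in (-1, 1) and strictly increasing
    return 2.0 / (1.0 + np.exp(-x)) - 1.0

x = np.array([-2.0, 0.0, 2.0])
print(linear(x))          # [-2.  0.  2.]
print(binary_sigmoid(x))  # values between 0 and 1
print(bipolar_sigmoid(x)) # values between -1 and 1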
Backpropagation
Backpropagation is a popular method for training artificial neural networks, especially deep neural networks.
Backpropagation is needed to calculate the gradient, which we need to adapt the weights of the weight matrices. The weights of the neurons (i.e. nodes) of the neural network are adjusted by calculating the gradient of the loss function. For this purpose, a gradient descent optimization algorithm is used. Backpropagation is also called backward propagation of errors.
A metaphor might help: picture yourself being put on a mountain, not necessarily at the top, by a helicopter at night and/or under fog. Let's also imagine that this mountain is on an island and you want to reach sea level.
You have to go down, but you can hardly see anything, maybe just a few meters. Your task is to find your way down, but you cannot see the path. You can use the method of gradient descent: you examine the steepness at your current position and proceed in the direction with the steepest descent.
You take only a few steps and then you stop again to reorient yourself. This means you apply the previously described procedure again, i.e. you look for the steepest descent.
Going on like this will enable you to arrive at a position where there is no further descent (i.e. every direction leads upwards). You may have reached the deepest level (the global minimum), but you could also be stuck in a basin. If you start at the position on the right side of our image, everything works out fine,
but starting from the left side, you will be stuck in a local minimum. In summary, if you are dropped many times at random places on this theoretical island, you will find ways down to sea level. This is essentially what we do when we train a neural network.
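The following short Python sketch mirrors this procedure on a one-dimensional landscape; the function, the starting points and the step size are illustrative assumptions:

def f(x):
    return x**4 - 3*x**2 + x    # a "mountain" profile with a local and a global minimum

def grad(x):
    return 4*x**3 - 6*x + 1     # the steepness at the current position

def descend(x, step=0.01, n_steps=1000):
    for _ in range(n_steps):
        x = x - step * grad(x)  # walk a little in the direction of steepest descent
    return x

print(descend(2.0))   # ends near x ~  1.15, a local minimum (the "basin")
print(descend(-2.0))  # ends near x ~ -1.32, the global minimum ("sea level")

Depending on where the helicopter drops us, we end in the global minimum or get stuck in a local one, exactly as in the metaphor.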
The actual backpropagation procedure
Assume we start with a simple (linear) neural network, with example values associated with its weights, as shown in the accompanying diagrams.
We have labels, i.e. target or desired values t for each output value o. The error is the difference between the target and the actual output −
e = t − o
We will later use a squared error function, because it has better characteristics
for the algorithm.
We will have a look at the output value o1, which depends on the values w11, w21, w31 and w41. Let's assume the calculated value (o1) is 0.92 and the desired value (t1) is 1. In this case the error is −
e1 = t1 − o1 = 1 − 0.92 = 0.08
This means that in our example the error e1 is propagated back in proportion to the weights; for instance, the fraction of e1 attributed to w11 is −
e1 · w11 / (w11 + w21 + w31 + w41)
The total error in the weight matrix between the hidden and the output layer is built from such terms, one for each weight, with the corresponding error ej in place of e1. The denominator in this matrix acts only as a scaling factor, and we can drop it so that the calculation becomes much simpler.
This example has demonstrated backpropagation for a basic scenario of a linear
neural network.
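The proportional distribution of the error can be checked in a few lines of Python; the weight values below are illustrative assumptions, while o1 and t1 are the numbers used above:

w = [0.6, 0.1, 0.15, 0.25]               # assumed values for w11, w21, w31, w41
t1, o1 = 1.0, 0.92
e1 = t1 - o1                             # the error: 1 - 0.92 = 0.08
shares = [e1 * wi / sum(w) for wi in w]  # fraction of e1 attributed to each weight
print(shares)                            # the shares add up to e1 again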
Flowchart of backpropagation neural network algorithm.
The flowchart of Error Back Propagation Artificial Neural Network training
architecture
What is Artificial Neural Network?
An Artificial Neural Network (ANN) is an efficient computing system whose central theme is borrowed from the analogy of biological neural networks. ANNs are also named "artificial neural systems," "parallel distributed processing systems," or "connectionist systems." An ANN acquires a large collection of units that are interconnected in some pattern to allow communication between the units. These units, also referred to as nodes or neurons, are simple processors which operate in parallel.
Every neuron is connected to other neurons through a connection link. Each connection link is associated with a weight that carries information about the input signal. This is the most useful information for neurons to solve a particular problem, because the weight usually excites or inhibits the signal that is being communicated. Each neuron has an internal state, which is called an activation signal. Output signals, which are produced after combining the input signals and the activation rule, may be sent to other units.
A Brief History of ANN
The history of ANN can be divided into the following three eras −
ANN during 1940s to 1960s
Some key developments of this era are as follows −
1943 − The concept of neural networks is generally assumed to have started with the work of physiologist Warren McCulloch and mathematician Walter Pitts, who in 1943 modeled a simple neural network using electrical circuits in order to describe how neurons in the brain might work.
1949 − Donald Hebb's book, The Organization of Behavior, put forth the idea that repeated activation of one neuron by another increases the strength of the connection between them each time they are used.
1956 − An associative memory network was introduced by Taylor.
1958 − A learning method for the McCulloch and Pitts neuron model, named Perceptron, was invented by Rosenblatt.
1960 − Bernard Widrow and Marcian Hoff developed models called
"ADALINE" and “MADALINE.”
ANN during 1960s to 1980s
Some key developments of this era are as follows −
1961 − Rosenblatt proposed a "backpropagation" scheme for multilayer networks, though his attempt was unsuccessful.
1964 − Taylor constructed a winner-take-all circuit with inhibitions among
output units.
1969 − Minsky and Papert published their book Perceptrons, which analysed the limitations of single-layer perceptrons.
1971 − Kohonen developed Associative memories.
1976 − Stephen Grossberg and Gail Carpenter developed Adaptive
resonance theory.
ANN from 1980s till Present
Some key developments of this era are as follows −
1982 − The major development was Hopfield’s Energy approach.
1985 − Boltzmann machine was developed by Ackley, Hinton, and
Sejnowski.
1986 − Rumelhart, Hinton, and Williams introduced Generalised Delta Rule.
1988 − Kosko developed the Bidirectional Associative Memory (BAM) and also gave the concept of Fuzzy Logic in ANN.
The historical review shows that significant progress has been made in this field.
Neural network based chips are emerging and applications to complex problems
are being developed. Surely, today is a period of transition for neural network
technology.
Biological Neuron
A nerve cell neuron is a special biological cell that processes information.
According to an estimation, there are huge number of neurons, approximately
1011 with numerous interconnections, approximately 1015.
Schematic Diagram
Working
As shown in the above diagram, a typical neuron consists of the following four parts, with the help of which we can explain its working −
Dendrites − They are tree-like branches, responsible for receiving information from the other neurons the neuron is connected to. In another sense, we can say that they are like the ears of the neuron.
Soma − It is the cell body of the neuron and is responsible for processing the information received from the dendrites.
Axon − It is just like a cable through which the neuron sends the information.
Synapses − They are the connections between the axon and the dendrites of other neurons.
ANN versus BNN
Before taking a look at the differences between the Artificial Neural Network (ANN) and the Biological Neural Network (BNN), let us take a look at the similarities in terminology between these two.
Biological Neural Network (BNN)      Artificial Neural Network (ANN)
Soma                                 Node
Dendrites                            Input
Synapse                              Weights or Interconnections
Axon                                 Output
The following comparison between BNN and ANN is based on some important criteria −
Processing − BNN: massively parallel, slow, but superior to ANN. ANN: massively parallel, fast, but inferior to BNN.
Size − BNN: 10¹¹ neurons and 10¹⁵ interconnections. ANN: 10² to 10⁴ nodes (mainly depends on the type of application and the network designer).
Learning − BNN: can tolerate ambiguity. ANN: very precise, structured and formatted data is required to tolerate ambiguity.
Fault tolerance − BNN: performance degrades with even partial damage. ANN: capable of robust performance, hence has the potential to be fault tolerant.
Storage capacity − BNN: stores the information in the synapses. ANN: stores the information in continuous memory locations.
Model of Artificial Neural Network
The following diagram represents the general model of ANN, followed by its processing. For this general model of artificial neural network, the net input can be calculated as follows −
yin = x1·w1 + x2·w2 + … + xn·wn, i.e. yin = Σi xi·wi
The output can be calculated by applying the activation function over the net input −
y = F(yin)
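As a minimal sketch of this processing in Python (the input values, weights and threshold are illustrative assumptions):

def F(yin, theta=0.5):
    return 1 if yin >= theta else 0              # a simple threshold activation

x = [1.0, 0.0, 1.0]                              # inputs x1..xn
w = [0.4, 0.3, 0.6]                              # weights w1..wn
yin = sum(xi * wi for xi, wi in zip(x, w))       # net input yin = sum of xi*wi
y = F(yin)                                       # output after activation
print(yin, y)                                    # 1.0 1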
Perceptron
Developed by Frank Rosenblatt using the McCulloch and Pitts model, the perceptron is the basic operational unit of artificial neural networks. It employs a supervised learning rule and is able to classify data into two classes.
Operational characteristics of the perceptron: It consists of a single neuron with an arbitrary number of inputs along with adjustable weights, but the output of the neuron is 1 or 0 depending upon the threshold. It also consists of a bias whose weight is always 1. The following figure gives a schematic representation of the perceptron.
The perceptron thus has the following three basic elements, illustrated in the sketch after this list −
Links − It has a set of connection links, which carry weights, including a bias that always has weight 1.
Adder − It adds the inputs after they are multiplied with their respective weights.
Activation function − It limits the output of the neuron. The most basic activation function is a Heaviside step function, which has two possible outputs. This function returns 1 if the input is positive, and 0 for any negative input.
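Here is a small sketch of these three elements working together; the weights, bias and inputs are illustrative assumptions:

def heaviside(v):
    return 1 if v > 0 else 0                     # activation: 1 for positive input, else 0

def perceptron(x, w, b):
    v = sum(xi * wi for xi, wi in zip(x, w)) + b # adder over the weighted links plus bias
    return heaviside(v)                          # activation function limits the output

print(perceptron([1, 1], [0.5, 0.5], -0.7))      # 1
print(perceptron([1, 0], [0.5, 0.5], -0.7))      # 0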
Training Algorithm
A perceptron network can be trained for a single output unit as well as for multiple output units.
Training Algorithm for Single Output Unit
Step 1 − Initialize the following to start the training −
Weights
Bias
Learning rate α
For easy calculation and simplicity, the weights and bias can be set equal to 0 and the learning rate to 1.
Step 2 − Continue steps 3-8 while the stopping condition is not true.
Step 3 − Continue steps 4-6 for every training vector x.
Step 4 − Activate each input unit as follows −
xi = si (i = 1 to n)
Step 5 − Obtain the net input with the following relation −
yin = b + Σi xi·wi
Step 6 − Apply the activation function over the net input to obtain the final output −
y = f(yin) = 1 if yin > θ; 0 if −θ ≤ yin ≤ θ; −1 if yin < −θ
Here 'θ' is the threshold.
Step 7 − Adjust the weight and bias as follows −
Case 1 − if y ≠ t then,
wi(new)=wi(old)+αtxi
b(new)=b(old)+αt
Case 2 − if y = t then,
wi(new)=wi(old)
b(new)=b(old)
Here ‘y’ is the actual output and ‘t’ is the desired/target output.
Step 8 − Test for the stopping condition, which would happen when there is no
change in weight.
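A minimal sketch of this training loop in Python, on the logical AND function with bipolar targets; the data and the threshold θ = 0.2 are illustrative assumptions, with weights and bias initialized to 0 and α = 1 as suggested above:

X = [(0, 0), (0, 1), (1, 0), (1, 1)]
T = [-1, -1, -1, 1]                     # bipolar targets t for logical AND
w, b, alpha, theta = [0.0, 0.0], 0.0, 1.0, 0.2

def activate(v):
    # Step 6 activation: 1 above theta, -1 below -theta, 0 in between
    return 1 if v > theta else (-1 if v < -theta else 0)

changed = True
while changed:                          # Step 8: stop when no weight changes
    changed = False
    for (x1, x2), t in zip(X, T):
        y = activate(w[0]*x1 + w[1]*x2 + b)
        if y != t:                      # Case 1: adjust weight and bias
            w[0] += alpha * t * x1
            w[1] += alpha * t * x2
            b += alpha * t
            changed = True              # Case 2 (y == t): leave them unchanged
print(w, b)                             # a separating line for AND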
Training Algorithm for Multiple Output Units
The following diagram is the architecture of perceptron for multiple output classes.
Step 1 − Initialize the following to start the training −
Weights
Bias
Learning rate α
For easy calculation and simplicity, the weights and bias can be set equal to 0 and the learning rate to 1.
Step 2 − Continue steps 3-8 while the stopping condition is not true.
Step 3 − Continue steps 4-6 for every training vector x.
Step 4 − Activate each input unit as follows −
xi = si (i = 1 to n)
Step 5 − Obtain the net input at each output unit with the following relation −
yinj = bj + Σi xi·wij (j = 1 to m)
Step 6 − Apply the activation function over the net input to obtain the final output for every output unit j = 1 to m −
yj = f(yinj) = 1 if yinj > θ; 0 if −θ ≤ yinj ≤ θ; −1 if yinj < −θ
Step 7 − Adjust the weight and bias as follows −
Case 1 − if yj ≠ tj then,
wij(new)=wij(old)+αtjxi
bj(new)=bj(old)+αtj
Case 2 − if yj = tj then,
wij(new)=wij(old)
bj(new)=bj(old)
Here 'yj' is the actual output and 'tj' is the desired/target output.
Step 8 − Test for the stopping condition, which will happen when there is no
change in weight.
Adaptive Linear Neuron (Adaline)
Adaline, which stands for Adaptive Linear Neuron, is a network having a single linear unit. It was developed by Widrow and Hoff in 1960. Some important points about Adaline are as follows −
It uses a bipolar activation function.
It uses the delta rule for training to minimize the Mean Squared Error (MSE) between the actual output and the desired/target output.
The weights and the bias are adjustable.
Architecture
The basic structure of Adaline is similar to that of the perceptron, with an extra feedback loop with the help of which the actual output is compared with the desired/target output. After comparison, on the basis of the training algorithm, the weights and bias are updated.
Training Algorithm
Step 1 − Initialize the following to start the training −
Weights
Bias
Learning rate α
For easy calculation and simplicity, the weights and bias can be set equal to 0 and the learning rate to 1.
Step 2 − Continue steps 3-8 while the stopping condition is not true.
Step 3 − Continue steps 4-6 for every bipolar training pair s:t.
Step 4 − Activate each input unit as follows −
xi = si (i = 1 to n)
Step 5 − Obtain the net input with the following relation −
yin = b + Σi xi·wi
Step 6 − Apply the activation function over the net input to obtain the final output −
y = f(yin) = 1 if yin ≥ 0; −1 if yin < 0
Step 7 − Adjust the weight and bias as follows −
wi(new)=wi(old)+α(t−yin)xi
b(new)=b(old)+α(t−yin)
Here (t−yin) is the computed error.
Step 8 − Test for the stopping condition, which will happen when there is no change in weight, or when the highest weight change that occurred during training is smaller than the specified tolerance.
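A minimal sketch of this delta-rule training in Python, on bipolar AND data; the data, α = 0.1, the epoch cap and the tolerance are illustrative assumptions:

X = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
T = [-1, -1, -1, 1]                         # bipolar targets for logical AND
w, b, alpha, tol = [0.0, 0.0], 0.0, 0.1, 1e-4

for epoch in range(200):
    w_before = list(w)
    for (x1, x2), t in zip(X, T):
        yin = w[0]*x1 + w[1]*x2 + b         # net input
        err = t - yin                       # (t - yin) is the computed error
        w[0] += alpha * err * x1            # delta rule updates, minimizing MSE
        w[1] += alpha * err * x2
        b += alpha * err
    change = max(abs(w[0] - w_before[0]), abs(w[1] - w_before[1]))
    if change < tol:                        # highest weight change below tolerance
        break
print(w, b)                                 # approaches w = (0.5, 0.5), b = -0.5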
Multiple Adaptive Linear Neuron (Madaline)
Madaline, which stands for Multiple Adaptive Linear Neuron, is a network which consists of many Adalines in parallel. It has a single output unit. Some important points about Madaline are as follows −
It is just like a multilayer perceptron, where the Adalines act as hidden units between the input and the Madaline layer.
The weights and the bias between the input and the Adaline layer, as we see in the Adaline architecture, are adjustable.
The Adaline and Madaline layers have fixed weights and a bias of 1.
Training can be done with the help of the delta rule.
Architecture
The architecture of Madaline consists of "n" neurons in the input layer, "m" neurons in the Adaline layer, and 1 neuron in the Madaline layer. The Adaline layer can be considered as the hidden layer, as it lies between the input layer and the output layer, i.e. the Madaline layer. A sketch of the forward pass through this structure follows.
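A minimal sketch of this forward pass in Python; the sizes, input and the adjustable weight values are illustrative assumptions, while the Madaline unit's incoming weights and bias are fixed at 1, as stated above:

def bipolar_step(v):
    return 1 if v >= 0 else -1

n, m = 3, 2                                  # n input neurons, m Adaline neurons
V = [[0.2, -0.4], [0.3, 0.1], [-0.1, 0.5]]   # adjustable input-to-Adaline weights
b = [0.1, -0.2]                              # adjustable Adaline biases

x = [1, -1, 1]
q = [bipolar_step(sum(x[i] * V[i][j] for i in range(n)) + b[j])
     for j in range(m)]                      # outputs of the m Adaline (hidden) units
y = bipolar_step(sum(q) + 1)                 # Madaline unit: fixed weights and bias of 1
print(q, y)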
Training Algorithm
By now we know that only the weights and bias between the input and the Adaline layer are to be adjusted, and that the weights and bias between the Adaline and the Madaline layer are fixed.
Step 1 − Initialize the following to start the training −
Weights
Bias
Learning rate α
For easy calculation and simplicity, the weights and bias can be set equal to 0 and the learning rate to 1.
Step 2 − Continue steps 3-8 while the stopping condition is not true.
Step 3 − Continue steps 4-7 for every bipolar training pair s:t.
Step 4 − Activate each input unit as follows −
xi = si (i = 1 to n)
The remaining steps compute the net input and output of each Adaline unit and of the Madaline output unit, adjust the weights and bias of the Adaline units whenever the output does not match the target, and finally test the stopping condition.
Back Propagation Neural Networks
A Back Propagation Network (BPN) is a multilayer neural network consisting of the input layer, at least one hidden layer, and the output layer. As its name suggests, back propagation of the error takes place in this network. The error, which is calculated at the output layer by comparing the target output and the actual output, is propagated back towards the input layer.
Architecture
As shown in the diagram, the architecture of BPN has three interconnected
layers having weights on them. The hidden layer as well as the output layer also
have a bias, whose weight is always 1, on them. As is clear from the diagram, the working of BPN is in two phases. One phase sends the signal from the input layer to the output layer, and the other phase back propagates the error from the output layer to the input layer.
Training Algorithm
For training, BPN uses the binary sigmoid activation function. The training of BPN has the following three phases.
Phase 1 − Feed Forward Phase
Phase 2 − Back Propagation of error
Phase 3 − Updating of weights
All these phases are combined in the algorithm given below.
Step 1 − Initialize the following to start the training −
Weights
Learning rate α
For easy calculation and simplicity, take some small random values.
Step 2 − Continue steps 3-11 while the stopping condition is not true.
Step 3 − Continue steps 4-10 for every training pair.
Phase 1
Step 4 − Each input unit receives the input signal xi and sends it to the hidden units, for all i = 1 to n.
Step 5 − Calculate the net input at the hidden unit using the following relation −
Qinj = b0j + Σi xi·vij (j = 1 to p)
Here b0j is the bias on hidden unit j and vij is the weight on the connection from input unit i to hidden unit j. Now apply the activation function to obtain the output of the hidden unit −
Qj = f(Qinj)
Phase 2
In the back propagation phase, each output unit compares its computed output with the corresponding target value to obtain an error term, and these error terms are propagated back from the output layer towards the input layer through the hidden layer.
Phase 3
In the weight updating phase, the weights and biases are adjusted on the basis of the error terms, and the stopping condition is tested.
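Putting the three phases together, here is a minimal Python sketch of BPN training with the binary sigmoid on XOR data; the network size, data, α, epoch count and random initialization are illustrative assumptions:

import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))          # binary sigmoid activation

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)   # XOR targets

rng = np.random.default_rng(1)
V = rng.uniform(-0.5, 0.5, (2, 4))           # input -> hidden weights (small random values)
b1 = rng.uniform(-0.5, 0.5, 4)               # hidden biases
W = rng.uniform(-0.5, 0.5, (4, 1))           # hidden -> output weights
b2 = rng.uniform(-0.5, 0.5, 1)               # output bias
alpha = 0.5

for epoch in range(20000):
    # Phase 1: feed forward
    Q = sigmoid(X @ V + b1)                  # hidden layer outputs
    Y = sigmoid(Q @ W + b2)                  # output layer outputs
    # Phase 2: back propagation of error
    d_out = (T - Y) * Y * (1 - Y)            # error term at the output units
    d_hid = (d_out @ W.T) * Q * (1 - Q)      # error term propagated to the hidden units
    # Phase 3: updating of weights
    W += alpha * Q.T @ d_out
    b2 += alpha * d_out.sum(axis=0)
    V += alpha * X.T @ d_hid
    b1 += alpha * d_hid.sum(axis=0)

print(np.round(Y, 2))                        # outputs approach the XOR targets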