SOFT COMPUTING

Course Objectives
• To understand the features of neural network
and its applications
• To learn about the concept of fuzzy logic
components
• To expose the ideas about genetic algorithm
Course Outcomes
• Ability to understand the basics of artificial
neural network and supervised learning
network
• Applying knowledge and understanding of
associative memory networks
• Applying knowledge and understanding of
unsupervised learning network
• Comprehend the fuzzy sets and the concept of
fuzziness involved in various systems
Course Outcomes
• Understand the concepts of fuzzy logic,
knowledge representation using fuzzy rules,
approximate reasoning
• Understand fuzzy concepts and develop a
Fuzzy inference system to derive decisions
• Ability to understand the concepts of genetic
Algorithm
• Apply soft computing techniques for real life
applications
RECOMMENDED BOOKS
Text Book
• S.N. Sivanandam & S.N.Deepa, “Principles of Soft Computing”, 2nd
Edition, Wiley India, 2011.
Reference Book
• Samir Roy and Udit Chakraborty, Introduction to Soft Computing,
Pearson.2013.
• Laurene Fausett, Fundamentals of Neural networks: architectures,
algorithms and applications , Pearson India, 2008
• Ross Timothy J, Fuzzy Logic with Engineering Applications, Wiley India Pvt
Ltd, New Delhi, 2010.
Module-I
Topics
• Introduction to Soft computing
• Neural networks- Introduction, evolution, basic
models, terminologies of ANN,
• Pitts model
• Perceptron
• Adaline
• Back-propagation network
• RBF network
Soft Computing
• Soft computing exploits the tolerance for imprecision,
uncertainty, and partial truth to achieve tractability,
robustness, low solution-cost, and better relationship with
reality
Soft Computing Main Components:
• Approximate Reasoning
• Search & Optimization
Neural Networks, Fuzzy Logic, Evolutionary Algorithms
Hard computing
• Conventional computing
• It requires a precisely stated analytical model
and often a lot of computation time
• Binary logic, crisp systems, numerical analysis
HARD COMPUTING vs SOFT COMPUTING
• Hard: conventional computing that is deterministic and has sharp boundaries. Soft: non-conventional approach that is stochastic and has vague boundaries.
• Hard: precise, certain, two-valued (Boolean) logic. Soft: imprecise, uncertain, multi-valued logic.
• Hard: needs exact input. Soft: can handle ambiguous and noisy data.
• Hard: not tractable. Soft: tractable solutions.
• Hard: high computational cost. Soft: low computational cost.
• Hard: low Machine Intelligence Quotient (MIQ). Soft: high Machine Intelligence Quotient (MIQ).
• Hard: precise reasoning. Soft: approximate reasoning.


PROBLEM SOLVING TECHNIQUES
• HARD COMPUTING (Precise Models):
   - Symbolic logic reasoning
   - Traditional numerical modeling and search
• SOFT COMPUTING (Approximate Models):
   - Approximate reasoning
   - Functional approximation and randomized search
SOME APPLICATION AREAS OF
SOFT COMPUTING
• Data clustering
• Rule generation
• Image processing
• Medical diagnosis
• Pattern recognition
• Social networks
• Distributed computing
• Parallel processing
• Machine learning and
• Granular computing
OVERVIEW OF TECHNIQUES IN SOFT
COMPUTING
• Neural Networks

• Fuzzy Logic

• Genetic Algorithm

• Hybrid Systems

Neural Networks
• Neural networks are inspired by the design and functioning of the human brain and its components.
• An information processing model inspired by the way the biological nervous system, i.e. the brain, processes information.
• An ANN is composed of a large number of highly interconnected processing elements (neurons) working in unison to solve problems.
• It is configured for specific applications, such as pattern recognition and data classification, through a learning process.
• Typically around 85-90% accurate on such tasks.
Advantages of Neural Networks
• Adaptive learning
Ability to learn how to do tasks based on the data
given for training or initial experience.
• Self-organization
Creates its own organization or representation of
information it receives during learning time.
• Real time operation
Computations can be carried out in parallel.
• Fault tolerance via redundant information coding
Partial destruction of a neural network causes degradation of performance. In some cases, performance can be retained even after major network damage.
Multi-disciplinary point of view of
Neural Networks
Application Scope of Neural Network
• Air traffic control
• Animal behaviour
• Appraisal and valuation of property, etc.
• Betting on horse races and stock markets
• Criminal sentencing
• Complex physical and chemical processes
• Data mining, cleaning and validation
• Direct mail advertisers
• Echo patterns
• Economic modeling
• Employee hiring
• Expert consultants
• Fraud detection
• Handwriting and typewriting
• Lake water levels
• Machinery controls
• Medical diagnosis
• Music composition
• Photos and fingerprints
• Recipes and chemical formulation
• Traffic flows
• Weather prediction
Fuzzy Logic
1. Fuzzy logic is an organized method for dealing with imprecise data.
• Fuzzy logic includes 0 and 1 as extreme cases of truth (or "the state of matters" or "fact") but also includes the various states of truth in between, so that, for example, the result of a comparison between two things could be not "tall" or "short" but "0.38 of tallness".
• Allows partial membership
• Implemented in small, embedded micro controllers to large,
networked, multichannel PC or work station.
• Can be implemented in hardware, software or in both.
• Fuzzy logic provides a simple way to arrive at a definite conclusion
based upon vague, ambiguous, imprecise, noisy or missing input
information.
Genetic Algorithm
• Models how the genes of parents combine to form those of their children.
• Create an initial population of individuals representing possible solutions to the problem.
• Individual characteristics determine whether members are less or more fit with respect to the population.
• The more fit members are selected with higher probability.
• It is very effective in finding optimal or near-optimal solutions.
• A generate-and-test strategy (a minimal sketch follows below).
• Differs from conventional optimization and search procedures in that it:
   - works with a coding of the parameter set
   - works with multiple points
   - searches via sampling (a blind search)
   - searches using stochastic operators
• Used in business, scientific and engineering circles, etc.
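A minimal Python sketch of the generate-and-test loop described above; the bit-string encoding, the fitness function (count of 1s) and all parameter values are illustrative choices, not taken from the slides:

```python
import random

def genetic_algorithm(fitness, n_bits=8, pop_size=20, generations=50,
                      crossover_rate=0.9, mutation_rate=0.05):
    """Minimal generate-and-test GA: selection, crossover, mutation."""
    # Initial population of random bit strings (coded parameter set)
    population = [[random.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        # Fitter members are selected with higher probability (tournament)
        def select():
            a, b = random.sample(population, 2)
            return a if fitness(a) > fitness(b) else b
        children = []
        while len(children) < pop_size:
            p1, p2 = select(), select()
            if random.random() < crossover_rate:        # single-point crossover
                cut = random.randint(1, n_bits - 1)
                c1, c2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            else:
                c1, c2 = p1[:], p2[:]
            for c in (c1, c2):                          # bit-flip mutation
                for i in range(n_bits):
                    if random.random() < mutation_rate:
                        c[i] = 1 - c[i]
                children.append(c)
        population = children[:pop_size]
        best = max(population + [best], key=fitness)
    return best

# Example: maximize the number of 1s in the string
print(genetic_algorithm(fitness=sum))
```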
Hybrid System
Neuro Fuzzy hybrid system
 Combination of fuzzy set theory and neural networks
 Fuzzy system deal with explicit knowledge that can be explained and
understood
 Neural network deal with implicit knowledge acquired by learning
Advantages
 Handle any kind of information (Numeric, Linguistic, logical)
 Manage imprecise, partial, vague or imperfect information
 Resolve conflicts by collaboration and aggregation.
 Self-learning, self-organizing and self-tuning capability
 No need for prior knowledge of relationship of data
 Mimic human decision making system
 Computation fast by using fuzzy number operations.
Hybrid System
Neuro-genetic hybrid system
Topology optimization
• A genetic algorithm is used to select a topology for the ANN; a common one is the back-propagation network.
Genetic algorithm training
• Learning of the ANN is formulated as a weight optimization problem, usually with the mean squared error as the fitness measure.
Control parameter optimization
• The learning rate, momentum rate, tolerance level, etc. are optimized using the GA.
Hybrid System
Fuzzy genetic hybrid system
 Optimization ability of GA are used to best set of rules to be
used for fuzzy inference engine
 Creating the classification rules for a fuzzy system where
objects are classified by linguistic terms.
 Find the appropriate set of rules
 Training data and randomly generated rules are combined to
create initial population
 Fitness function measures the strength of rules, balancing the
quality and diversity of the population.
NEURAL NETWORKS
 Neural networks design is inspired by the design and
functioning of human brains and components
 It has the ability to learn by example
 It has made them very flexible and powerful tool
 The networks are also well suited for real-time systems
 They have fast response and less computational times
 They have a parallel architecture
NEURAL NETWORKS
• Resembles the characteristics of a biological neural network.
• Nodes are interconnected processing elements (units or neurons).
• Each neuron is connected to the others by connection links.
• Each connection link is associated with a weight which carries information about the input signal.
• ANN processing elements are called neurons or artificial neurons, since they have the capability to model networks of original neurons as found in the brain.
• The internal state of a neuron is called the activation or activity level of the neuron, which is a function of the inputs the neuron receives.
• A neuron can send only one signal at a time.
ARCHITECTURE OF A SIMPLE ANN
[Figure: input neurons X1 and X2 connected through weights w1 and w2 to output neuron Y]
• X1, X2: input neurons; Y: output neuron. X1 and X2 transmit signals, Y receives the signal.
• x1, x2: activations (outputs) of the input neurons.
• w1, w2: associated weights, which contain information about the input signals.
• Net input: y_in = x1 w1 + x2 w2
• Output: y = f(y_in), where f is the activation function.
Activation function
• The function applied over the net input is called the activation function.
• A weight involved in an ANN plays the role of the slope of the straight line y = mx.
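A small Python sketch of the two-input neuron above; the input values, weights and the choice of a sigmoid activation are illustrative:

```python
import math

def net_input(x, w, b=0.0):
    """y_in = b + x1*w1 + x2*w2 + ... (weighted sum of inputs)."""
    return b + sum(xi * wi for xi, wi in zip(x, w))

def binary_sigmoid(y_in, lam=1.0):
    """One possible activation function f applied over the net input."""
    return 1.0 / (1.0 + math.exp(-lam * y_in))

# Two input neurons X1, X2 feeding output neuron Y (values are illustrative)
x = [0.5, 0.8]
w = [0.3, 0.6]
y_in = net_input(x, w)       # 0.63
y = binary_sigmoid(y_in)     # f(0.63) is roughly 0.652
print(y_in, y)
```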
TASKS PERFORMED BY ANN
• Pattern-matching
• Classification
• Optimization function
• Approximation
• Vector quantization
• Data Clustering
SCHEMATIC DIAGRAM OF A BIOLOGICAL
NEURON
[Figure: a biological neuron showing the cell body (soma) with its nucleus, dendrites, axon, strands and synapses]
• Dendrites: where the nerve is connected to the cell body
• Cell body (Soma): contains the nucleus
• Axon: carries the impulses of the neuron
• Strands: the splits at the end of the axon
• Synapse: bulb-like organ at the end of the strands
TERMINOLOGICAL RELATIONSHIP BETWEEN
BIOLOGICAL NN AND ANN
Biological Neuron → Artificial Neuron
• Cell → Neuron
• Dendrites → Weights or interconnections
• Soma → Net input
• Axon → Output
BIOLOGICAL NN CONTD…
• In the human brain there are approximately 10,000 synapses per neuron.
The mathematical representation of this process in an ANN is as follows:
• Suppose there are n inputs from n neurons X1, X2, ..., Xn with activations x1, x2, ..., xn respectively.
• Let the weights of the interconnections between X1, X2, ..., Xn and the connecting neuron Y be w1, w2, ..., wn respectively.
ARTIFICIAL NEURAL NETWORKS
CONTD…
• The net input to the neuron Y is given by the formula:
   y_in = x1 w1 + x2 w2 + ... + xn wn
• The activation function is applied to y_in to compute the output.
 The weight represents the strength of synapse connecting the
input and the output neurons
 The weights may be positive or negative
 +ve weight means the synapse is excitatory
 -ve weight means the synapse is inhibitory
Brain vs Computer
• Speed: Brain: execution time is a few milliseconds. Computer: execution time is a few nanoseconds.
• Processing: Brain: performs massively parallel operations simultaneously. Computer: performs several parallel operations simultaneously and is faster than the biological neuron.
• Size and complexity: Brain: the number of neurons is about 10^11 and the number of interconnections about 10^15, so the complexity of the brain is higher than that of the computer. Computer: depends on the chosen application and the network designer.
• Storage capacity: Brain: information is stored in the interconnections or in synapse strengths; new information is stored without destroying the old; it sometimes fails to recollect information. Computer: information is stored in continuous memory locations; overloading may destroy older locations; information can be easily retrieved.
• Tolerance: Brain: fault tolerant; it can store and retrieve information even if interconnections fail; it accepts redundancies. Computer: information is corrupted if the network connections are disconnected; no redundancies.
• Control mechanism: Brain: depends on active chemicals, and neuron connections are strong or weak. Computer: the CPU acts as the control unit; the control mechanism is very simple.
Characteristics of ANN
 Neurally implemented mathematical model
 Large number of interconnected processing elements called neurons exists
here.
 Interconnections with weighted linkage hold informative knowledge.
 Input signals arrive at processing elements through connections and
connecting weights.
 Processing elements can learn, recall and generalize from the given data by
adjustment of weights
 Computational power is determined by the collective behaviour of neurons.
• An ANN is a connectionist model, a parallel distributed processing model, a self-organizing system, a neuro-computing system and a neuro-morphic system.
BASIC MODELS OF ARTIFICIAL NEURAL NETWORK

• The models are specified by the three basic entities

 The model’s synaptic (through synapses) interconnections


 The training or learning rules adopted for updating and
adjusting the connection weights
 The activation functions
CONNECTIONS
 The neurons can be visualised for their arrangements in layers
 An ANN consists of a set of highly interconnected processing elements
called neurons
 Output of each processing element is found to be connected through
weights to the other processing elements or itself
 Delay lead and lag-free connections are allowed
 Arrangement of these processing elements and the geometry of their
interconnections are essential for an ANN
 The point where the connection originates and terminates
should be noted
 The function of each processing element in an ANN should be specified
BASIC NEURON CONNECTION
ARCHITECTURES
There are five types of basic connections:
 SINGLE-LAYER FEED FORWARD NETWORK
 MULTI-LAYER FEED FORWARD NETWORK
 SINGLE NODE WITH ITS OWN FEEDBACK
 SINGLE-LAYER RECURRENT NETWORK
 MULTI-LAYER RECURRENT NETWORK
SINGLE LAYER FEED FORWARD
NETWORK
• Architecture:
[Figure: input neurons X1, X2, ..., Xn fully connected through weights wij to output neurons Y1, Y2, ..., Ym; an input layer and an output layer]
Single layer Feed- Forward Network
• A layer is formed by taking processing elements and combining them with other processing elements.
• The inputs and outputs are linked with each other.
• The inputs are connected to the processing nodes with various weights, resulting in a series of outputs, one per node.
SINGLE LAYER FEED FORWARD
NETWORK

• This architecture is called the single-layer feed-forward network.
• The input nodes are Xi, i = 1, 2, ..., n.
• The output nodes are Yj, j = 1, 2, ..., m.
• The connections from the n input nodes to the m output nodes are assigned weights wij, i = 1, 2, ..., n; j = 1, 2, ..., m.
Multilayer Feed-forward
Network
• Formed by the interconnection of several layers.
• The input layer receives the input and buffers the input signal.
• The output layer generates the output.
• A layer between the input and output layers is called a hidden layer.
• The hidden layer is internal to the network.
• There can be zero to several hidden layers in a network.
• The more hidden layers there are, the higher the complexity of the network, but the more efficient the output produced.
Feedback Network
• If no neuron in the output layer is an input to a node in the same layer or a preceding layer, the net is a feed-forward network.
• If outputs are directed back as inputs to processing elements in the same layer or a preceding layer, the net is a feedback network.
• If the outputs are directed back to the inputs of the same layer, it is lateral feedback.
• Recurrent networks are feedback networks with closed loops.
• Fig 2.8 (A) shows a simple recurrent neural network having a single neuron with feedback to itself.
• Fig 2.9 shows a single-layer network with feedback; the output can be directed back to the processing element itself, to other processing elements, or to both.
SINGLE NODE WITH ITS OWN
FEEDBACK

[Figure: a single processing element whose output is fed back to its own input]
SINGLE LAYER RECURRENT
NETWORK

[Figure: input units X1, X2, ..., Xn connected through weights w11, w22, ..., wnm to output units Y1, Y2, ..., Ym, with feedback connections]
A processing element's output can be directed back to the processing element itself, to another processing element, or to both.
Multilayer Recurrent network
• A processing element's output can be directed back to nodes in a preceding layer, forming a multilayer recurrent network.
• A processing element's output can also be directed back to the processing element itself or to other processing elements in the same layer.
Learning
Neural network adapts itself to a stimulus by making proper parameter
adjustment, resulting in the production of desired response.
The two broad kinds of learning in ANNs are:
• Parameter learning: updates the connecting weights in a neural net.
• Structure learning: focuses on changes in the network's structure (the number of processing elements and the types of connections between nodes).
Apart from these, learning in an ANN is classified into three categories:
• Supervised learning
• Unsupervised learning
• Reinforcement learning
Supervised Learning
• Learning with the help of a teacher.
• In an ANN, each input vector requires a corresponding target vector, which represents the desired output.
• The input vector along with the target vector is called a training pair.
• The input vector results in an output vector.
• The actual output vector is compared with the desired (target) output vector.
• If there is a difference, an error signal is generated by the network.
• The error signal is used for adjustment of weights until the actual output matches the desired output.
Unsupervised learning
• Learning is performed without the help of a teacher.
• Example: a tadpole learns to swim by itself.
• In an ANN, during the training process, the network receives input patterns and organizes them to form clusters.
• From the figure it is observed that no feedback is applied from the environment to inform what the output should be or whether it is correct.
• The network itself discovers patterns, regularities, features or categories from the input data, and relations of the input data over the output.
• Exact clusters are formed by discovering similarities and dissimilarities, so this is called self-organizing.
Reinforcement learning
• Similar to supervised learning.
• Learning based on critic information is called reinforcement learning, and the feedback sent is called the reinforcement signal.
• The network receives some feedback from the environment.
• The feedback is only evaluative.
• The external reinforcement signals are processed in the critic signal generator, and the obtained critic signals are sent to the ANN for proper adjustment of weights so as to obtain better critic feedback in future.
Activation functions
• Just as some force or activation makes work more efficient and the output exact, an activation function is applied over the net input to calculate the output of an ANN.
• Information processing of a processing element has two major parts: input and output.
• An integration function (f) is associated with the input of a processing element.
• This function serves to combine activation, information or evidence from an external source or other processing elements into a net input to the processing element.
Activation functions
1. Identity function:
   It is a linear function, defined as f(x) = x for all x.
   The output is the same as the input. The input layer uses the identity activation function.
2. Binary step function:
   It is defined as
      f(x) = 1 if x ≥ θ, and 0 if x < θ,
   where θ represents the threshold value.
   It is used in single-layer nets to convert the net input to an output that is binary (0 or 1).
Activation functions
3. Bipolar step function:
   It is defined as
      f(x) = 1 if x ≥ θ, and -1 if x < θ,
   where θ represents the threshold value.
   This function is used in single-layer nets to convert the net input to an output that is bipolar (+1 or -1).
Activation functions
4. Sigmoid function
   Used in back-propagation nets. Two types:
   a) Binary sigmoid function (also called the logistic or unipolar sigmoid function):
      f(x) = 1 / (1 + e^(-λx)), where λ is the steepness parameter.
      The derivative of this function is f'(x) = λ f(x)[1 - f(x)]. The range of the sigmoid function is 0 to 1.
Activation functions
   b) Bipolar sigmoid function:
      f(x) = (1 - e^(-λx)) / (1 + e^(-λx)), where λ is the steepness parameter; the sigmoid range is between -1 and +1.
5. Ramp function:
      f(x) = 1 if x > 1, x if 0 ≤ x ≤ 1, and 0 if x < 0.
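A Python sketch of the activation functions listed above, assuming the standard definitions with defaults θ = 0 and λ = 1:

```python
import math

def identity(x):
    return x

def binary_step(x, theta=0.0):
    return 1 if x >= theta else 0

def bipolar_step(x, theta=0.0):
    return 1 if x >= theta else -1

def binary_sigmoid(x, lam=1.0):      # range (0, 1)
    return 1.0 / (1.0 + math.exp(-lam * x))

def bipolar_sigmoid(x, lam=1.0):     # range (-1, 1)
    return (1.0 - math.exp(-lam * x)) / (1.0 + math.exp(-lam * x))

def ramp(x):                         # 0 below 0, linear on [0, 1], 1 above 1
    return max(0.0, min(1.0, x))

for f in (identity, binary_step, bipolar_step,
          binary_sigmoid, bipolar_sigmoid, ramp):
    print(f.__name__, f(0.53))
```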
Important Terminologies
• Weight
 - The weights contain information about the input signal.
 - They are used by the net to solve a problem.
 - They are represented in terms of a matrix, called the connection matrix.
 - If the weight matrix W contains all the elements of an ANN, then the set of all W matrices determines the set of all possible information processing configurations.
 - The ANN can be realized by finding an appropriate matrix W.
Important Terminologies
• Bias
 - The bias has an impact on calculating the net input.
 - The bias is included by adding a component x0 = 1 to the input vector x.
 - The net input is then calculated as y_in = b + Σ xi wi, with the bias b acting as the weight on x0.
 - The bias is of two types:
    Positive bias: increases the net input
    Negative bias: decreases the net input
Important Terminologies
• Threshold
 - A set value based upon which the final output is calculated.
 - The calculated net input and the threshold are compared to get the network output.
 - The activation function based on the threshold is defined as
      f(net) = 1 if net ≥ θ, and -1 if net < θ,
   where θ is the fixed threshold value.
Important Terminologies
• Learning rate
 - Denoted by α.
 - Controls the amount of weight adjustment at each step of training.
 - The learning rate ranges from 0 to 1.
 - Determines the rate of learning at each step.
• Momentum factor
 - Convergence is made faster if a momentum factor is added to the weight updation process.
 - This is done in the back-propagation network.
• Vigilance parameter
 - Denoted by ρ.
 - Used in the Adaptive Resonance Theory (ART) network.
 - Used to control the degree of similarity.
 - Ranges from 0.7 to 1 to perform useful work in controlling the number of clusters.
Problems -1
For network shown in figure, calculate the net input to the
output neuron.
Problem -2
Calculate the net input for the network shown in figure
Problem -3
• Obtain the output of the neuron Y for the network shown in
figure using activation function
• i. Binary sigmoid ii. Bipolar sigmoid
Problem 3
Solution
   y_in = b + Σ xi wi = b + x1 w1 + x2 w2 + x3 w3 = 0.53
Binary sigmoid activation function: y = f(y_in) = 0.625
Bipolar sigmoid activation function: y = f(y_in) = 0.259
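A quick check of the two activations applied to the net input y_in = 0.53 computed above (the figure with the inputs and weights is not reproduced here); the bipolar value matches 0.259, while the binary sigmoid evaluates to roughly 0.63, close to the 0.625 quoted on the slide:

```python
import math

y_in = 0.53   # net input b + x1*w1 + x2*w2 + x3*w3 taken from the figure

binary = 1.0 / (1.0 + math.exp(-y_in))                       # about 0.63
bipolar = (1.0 - math.exp(-y_in)) / (1.0 + math.exp(-y_in))  # about 0.259
print(binary, bipolar)
```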
Mcculloch-Pitts (M-P) neuron
Mcculloch-Pitts (M-P) neuron
• Proposed in 1943.
• Usually called the M-P neuron.
• M-P neurons are connected by directed weighted paths.
• The activation of an M-P neuron is binary, i.e. at any time step the neuron may fire or may not fire.
• Weights associated with the communication links may be excitatory (positive weights) or inhibitory (negative weights).
• There is a fixed threshold for each neuron, and if the net input to the neuron is greater than the threshold then the neuron fires.
• They are widely used in logic functions.

Mcculloch-Pitts neuron
 A simple M-P neuron is shown in the
figure.
• A connection is excitatory with weight w (w > 0) or inhibitory with weight -p (p > 0).
• In the figure, inputs x1 to xn possess excitatory weighted connections and x(n+1) to x(n+m) possess inhibitory weighted connections.
• Since the firing of the neuron is based on a threshold, the activation function is defined as
   f(y_in) = 1 if y_in ≥ θ, and 0 if y_in < θ.
Mcculloch-Pitts neuron (Contd…)
 For inhibition to be absolute, the threshold with the activation
function should satisfy the following condition:
θ >nw –p
 Output will fire if it receives “k” or more excitatory inputs but
no inhibitory inputs where
kw≥θ>(k-1) w
 The M-P neuron has no particular training algorithm.
 An analysis is performed to determine the weights and the
threshold.
 It is used as a building block where any function or
phenomenon is modeled based on a logic function.
Problem -4
Implement AND function using McCulloch-pitts neuron
(take binary data).

x1 x2 y

1 1 1
1 0 0
0 1 0
0 0 0
Assume weights be w1 = 1 and w2 = 1.
y_in = x1 w1 + x2 w2
(1,1): 1(1) + 1(1) = 2
(1,0): 1(1) + 0(1) = 1
(0,1): 0(1) + 1(1) = 1
(0,0): 0(1) + 0(1) = 0
The net input for the only true case, (1,1), is 2. If the neuron fires when its net input is greater than or equal to 2, it fires for (1,1) and does not fire otherwise, so the threshold value is set equal to 2 (θ = 2).
This agrees with θ ≥ n·w - p: with n = 2, w = 1, p = 0 we get θ ≥ 2(1) - 0, i.e. θ ≥ 2.
Thus the output of neuron Y can be written as
   y = f(y_in) = 1 if y_in ≥ 2, and 0 if y_in < 2,
with θ = 2, w1 = 1, w2 = 1.
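A small sketch of the M-P neuron for the AND function with the weights and threshold derived above (w1 = w2 = 1, θ = 2):

```python
def mcculloch_pitts(x1, x2, w1=1, w2=1, theta=2):
    """M-P neuron: fires (output 1) when the net input reaches the threshold."""
    y_in = x1 * w1 + x2 * w2
    return 1 if y_in >= theta else 0

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, mcculloch_pitts(x1, x2))   # matches the AND truth table
```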
SUPERVISED LEARNING NETWORK

Perceptron Networks
Perceptron Networks
 The basic networks in supervised learning
 The perceptron network consist of three units
– Sensory unit (input unit)
– Associator unit (hidden unit)
– Response unit (output unit)
Perceptron Networks
• The input units are connected to the associator (hidden) units with fixed weights 1, 0 or -1, assigned at random.
• A binary activation function is used in the input and hidden units.
• The output unit has an activation of 1, 0 or -1; the binary step with a fixed threshold θ is used as the activation:
   y = f(y_in), where
   f(y_in) = 1 if y_in > θ; 0 if -θ ≤ y_in ≤ θ; -1 if y_in < -θ
Perceptron Networks
• Weight updation is done between the hidden (associator) and output units.
• The net checks for error between the target and the calculated output.
• Error = target - calculated output.
• Weights are adjusted in case of error:
   wi(new) = wi(old) + α t xi
   b(new) = b(old) + α t
• α is the learning rate, t is the target, which is -1 or 1.
• If there is no error, there is no weight change and training is stopped.
Single classification perceptron
network
Perceptron Training Algorithm for
Single Output Classes
Step 0: initialize the weights, bias and learning rate α (0 < α ≤ 1).
Step 1: perform steps 2-6 until the final stopping condition is false.
Step 2: perform steps 3-5 for each bipolar or binary training pair indicated by s:t.
Step 3: the input layer is applied with the identity activation function: xi = si.
Step 4: calculate the output response of each output unit j = 1 to m:
   first, the net input is calculated, y_in = b + Σ xi wi;
   then the activation is applied over the net input to calculate the output response,
   y = 1 if y_in > θ; 0 if -θ ≤ y_in ≤ θ; -1 if y_in < -θ.
Step 5: make adjustments to the weights and bias for j = 1 to m and i = 1 to n:
   if y ≠ t, then wi(new) = wi(old) + α t xi and b(new) = b(old) + α t; otherwise no change.
Step 6: test for the stopping condition. If there is no change in the weights then stop the training process, else start again from step 2.
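A minimal Python sketch of the single-output perceptron training algorithm above, assuming bipolar targets and the bipolar step activation with threshold θ; the AND data at the end is the example used later in the slides:

```python
def perceptron_train(samples, alpha=1.0, theta=0.0, max_epochs=100):
    """samples: list of (input vector, target) pairs with bipolar targets."""
    n = len(samples[0][0])
    w = [0.0] * n          # Step 0: initialize weights, bias, learning rate
    b = 0.0
    for _ in range(max_epochs):
        changed = False
        for x, t in samples:                       # Steps 2-5 for each pair s:t
            y_in = b + sum(xi * wi for xi, wi in zip(x, w))
            y = 1 if y_in > theta else (-1 if y_in < -theta else 0)
            if y != t:                             # weight updation only on error
                w = [wi + alpha * t * xi for wi, xi in zip(w, x)]
                b = b + alpha * t
                changed = True
        if not changed:                            # Step 6: stop when no change
            break
    return w, b

# AND function with bipolar inputs and targets (example from the slides)
and_data = [([1, 1], 1), ([1, -1], -1), ([-1, 1], -1), ([-1, -1], -1)]
print(perceptron_train(and_data))
```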
Flow chart for perceptron Network
with single output
Example
Implement AND function using perceptron
networks for bipolar inputs and target.

x1 X2 t
1 1 1
1 -1 -1
-1 1 -1
-1 -1 -1
• The perceptron network, which uses perceptron learning
rule, is used to train the AND function.
• The network architecture is as shown in Figure.
• The input patterns are presented to the network one by one.
When all the four input patterns are presented, then one
epoch is said to be completed.
• The initial weights and threshold are set to zero:
   w1 = w2 = b = 0 and θ = 0. The learning rate α is set equal to 1.
• First input (1, 1, 1): calculate the net input
   y_in = b + x1 w1 + x2 w2 = 0 + 1(0) + 1(0) = 0
• The output y is computed by applying the activation function over the calculated net input:
   since θ = 0, when y_in = 0 we have y = 0.
• Check whether t = y. Here t = 1 and y = 0, so t ≠ y; hence weight updation takes place:
   w1(new) = w1(old) + α t x1 = 0 + 1(1)(1) = 1, and similarly w2(new) = 1, b(new) = 1.
The weights w1 = 1, w2 = 1, b = 1 are the final weights after the first input pattern is presented.
The same process is repeated for all the input patterns.
The process can be stopped when all the targets become equal
to the calculated output or when a separating line is obtained
using the final weights for separating the positive responses
from negative responses.
Input (x1 x2 1) | Target t | Net input y_in | Output y | Weight changes (Δw1 Δw2 Δb) | Weights (w1 w2 b)
                |          |                |          |                              | 0 0 0
  1  1 1        |  1       | 0              | 0        | 1 1 1                        | 1 1 1
  1 -1 1        | -1       |                |          |                              |
 -1  1 1        | -1       |                |          |                              |
 -1 -1 1        | -1       |                |          |                              |
EXAMPLE
• Find the weights using perceptron network for the
given truth table. when all the inputs are presented
only one time. Use bipolar inputs and targets.

x1 x2 t
1 1 -1
1 -1 1
-1 1 -1
-1 -1 -1
NETWORK STRUCTURE
[Figure: inputs x1, x2 and bias 1 connected through weights w1, w2 and b to output neuron Y]
COMPUTATIONS
• Let us take w1 = w2 = 0, α = 1, θ = 0 and b = 0.
First input: (1, 1, -1)
   y_in = b + x1 w1 + x2 w2 = 0 + 1(0) + 1(0) = 0
• The output using the activation function is
   y = f(y_in) = 1 if y_in > 0; 0 if y_in = 0; -1 if y_in < 0.
COMPUTATIONS
• So, the output (y = 0) ≠ (t = -1).
• So weight updation is necessary:
   w1(new) = w1(old) + α t x1 = 0 + 1(-1)(1) = -1
   w2(new) = w2(old) + α t x2 = 0 + 1(-1)(1) = -1
   b(new) = b(old) + α t = -1
• The new weights are (-1, -1, -1).
COMPUTATIONS
• Second input: (1, -1, 1)
   y_in = b + x1 w1 + x2 w2 = -1 + 1(-1) + (-1)(-1) = -1
   (y = f(y_in) = -1) ≠ (t = 1)
• So, new weights are to be computed:
   w1(new) = w1(old) + α t x1 = -1 + 1(1)(1) = 0
   w2(new) = w2(old) + α t x2 = -1 + 1(1)(-1) = -2
   b(new) = b(old) + α t = -1 + 1 = 0
• The new weights are (0, -2, 0).
COMPUTATIONS
• Third input: (-1, 1, -1)
   y_in = b + x1 w1 + x2 w2 = 0 + (-1)(0) + 1(-2) = -2
   (y = f(y_in) = -1) = (t = -1)
• Weight updation is not necessary, so new weights are not computed.
• Fourth input: (-1, -1, -1)
   y_in = b + x1 w1 + x2 w2 = 0 + (-1)(0) + (-1)(-2) = 2
   (y = f(y_in) = 1) ≠ (t = -1)
• So, new weights are to be computed.
COMPUTATIONS
w1(new)  w1(old )  .t.x1
• S

 0 1 (1)  (1)


1
w2 (new)  w2 (old )  .t.x1
 2 1 (1)  (1)
 1
b(new)  1

• The new weights are (1, -1, -1)


FINAL ANALYSIS
Input (x1 x2 b) | Target t | Net input y_in | Output y | Weights (w1 w2 b)
  1  1 1        | -1       |  0             |  0       | -1 -1 -1
  1 -1 1        |  1       | -1             | -1       |  0 -2  0
 -1  1 1        | -1       | -2             | -1       |  0 -2  0
 -1 -1 1        | -1       |  2             |  1       |  1 -1 -1
EXAMPLE

Find the weights required to perform classification using a perceptron network. The vectors (1, 1, 1, 1) and (-1, 1, -1, -1) belong to the class (so have target value 1); the vectors (1, 1, 1, -1) and (1, -1, -1, 1) do not belong to the class (so have target value -1). Assume the learning rate as 1, the initial weights as 0 and θ = 0.2.
INITIAL TABLE
• The truth table is given by

x1 x2 x3 x4 b t

1 1 1 1 1 1

-1 1 -1 -1 1 1

1 1 1 -1 1 -1

1 -1 -1 1 1 -1
COMPUTATIONS
• Here we take w1 = w2 = w3 = w4 = 0, b = 0 and θ = 0.2. Also, α = 1.
• The activation function is given by
   y = 1 if y_in > 0.2; 0 if -0.2 ≤ y_in ≤ 0.2; -1 if y_in < -0.2.
• The net input is given by
   y_in = b + x1 w1 + x2 w2 + x3 w3 + x4 w4
• The next tables reflect the training performed with the weights computed.
COMPUTATIONS (EPOCH 1)
Input (x1 x2 x3 x4 b) | Target t | Net input y_in | Output y | Weights (w1 w2 w3 w4 b)
  1  1  1  1 1        |  1       |  0             |  0       |  1 1 1 1 1
 -1  1 -1 -1 1        |  1       | -1             | -1       |  0 2 0 0 2
  1  1  1 -1 1        | -1       |  4             |  1       | -1 1 -1 1 1
  1 -1 -1  1 1        | -1       |  1             |  1       | -2 2 0 0 0
COMPUTATIONS (EPOCH 2)
Input (x1 x2 x3 x4 b) | Target t | Net input y_in | Output y | Weights (w1 w2 w3 w4 b)
  1  1  1  1 1        |  1       |  0             |  0       | -1 3 1 1 1
 -1  1 -1 -1 1        |  1       |  4             |  1       | -1 3 1 1 1
  1  1  1 -1 1        | -1       |  5             |  1       | -2 2 0 2 0
  1 -1 -1  1 1        | -1       | -2             | -1       | -2 2 0 2 0
COMPUTATIONS (EPOCH 3)
Input (x1 x2 x3 x4 b) | Target t | Net input y_in | Output y | Weights (w1 w2 w3 w4 b)
  1  1  1  1 1        |  1       |  2             |  1       | -2 2 0 2 0
 -1  1 -1 -1 1        |  1       |  6             |  1       | -2 2 0 2 0
  1  1  1 -1 1        | -1       | -2             | -1       | -2 2 0 2 0
  1 -1 -1  1 1        | -1       | -2             | -1       | -2 2 0 2 0

Here the target outputs are equal to the actual outputs. So, we stop.
THE FINAL NET
[Figure: inputs x1, x2, x3, x4 and bias 1 feed output neuron Y with weights w1 = -2, w2 = 2, w3 = 0, w4 = 2 and bias b = 0]
ADALINE Networks
Adaptive Linear Neuron (Adaline)
• A network with a single linear unit is called an ADALINE (ADAptive LINear neuron).
• The input-output relationship is linear.
• It uses bipolar activation for its input signals and its target output.
• The weights between the input and the output are adjustable, and there is only one output unit.
• It is trained using the delta rule, also known as the least mean squares (LMS) or Widrow-Hoff rule.
Delta Rule
• Delta rule for a single output unit
 - Minimize the error over all training patterns.
 - Done by reducing the error for each pattern, one at a time.
• The delta rule for adjusting the weight of the ith input (i = 1 to n) is
   Δwi = α (t - y_in) xi
• The delta rule in the case of several output units, for adjusting the weight from the ith input unit to the jth output unit, is
   Δwij = α (tj - y_inj) xi
Adaline Model
[Figure: bias input x0 = 1 with weight b and inputs x1, ..., xn with weights w1, ..., wn feed the net input y_in = Σ xi wi and the output f(y_in); the error e = t - y_in between the target t and the net input drives the adaptive (LMS) weight-update algorithm]
Adaline Training Algorithm
Step 0: Weights and bias are set to some random values other
than zero. Learning rate parameter α

Step 1: Perform Steps 2-6 when stopping condition is false.

Step 2: Perform steps 3-5 for each bipolar training pair s:t

Step 3: Set activations for input units i=1 to n xi=si

Step 4: Calculate the net input to the output unit:
   y_in = b + Σ (i = 1 to n) xi wi
Adaline Training Algorithm
Step 5: Update the weights and bias for i = 1 to n:
   wi(new) = wi(old) + α (t - y_in) xi
   b(new) = b(old) + α (t - y_in)

• Step 6: If highest weight change that occurred during training


is smaller than a specified tolerance then stop the training
else continue. (Stopping condition)
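A minimal Python sketch of the Adaline training algorithm above (delta / Widrow-Hoff rule); the initial values, the tolerance and the OR data are illustrative, following the example given later in the slides:

```python
def adaline_train(samples, alpha=0.1, tolerance=1e-3, max_epochs=100):
    """samples: (bipolar input vector, target) pairs; delta-rule weight updates."""
    n = len(samples[0][0])
    w = [0.1] * n        # Step 0: small non-zero initial weights and bias
    b = 0.1
    for _ in range(max_epochs):
        max_change = 0.0
        for x, t in samples:                         # Steps 3-5 for each pair
            y_in = b + sum(xi * wi for xi, wi in zip(x, w))
            err = t - y_in
            for i in range(n):                       # wi(new) = wi(old) + a(t - y_in)xi
                dw = alpha * err * x[i]
                w[i] += dw
                max_change = max(max_change, abs(dw))
            b += alpha * err
        if max_change < tolerance:                   # Step 6: stopping condition
            break
    return w, b

# OR function with bipolar inputs and targets (example from the slides)
or_data = [([1, 1], 1), ([1, -1], 1), ([-1, 1], 1), ([-1, -1], -1)]
print(adaline_train(or_data))
```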
Flowchart for Adaline training:
Start → initialize the weights, bias and α → input the specified tolerance error Es → for each training pair s:t, activate the input units (xi = si), calculate the net input y_in = b + Σ xi wi, and update the weights and bias: wi(new) = wi(old) + α(t - y_in)xi, b(new) = b(old) + α(t - y_in) → calculate the error Ei = Σ(t - y_in)^2 → if Ei = Es, stop; otherwise continue with the next epoch.
Testing Algorithm
• Step 0: Initialize the weights(from training algorithm)
• Step 1: Perform steps2-4 for each bipolar input vector x
• Step 2: Set the activations of the input units to x
• Step 3: Calculate the net input: y_in = b + Σ xi wi
• Step 4: Apply the activation function over the calculated net input.
Example-7
Implement OR function with bipolar inputs and
target using Adaline network.

x1 x2 1 t
1 1 1 1
1 -1 1 1
-1 1 1 1
-1 -1 1 -1
The initial weights are taken to be w1 = w2 = b = 0.1 and the learning rate α = 0.1.
For the first input sample, x1 = 1, x2 = 1, t = 1, calculate the net input as
   y_in = b + x1 w1 + x2 w2 = 0.1 + 1(0.1) + 1(0.1) = 0.3

Input (x1 x2 1) | Target t | Net input y_in | (t - y_in) | Weight changes (Δw1 Δw2 Δb) | Weights (w1 w2 b) | Error (t - y_in)^2
                |          |                |            |                              | 0.1 0.1 0.1       |
  1  1 1        |  1       | 0.3            | 0.7        | 0.07 0.07 0.07               | 0.17 0.17 0.17    | 0.49
  1 -1 1        |  1       |                |            |                              |                   |
 -1  1 1        |  1       |                |            |                              |                   |
 -1 -1 1        | -1       |                |            |                              |                   |
Epoch = Total Error value
• Epoch 1= (0.49+0.69+0.83+1.01) = 3.02
• Epoch 2= (0.046+0.564+0.629+0.699) =1.938
• Epoch 3 = (0.007+0.487+0.515+0.541) =1.550
Epoch4 = (0.076+0.437+0.448+0.456)= 1.417
Epoch 5 = (0.155+0.405+0.408+0.409) = 1.377
BACK-PROPAGATION NETWORK
(BPN)
BACK PROPOGATION NETWORK
• Back propagation learning algorithm is one of the most important
developments in neural networks
• This learning algorithm is applied to multilayer feed-forward
networks consisting of processing elements with continuous
differentiable activation functions
• The networks using the back propagation learning algorithm
are also called back propagation networks (BPN)
• Given a set of input, output pairs this algorithm provides a
procedure for changing the weights in a BPN to classify the given
input pattern correctly.
• The basic concept used for weight updation is the gradient-
descent method as used in simple perceptron network
BACK PROPOGATION NETWORK
• In this method the error is propagated back to the hidden unit
• The aim is to train the net to achieve a balance between the net's ability to respond correctly to the input patterns used for training (memorization) and its ability to give reasonable responses to input that is similar, but not identical, to that used in training (generalization).
• It differs from other networks in respect to the process by which
the weights are calculated during the learning period
• When the number of hidden layers is increased the
complexity increases
• The error is usually measured at the output layer
• At the hidden layers there is no information about the errors
• So, other techniques are needed to be followed to calculate the
error at the hidden layers so that the ultimate error is minimized
ARCITECTURE OF BACK PROPOGATION
NETWORK
• It is a multi layer, feed forward neural network
• It has one input layer,one hidden layer and one output layer
• The neurons in the hidden and output layer have biases

• During the back-propagation phase of learning, signals are sent in


the reverse direction
• The output obtained from the net can be
Binary {0, 1} / Bipolar {-1, 1}
• The activation functions should increase monotonically,
differentiable
DIAGRAMMATIC REPRESENTATION OF THE ARCHITECTURE OF BPN
[Figure: input layer X1, ..., Xi, ..., Xn; hidden layer Z1, ..., Zj, ..., Zp with bias inputs v0j; output layer Y1, ..., Yk, ..., Ym with bias inputs w0k and targets t1, ..., tm; weights vij connect input units to hidden units and weights wjk connect hidden units to output units]
BACK PROPOGATION NETWORK

The training of the BPN is done in three stages

 The feed-forward of the input training pattern

 Calculation and back-propagation of the error

 Updation of weights.
FLOWCHART DESCRIPTION FOR TRAINING PROCESS
• x = input training vector (x1, x2, ..., xn)
• t = target output vector (t1, t2, ..., tm)
• α = learning rate parameter
• xi = ith input unit
• v0j = bias on the jth hidden unit
• w0k = bias on the kth output unit
• zj = jth hidden unit
• The net input to the jth hidden unit is given by
   z_inj = v0j + Σ (i = 1 to n) xi vij,  j = 1, 2, ..., p
FLOWCHART DESCRIPTION FOR TRAINING PROCESS CONTD…
• The output from the jth hidden unit is given by
   zj = f(z_inj)
• Let yk be the kth output unit. The net input to it is
   y_ink = w0k + Σ (j = 1 to p) zj wjk,  k = 1, 2, ..., m
• The output from the kth output unit is
   yk = f(y_ink)
ACTIVATION FUNCTIONS USED
• The commonly used activation functions are the binary sigmoid, f(x) = 1 / (1 + e^(-x)), and the bipolar sigmoid, f(x) = (1 - e^(-x)) / (1 + e^(-x)). They have the properties:
 - Differentiable
 - Monotonically non-decreasing
• So, these functions are used as the activation functions.
• δk = error-correction weight adjustment for wjk.
• This error is at the output unit yk.
• It is back-propagated to the hidden units which have fed into yk.
ERROR CORRECTION CONTD…
• δj is the error-correction weight adjustment for vij.
• This error correction arises from the back-propagation of error to the hidden unit zj.
BPN TRAINING ALGORITHM
• STEP 0: Initialize weights and learning rate (some random
small values are taken)
• STEP 1: Perform Steps 2 -9 when stopping condition is false
• STEP 2: Perform steps 3 – 8 for each training pair
Feed-forward phase( phase I)
• STEP 3: Each input unit receives input signal x i , i=1,…n
and sends it to the hidden unit
• STEP 4: Each hidden unit zj, j = 1, ..., p sums its weighted input signals to calculate the net input:
   z_inj = v0j + Σ (i = 1 to n) xi vij
TRAINING ALGORITHM
• Calculate the outputs from the hidden layer by applying the activation function (binary or bipolar sigmoid) over z_inj:
   zj = f(z_inj)
  These signals are sent as input signals to the output units.
Step 5:
• For each output unit yk, k = 1, ..., m, calculate the net input:
   y_ink = w0k + Σ (j = 1 to p) zj wjk
  Apply the activation function to compute the output signal:
   yk = f(y_ink),  k = 1, 2, ..., m
TRAINING ALGORITHM (BACK PROPAGATION OF ERROR)
Back-propagation of error (Phase II)
• STEP 6: Each output unit yk, k = 1, 2, ..., m receives a target pattern corresponding to the input training pattern and computes the error-correction term:
   δk = (tk - yk) f'(y_ink)
   Note: f'(y_in) = f(y_in)[1 - f(y_in)]
  On the basis of the calculated error-correction term, update the change in weights and bias:
   Δwjk = α δk zj
   Δw0k = α δk
• Also, send δk to the hidden layer backwards.
TRAINING ALGORITHM (BACK PROPAGATION OF ERROR)
• STEP 7: Each hidden unit zj, j = 1, 2, ..., p sums its delta inputs from the output units:
   δ_inj = Σ (k = 1 to m) δk wjk
• The term δ_inj gets multiplied with the derivative of f(z_inj) to calculate the error term:
   δj = δ_inj f'(z_inj)
  Update the changes in weights and bias:
   Δvij = α δj xi
   Δv0j = α δj
BPN TRAINING ALGORITHM
Weights and Bias Updation (Phase III)
Step 8: Each output unit yk, k = 1, 2, ..., m updates its bias and weights as:
   wjk(new) = wjk(old) + Δwjk
   w0k(new) = w0k(old) + Δw0k
Each hidden unit zj, j = 1, ..., p updates its bias and weights as:
   vij(new) = vij(old) + Δvij
   v0j(new) = v0j(old) + Δv0j
• STEP 9: Check for the stopping condition. The stopping condition may be a certain number of cycles reached, or when the actual output equals the target output (that is, the error is zero).
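A minimal Python sketch of one feed-forward / back-propagation / weight-update cycle for a 2-input, 2-hidden, 1-output net with the binary sigmoid, using the notation above (v for input-to-hidden weights, w for hidden-to-output weights); the weights at the bottom follow Example 1 below, with the signs as used in its computations:

```python
import math

def f(x):                       # binary sigmoid activation
    return 1.0 / (1.0 + math.exp(-x))

def f_prime(fx):                # f'(x) = f(x)[1 - f(x)], given f(x)
    return fx * (1.0 - fx)

def bpn_step(x, t, v, v0, w, w0, alpha=0.25):
    """One feed-forward / back-propagation / weight-update cycle.
    v[i][j]: input-to-hidden weights, w[j]: hidden-to-output weights."""
    n, p = len(x), len(v0)
    # Phase I: feed-forward
    z_in = [v0[j] + sum(x[i] * v[i][j] for i in range(n)) for j in range(p)]
    z = [f(zi) for zi in z_in]
    y_in = w0 + sum(z[j] * w[j] for j in range(p))
    y = f(y_in)
    # Phase II: back-propagation of error
    delta_k = (t - y) * f_prime(y)
    delta_in = [delta_k * w[j] for j in range(p)]
    delta_j = [delta_in[j] * f_prime(z[j]) for j in range(p)]
    # Phase III: weight and bias updation
    w = [w[j] + alpha * delta_k * z[j] for j in range(p)]
    w0 = w0 + alpha * delta_k
    for i in range(n):
        for j in range(p):
            v[i][j] += alpha * delta_j[j] * x[i]
    v0 = [v0[j] + alpha * delta_j[j] for j in range(p)]
    return v, v0, w, w0, y

# Weights and input of Example 1 below (input [0, 1], target 1, alpha = 0.25)
v = [[0.6, -0.3], [-0.1, 0.4]]; v0 = [0.3, 0.5]
w = [0.4, 0.1]; w0 = -0.2
print(bpn_step([0, 1], 1, v, v0, w, w0))
```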
TESTING ALGORITHM FOR BPN
• STEP 0: Initialize the weights (the weights are taken from the training phase).
• STEP 1: Perform steps 2-4 for each input vector.
• STEP 2: Set the activations of the input units xi, i = 1, 2, ..., n.
• STEP 3: Calculate the net input to each hidden unit zj and its output:
   z_inj = v0j + Σ (i = 1 to n) xi vij and zj = f(z_inj)
• STEP 4: Compute the net input and the output of the output layer:
   y_ink = w0k + Σ (j = 1 to p) zj wjk,  yk = f(y_ink)
• USE SIGMOIDAL FUNCTIONS AS ACTIVATION FUNCTIONS
EXAMPLE-1
• Using the Back propagation network, find the new weights
for the net shown below. It is presented with the input
pattern [0,1] and the target output 1. Use a learning rate of
0.25 and binary sigmoidal activation function.

[Figure: inputs x1 = 0, x2 = 1 and bias inputs feed hidden units Z1, Z2, which feed output unit Y; weights v11 = 0.6, v21 = -0.1, v01 = 0.3, v12 = -0.3, v22 = 0.4, v02 = 0.5, w1 = 0.4, w2 = 0.1, w0 = -0.2, with the signs as used in the computations below]
COMPUTATIONS
• The initial weights are:
   v11 = 0.6, v21 = -0.1, v01 = 0.3
   v12 = -0.3, v22 = 0.4, v02 = 0.5
   w1 = 0.4, w2 = 0.1, w0 = -0.2
• The learning rate: α = 0.25.
• The activation function is the binary sigmoid, i.e. f(x) = 1 / (1 + e^(-x)).
Phase I: feed-forward of the input training pattern
Calculate the net input to the hidden layer:
• For the z1 neuron:
   z_in1 = v01 + v11 x1 + v21 x2 = 0.3 + 0(0.6) + 1(-0.1) = 0.2
• For the z2 neuron:
   z_in2 = v02 + v12 x1 + v22 x2 = 0.5 + 0(-0.3) + 1(0.4) = 0.9
• Applying the activation function, the outputs of the hidden layer are
   z1 = f(0.2) = 0.5498, z2 = f(0.9) = 0.7109
Calculate the net input entering the output layer:
• Input: y_in = w0 + w1 z1 + w2 z2 = -0.2 + 0.5498(0.4) + 0.7109(0.1) = 0.09101
• Output: applying the activation function, y = f(0.09101) = 0.5227
Phase II: Calculation and back-propagation of the error
• We use the gradient-descent formula: δk = (tk - yk) f'(y_ink)
• We have f'(y_in) = f(y_in)[1 - f(y_in)] = 0.5227 (1 - 0.5227) = 0.2495
• Here k = 1, so δ1 = (1 - 0.5227)(0.2495) = 0.1191
We next compute the changes in weights between the hidden and the output layer:
   Δw1 = α δ1 z1 = 0.25 (0.1191)(0.5498) = 0.0164
   Δw2 = α δ1 z2 = 0.25 (0.1191)(0.7109) = 0.02117
   Δw0 = α δ1 = 0.25 (0.1191) = 0.02978
COMPUTATIONS CONTD…
Compute the error portion between the input and hidden layers:
• The general formula is δj = δ_inj f'(z_inj)
• Each hidden unit sums its delta inputs from the output units:
   δ_inj = Σ (k = 1 to m) δk wjk
COMPUTATIONS CONTD…
Here m = 1 (the output neuron), so δ_inj = δ1 wj1.
• So, δ_in1 = δ1 w11 = 0.1191 (0.4) = 0.04764
      δ_in2 = δ1 w21 = 0.1191 (0.1) = 0.01191
• Now, for the error correction:
   f'(z_in1) = f(z_in1)[1 - f(z_in1)] = 0.5498 (1 - 0.5498) = 0.2475
  Hence, δ1 = δ_in1 f'(z_in1) = 0.04764 (0.2475) = 0.0118
• Again,
   f'(z_in2) = f(z_in2)[1 - f(z_in2)] = 0.7109 (1 - 0.7109) = 0.2055
  So, δ2 = δ_in2 f'(z_in2) = 0.01191 (0.2055) = 0.00245
COMPUTATIONS CONTD…
Now find the changes in weights between the input and hidden layers:
   Δv11 = α δ1 x1 = 0.25 (0.0118)(0) = 0
   Δv21 = α δ1 x2 = 0.25 (0.0118)(1) = 0.00295
   Δv01 = α δ1 = 0.25 (0.0118) = 0.00295
   Δv12 = α δ2 x1 = 0.25 (0.00245)(0) = 0
   Δv22 = α δ2 x2 = 0.25 (0.00245)(1) = 0.0006125
   Δv02 = α δ2 = 0.25 (0.00245) = 0.0006125
Phase III: Compute the final weights of the network
   v11(new) = v11(old) + Δv11 = 0.6 + 0 = 0.6
   v12(new) = v12(old) + Δv12 = -0.3 + 0 = -0.3
   v21(new) = v21(old) + Δv21 = -0.1 + 0.00295 = -0.09705
   v22(new) = v22(old) + Δv22 = 0.4 + 0.0006125 = 0.4006125
   v01(new) = v01(old) + Δv01 = 0.3 + 0.00295 = 0.30295
   v02(new) = v02(old) + Δv02 = 0.5 + 0.0006125 = 0.5006125
   w1(new) = w1(old) + Δw1 = 0.4 + 0.0164 = 0.4164
   w2(new) = w2(old) + Δw2 = 0.1 + 0.02117 = 0.12117
   w0(new) = w0(old) + Δw0 = -0.2 + 0.02978 = -0.17022
LEARNING FACTORS OF BACK PROPAGATION ALGORITHM

Convergence of the BPN is based upon several important factors

• Initial Weights

• Learning rate

• Upgradation rule

• Size and nature of the training set

• Architecture (Number of layers and number of neurons per layer)


INITIAL WEIGHTS
• The ultimate solution may be affected by the initial weights.
• They are initialized to small random values.
• The choice of initial weights determines the speed at which the network converges.
• Higher initial weights may lead to saturation of the activation functions from the beginning, causing the net to get stuck at a local minimum.
• One method is to choose the weights wij in the range (-3/√oi, 3/√oi), where oi is the number of processing elements j that feed forward to the ith processing element.
LEARNING RATE
• The learning rate α also affects the convergence of the network.
• A larger value of α may speed up the convergence but may lead to overshooting.
• The range of α is 10^-3 to 10.
• A large learning rate leads to rapid learning, but there will be oscillation of the weights.
• A lower learning rate leads to slow learning.
MOMENTUM FACTOR
• To overcome the problems stated in the previous slide, a factor called the momentum factor is added to the usual gradient-descent method.
• The momentum factor lies in the interval [0, 1].
• It is normally taken to be 0.9.
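A small sketch of a weight update with a momentum term, assuming the common form in which the previous weight change, scaled by the momentum factor, is added to the current gradient-based change; all numeric values are illustrative:

```python
def weight_update_with_momentum(w, grad_term, prev_delta, alpha=0.25, momentum=0.9):
    """One weight update: the previous weight change is re-applied,
    scaled by the momentum factor, to speed up convergence."""
    delta = alpha * grad_term + momentum * prev_delta
    return w + delta, delta

# Example: a hidden-to-output weight with delta_k * z_j as the gradient term
w, prev = 0.4, 0.0
for grad in (0.065, 0.061, 0.058):       # illustrative gradient terms
    w, prev = weight_update_with_momentum(w, grad, prev)
    print(round(w, 4))
```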


GENERALIZATION
• The best network for generalization is the BPN.
• A network is said to generalize well when it sensibly interpolates for input patterns that are new to the network.
• Overfitting or overtraining problems can occur.
• The solution to this problem is to monitor the error on a test set and terminate the training when this error increases.
• To improve the ability of the network to generalize from a training data set to a test data set, it is desirable to make small changes in the input space of the training patterns as part of the training set.
NUMBER OF TRAINING DATA
• The training data should be sufficient and proper

• Training data should cover the entire expected input space.

• Training vectors should be taken randomly from the training set

• Scaling or Normalization has to be done to help learning.


NUMBER OF HIDDEN LAYER NODES
• If the number of hidden layers in a BPN is more than one, the calculations performed for a single layer are repeated for all the layers and summed up at the end.
• The size of a layer is very important.
• It is determined experimentally.
• If the network does not converge, the number of hidden nodes has to be increased.
• If the network converges, the user may retry with fewer hidden nodes and settle on a size based on performance.
• In general, the number of hidden nodes should be a relatively small fraction of the input layer size.
EXAMPLE-2
• Find the new weights, using the back-propagation network, for the network shown below. The network is presented with the input pattern [-1, 1] and the target output +1. Use a learning rate of α = 0.25 and the bipolar sigmoidal activation function.
[Figure: inputs x1 = -1, x2 = 1 and bias inputs feed hidden units Z1, Z2, which feed output unit Y; weights v11 = 0.6, v21 = -0.1, v01 = 0.3, v12 = -0.3, v22 = 0.4, v02 = 0.5, w1 = 0.4, w2 = 0.1, w0 = -0.2, with the signs as used in the computations below]
COMPUTATIONS
• Here, the activation function is the bipolar sigmoidal activation function, that is
   f(x) = (1 - e^(-x)) / (1 + e^(-x))
• and
   v11 = 0.6, v21 = -0.1, v01 = 0.3
   v12 = -0.3, v22 = 0.4, v02 = 0.5
   w1 = 0.4, w2 = 0.1, w0 = -0.2
• The input vector is [-1, 1] and the target is t = 1.
• Learning rate α = 0.25.
COMPUTATIONS
• The net inputs:
• For the z1 neuron:
   z_in1 = v01 + v11 x1 + v21 x2 = 0.3 + (-1)(0.6) + 1(-0.1) = -0.4
• For the z2 neuron:
   z_in2 = v02 + v12 x1 + v22 x2 = 0.5 + (-1)(-0.3) + 1(0.4) = 1.2
• Outputs:
   z1 = f(z_in1) = (1 - e^(0.4)) / (1 + e^(0.4)) = -0.1974
   z2 = f(z_in2) = (1 - e^(-1.2)) / (1 + e^(-1.2)) = 0.537
COMPUTATIONS
• For the output layer:
• Input: y_in = w0 + w1 z1 + w2 z2 = -0.2 + (-0.1974)(0.4) + 0.537(0.1) = -0.22526
• Output: y = f(y_in) = (1 - e^(0.22526)) / (1 + e^(0.22526)) = -0.1122
COMPUTATIONS
• Error at the output neuron:
• We use the gradient-descent formula: δk = (tk - yk) f'(y_ink)
   f'(y_in) = 0.5 [1 + f(y_in)][1 - f(y_in)] = 0.5 [1 - 0.1122][1 + 0.1122] = 0.4937
• Here k = 1, so δ1 = (1 + 0.1122)(0.4937) = 0.5491
• The changes in weights between the hidden and output layers:
   Δw1 = α δ1 z1 = 0.25 (0.5491)(-0.1974) = -0.0271
   Δw2 = α δ1 z2 = 0.25 (0.5491)(0.537) = 0.0737
   Δw0 = α δ1 = 0.25 (0.5491) = 0.1373
COMPUTATIONS
• Next we compute the error portion δj between the input and the hidden layer.
• The general formula is δj = δ_inj f'(z_inj)
• Each hidden unit sums its delta inputs from the output units:
   δ_inj = Σ (k = 1 to m) δk wjk
COMPUTATIONS
• Here, m = 1 (the output neuron), so δ_inj = δ1 wj1.
• Hence,
   δ_in1 = δ1 w11 = 0.5491 (0.4) = 0.21964
   δ_in2 = δ1 w21 = 0.5491 (0.1) = 0.05491
• Now, f'(z_in1) = 0.5 [1 + f(z_in1)][1 - f(z_in1)] = 0.5 (1 - 0.1974)(1 + 0.1974)
• So, δ1 = δ_in1 f'(z_in1) = 0.21964 × 0.5 (1 - 0.1974)(1 + 0.1974) = 0.1056
COMPUTATIONS
• and δ2 = δ_in2 f'(z_in2) = 0.05491 × 0.5 (1 - 0.537)(1 + 0.537) = 0.0195
• Now, the changes in the weights between the input and the hidden layer are
   Δv11 = α δ1 x1 = 0.25 (0.1056)(-1) = -0.0264
   Δv21 = α δ1 x2 = 0.25 (0.1056)(1) = 0.0264
   Δv01 = α δ1 = 0.25 (0.1056) = 0.0264
   Δv12 = α δ2 x1 = 0.25 (0.0195)(-1) = -0.0049
   Δv22 = α δ2 x2 = 0.25 (0.0195)(1) = 0.0049
   Δv02 = α δ2 = 0.25 (0.0195) = 0.0049
FINAL WEIGHTS
   w1(new) = w1(old) + Δw1 = 0.4 - 0.0271 = 0.3729
   w2(new) = w2(old) + Δw2 = 0.1 + 0.0737 = 0.1737
   w0(new) = w0(old) + Δw0 = -0.2 + 0.1373 = -0.0627
   v11(new) = v11(old) + Δv11 = 0.6 - 0.0264 = 0.5736
   v12(new) = v12(old) + Δv12 = -0.3 - 0.0049 = -0.3049
   v21(new) = v21(old) + Δv21 = -0.1 + 0.0264 = -0.0736
   v22(new) = v22(old) + Δv22 = 0.4 + 0.0049 = 0.4049
   v01(new) = v01(old) + Δv01 = 0.3 + 0.0264 = 0.3264
   v02(new) = v02(old) + Δv02 = 0.5 + 0.0049 = 0.5049
Radial Basis Function(RBF)
Network
Radial Base Function(RBF) Network
 The radial basis function (RBF) is a
classification and functional
approximation neural network
developed by M.J.D. Powell.

 The network uses the most


common nonlinearities such as
sigmoidal and Gaussian kernel
functions.

 The Gaussian functions are also


used in regularization networks.

 The Gaussian function is generally defined as f(x) = e^(-x^2).
Radial Base Function(RBF) Network
Radial Base Function(RBF) Network
Training Algorithm
Step 0: set the weights to small random values
Step 1: perform step 2-8 when the stopping condition is false
Step 2: perform step 3-7 for each input
Step 3: Each input unit receives the input signals and transmits
to the next hidden layer unit.
Step 4: calculate the radial basis function
Step 5: select the centres for the radial basis functions. The centres are selected from the set of input vectors.
Radial Basis Function (RBF) Network
Step 6: calculate the output from each hidden unit as a Gaussian radial basis function of the distance between the input pattern and the unit's centre,
   where x̂_ji is the centre of the ith RBF unit for the jth input variable, σi is the width of the ith RBF unit, and x_ji is the jth variable of the input pattern.
Step 7: calculate the output of the neural network as a weighted sum of the hidden-unit outputs plus a bias term,
   where k is the number of hidden-layer nodes (RBF functions), y_net is the output value of the mth node in the output layer for the nth incoming pattern, w_im is the weight between the ith RBF unit and the mth output node, and w_o is the biasing term at the output node.
Step 8: calculate the error and test for the stopping condition. The stopping condition may be a number of epochs or a sufficiently small weight change.
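A sketch of the forward pass of a Gaussian RBF network with one output node; the exact width/scaling convention (here exp(-||x - c||^2 / (2σ^2))) and all numeric values are assumptions for illustration, not taken from the slides:

```python
import math

def rbf_forward(x, centres, widths, weights, bias):
    """Forward pass of a Gaussian RBF network (one output node).
    Hidden unit i responds with exp(-||x - centre_i||^2 / (2 * width_i^2));
    the output is a weighted sum of these responses plus a bias."""
    hidden = []
    for c, s in zip(centres, widths):
        dist_sq = sum((xj - cj) ** 2 for xj, cj in zip(x, c))
        hidden.append(math.exp(-dist_sq / (2.0 * s ** 2)))
    return bias + sum(w * h for w, h in zip(weights, hidden))

# Centres chosen from the input vectors (step 5); widths/weights are illustrative
centres = [[0.0, 0.0], [1.0, 1.0]]
widths = [0.5, 0.5]
weights = [0.8, -0.3]
print(rbf_forward([0.2, 0.1], centres, widths, weights, bias=0.1))
```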
Module 2

Associative Memory Networks


Memory Models

• Auto associative memory

• Hetero associative memory

• Bidirectional associative memory (BAM)

• Hopfield network
ASSOCIATIVE MEMORY NETWORK

 Associative memory stores a set of patterns as memories


 These kinds of neural networks work on the basis of pattern
association, which means they can store different patterns
and at the time of giving an output they can produce one of
the stored patterns by matching (closely resembles or relates)
them with the given input pattern.
 These types of memories are called content- addressable
memories (CAM)
In contrast to that
 In digital computers we have address- addressable memories
 As in RAM/ROM it is also a matrix memory
ASSOCIATIVE MEMORY NETWORK
ASSOCIATIVE MEMORY NETWORK CONTD…

 The input data is correlated with that of the stored data in


CAM
 The stored patterns must be unique (That is different patterns
are stored in different locations)
 Each data stored has an address
 If multiple copies are stored the data will be correct. But the
address will be ambiguous
 The concept behind this search is to output any one or all
stored items which matches the given search argument
 The stored data is retrieved completely or partially
ASSOCIATIVE MEMORY NETWORK CONTD…

Two types of such networks are there


 Autoassociative
 Heteroassociative
If the output vector is same as the input vectors it is Auto
Associative
If the output vectors are different from the input vectors it is
Hetero Associative
To find the similarity we use the Hamming distance
TRAINING ALGORITHMS FOR PATTERN
ASSOCIATION
• There are two algorithms for training of pattern association
nets
1. Hebb Rule
2. Outer Products Rule
Hebb Rule
• This is widely used for finding the weights of an associative
memory neural net
• Weights are updated until there is no weight change
ALGORITHMIC STEPS FOR HEBB RULE
STEP 0: Set all the initial weights to zero:
   wij = 0 (i = 1, 2, ..., n; j = 1, 2, ..., m)
STEP 1: For each training target input-output vector pair s:t, perform steps 2 to 4.
STEP 2: Activate the input layer units to the current training input:
   xi = si (i = 1 to n)
STEP 3: Activate the output layer units to the target output:
   yj = tj (j = 1 to m)
STEP 4: Start the weight adjustment:
   wij(new) = wij(old) + xi yj
OUTER PRODUCT RULE
• Input: s = (s1, s2, ..., si, ..., sn)
• Output: t = (t1, t2, ..., tj, ..., tm)
• The outer product is defined as the product of the two matrices S = s^T and T = t.
• So, we get the weight matrix W as:
   W = S·T = s^T t =
   [ s1·t1  ...  s1·tj  ...  s1·tm ]
   [  ...         ...        ...   ]
   [ si·t1  ...  si·tj  ...  si·tm ]
   [  ...         ...        ...   ]
   [ sn·t1  ...  sn·tj  ...  sn·tm ]
OUTER PRODUCT RULE CONTD…
• In the case of a set of patterns s(p):t(p), p = 1, ..., P, we have
   s(p) = (s1(p), ..., si(p), ..., sn(p))
   t(p) = (t1(p), ..., tj(p), ..., tm(p))
• For the weight matrix W = (wij):
   wij = Σ (p = 1 to P) si(p) tj(p),  i = 1, 2, ..., n; j = 1, 2, ..., m
AUTOASSOCIATIVE MEMORY NETWORK

 The training input and target output vectors are the same
 The determination of weights of the association net is called
storing of vectors
 The vectors that have been stored can be retrieved from
distorted (noisy) input if the input is sufficiently similar to it
 The net’s performance is based on its ability to reproduce a
stored pattern from a noisy input

 NOTE: The weights in the diagonal can be set to ‘0’. These nets
are called auto-associative nets with no self-connection
ARCHITECTURE OF AUTOASSOCIATIVE
MEMORY NETWORK

[Figure: input units X1, ..., Xi, ..., Xn fully connected through weights wij to output units Y1, ..., Yi, ..., Yn]
TRAINING ALGORITHM
• This is the same as for the Hebb rule, except that there are the same number of output units as input units.
• STEP 0: Initialize all the weights to zero:
   wij = 0 (i = 1, 2, ..., n; j = 1, 2, ..., n)
• STEP 1: For each vector that has to be stored, perform steps 2 to 4.
• STEP 2: Activate each input unit: xi = si, i = 1, 2, ..., n.
• STEP 3: Activate each output unit: yj = sj, j = 1, 2, ..., n.
• STEP 4: Adjust the weights, i, j = 1, 2, ..., n:
   wij(new) = wij(old) + xi yj = wij(old) + si sj
TESTING ALGORITHM

• The associative memory neural network can be used to


determine whether the given input vector is a “known” one
or an unknown one
• The net is said to recognize a vector if the net produces a
pattern of activation as output, which is the same as the one
given as input
• STEP 0: Set the weights obtained from any of the two
methods described above
• STEP 1: For each of the testing input vector perform steps 2-4
• STEP 2: Set the activations of the input unit as equal to that of
the input vector
TESTING ALGORITHM CONTD…

• STEP 3: Calculate the net input to each output unit:

        (y_in)_j = Σ_{i=1..n} x_i·w_ij,   j = 1, 2, ..., n

• STEP 4: Calculate the output by applying the activation over the
  net input:

        y_j = f((y_in)_j) = { +1, if (y_in)_j > 0
                            { -1, if (y_in)_j ≤ 0
• THIS TYPE OF NETWORK IS USED IN SPEECH PROCESSING,
IMAGE PROCESSING, PATTERN CLASSIFICATION ETC.
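A minimal sketch of the storage and testing steps above, assuming the bipolar hard-limit activation of step 4; the helper names store and recall are mine, not from the slides.

import numpy as np

def store(patterns, no_self_connection=False):
    """Hebbian storage: W = sum over p of s(p)^T . s(p)."""
    W = sum(np.outer(s, s) for s in patterns)
    if no_self_connection:
        np.fill_diagonal(W, 0)        # optional variant with zero diagonal
    return W

def recall(W, x):
    y_in = x @ W                      # step 3: net input
    return np.where(y_in > 0, 1, -1)  # step 4: bipolar hard-limit activation

patterns = np.array([[-1, 1, 1, 1]])
W = store(patterns)
print(recall(W, np.array([-1, 1, 1, 1])))   # [-1  1  1  1] -> recognised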
EXAMPLE---AUTOASSOCIATIVE NETWORK

• Train the autoassociative network for input vector [-1 1 1 1]


and also test the network for the same input vector.
• Test the autoassociative network with one missing, one
mistake, two missing and two mistake entries in test vector.

• The weight matrix W is computed from the formula

        W = Σ_{p=1..P} s^T(p)·s(p)
COMPUTATIONS
• The weight matrix for s = [-1 1 1 1] is

        W = s^T·s = [  1 -1 -1 -1 ]
                    [ -1  1  1  1 ]
                    [ -1  1  1  1 ]
                    [ -1  1  1  1 ]

• Case-1: testing the network with the same input vector
• Test input: [-1 1 1 1]
• The weight matrix obtained above is used as the initial weight matrix
• The net input is (y_in) = x·W = [-1 1 1 1]·W = [-4 4 4 4]
COMPUTATIONS

• Applying the activation function

        y_j = f((y_in)_j) = { +1, if (y_in)_j > 0
                            { -1, if (y_in)_j ≤ 0

• over the net input, we get
        y = [-1 1 1 1]
• Hence, the correct response is obtained
COMPUTATIONS

TESTING AGAINST ONE MISSING ENTRY


Case 1: [0 1 1 1] (First Component is missing)
• Compute the net input:

        (y_in) = x·W = [0 1 1 1]·W = [-3 3 3 3]
• Applying the activation function taken above, we get
• y = [-1 1 1 1]. So, the response is correct
COMPUTATIONS

TESTING AGAINST ONE MISSING ENTRY


Case 2: [-1 1 0 1] (Third Component is Missing)
Compute the net input:

        (y_in) = x·W = [-1 1 0 1]·W = [-3 3 3 3]

Applying the activation function taken above, we get


y = [ -1 1 1 1]
The response is correct.
WE CAN TEST FOR OTHER MISSING ENTRIES SIMILARLY
COMPUTATIONS

Testing the network against one mistake entry


Case 1: Let the input be [-1 -1 1 1] (Second Entry is a Mistake)

• Compute the net input:

        (y_in) = x·W = [-1 -1 1 1]·W = [-2 2 2 2]

Applying the same activation function taken above,


y = [-1 1 1 1]
So, the response is correct
COMPUTATIONS

• Case 2: Let the input be [1 1 1 1] (First entry is mistaken)


• Compute the net input:

        (y_in) = x·W = [1 1 1 1]·W = [-2 2 2 2]
• Applying the activation function taken above
• y = [-1 1 1 1]
• So, the response is correct
COMPUTATIONS

Testing the net against two missing entries


Case 1: Let the input be [0 0 1 1] (First two entries are missing)
Compute the net input:

        (y_in) = x·W = [0 0 1 1]·W = [-2 2 2 2]

• Applying the activation function taken above


• y = [-1 1 1 1], which is the correct response
COMPUTATIONS
Case 2: Let the input be [-1 0 0 1] (Second & Third inputs are missing)
• Compute the net input:

        (y_in) = x·W = [-1 0 0 1]·W = [-2 2 2 2]

• Applying the activation function taken above


• y = [-1 1 1 1], which is the correct response
COMPUTATIONS

• Testing the net against two mistaken entries


• Let the input be [-1 -1 -1 1] (Second & Third entries are mistakes)

• Compute the net input:

        (y_in) = x·W = [-1 -1 -1 1]·W = [0 0 0 0]

• Applying the activation function taken above, we get
• y = [-1 -1 -1 -1], which is not correct.
• SO, THE NETWORK FAILS TO RECOGNISE INPUTS WITH TWO MISTAKES

• NOTE: WE HAVE TO CHECK FOR ALL POSSIBLE INPUTS UNDER EACH CASE
TO HAVE A POSITIVE CONCLUSION
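The tests above can be reproduced with a short script; this is only a sketch and the case labels are mine.

import numpy as np

s = np.array([-1, 1, 1, 1])
W = np.outer(s, s)                      # weight matrix stored above

tests = {
    "same input   [-1  1  1  1]": [-1,  1,  1, 1],
    "one missing  [ 0  1  1  1]": [ 0,  1,  1, 1],
    "one mistake  [-1 -1  1  1]": [-1, -1,  1, 1],
    "two missing  [ 0  0  1  1]": [ 0,  0,  1, 1],
    "two mistakes [-1 -1 -1  1]": [-1, -1, -1, 1],
}
for name, x in tests.items():
    y_in = np.array(x) @ W
    y = np.where(y_in > 0, 1, -1)       # same activation as above
    print(name, "->", y, "OK" if np.array_equal(y, s) else "FAILS")
# every case recovers [-1 1 1 1] except the two-mistake input, which fails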
HETEROASSOCIATIVE MEMORY NETWORK

• The training input and target output are different


• The weights are determined in such a way that the net can
store a set of ‘P’ pattern associations
• Each input vector s(p) has ‘n’ components and each output
vector t(p) has ‘m’ components (p = 1, 2, ….P)
• Determination of weights: Hebb Rule or Outer product rule
• THE NET FINDS AN APPROPRIATE OUTPUT VECTOR
CORRESPONDING TO AN INPUT VECTOR, WHICH MAY BE
ONE OF THE STORED PATTERNS OR A NEW PATTERN
ARCHITECTURE OF HETEROASSOCIATIVE
MEMORY NETWORK
[Figure: heteroassociative net with input units X_1, ..., X_i, ..., X_n fully connected to output units Y_1, ..., Y_j, ..., Y_m through weights w_11, ..., w_nm]

Differs from the auto-associative network in the number of output units


TESTING ALGORITHM FOR HETEROASSOCIATIVE
MEMORY NETWORK
• STEP 0: Initialize the weights from the training algorithm
• STEP 1: Perform steps 2 – 4 for each input vector presented
• STEP 2: Set the activation for the input layer units equal to
that of the current input vector given, xi
• STEP 3: Calculate the net input to the output units using

        (y_in)_j = Σ_{i=1..n} x_i·w_ij,   j = 1, 2, ..., m

• STEP 4: Determine the activations of the output units as:

        y_j = { +1, if (y_in)_j > 0
              {  0, if (y_in)_j = 0      (differs from the auto-associative net here)
              { -1, if (y_in)_j < 0
HETEROASSOCIATIVE MEMORY NETWORK
CONTD…
• There exist weighted interconnections between the input and
output layers
• The input and output layers are not correlated with each
other
• We can also use the following binary activation function:

        y_j = { 1, if (y_in)_j > 0
              { 0, if (y_in)_j ≤ 0
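A hedged sketch of the heteroassociative testing procedure, using the three-valued activation of step 4; the 4-input/2-output weight matrix shown is just an illustrative placeholder.

import numpy as np

def hetero_recall(W, x):
    y_in = x @ W                         # step 3: net input to the output units
    return np.sign(y_in).astype(int)     # step 4: 1 if > 0, 0 if = 0, -1 if < 0

# Illustrative 4-input / 2-output weight matrix and test vector
W = np.array([[0, 2],
              [0, 1],
              [1, 0],
              [2, 0]])
print(hetero_recall(W, np.array([1, 0, 0, 0])))   # [0 1]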
EXAMPLE-HETEROASSOCIATIVE MEMORY
NETWORKS
• Train a heteroassociative memory network using the Hebb rule to
  store the input row vectors s = (s1, s2, s3, s4) to the output row
  vectors t = (t1, t2), as given in the table below:

Input and targets s1 s2 s3 s4 t1 t2
1st 1 0 1 0 1 0
2nd 1 0 0 1 1 0
3rd 1 1 0 0 0 1
4th 0 0 1 1 0 1
THE NEURAL NET
[Figure: net with four input units X1-X4 and two output units Y1, Y2, connected through weights w11, ..., w42]
COMPUTATIONS

• We use the Hebb rule to determine the weights
• The initial weights are all zeros
• For the first pair:

        x1 = 1, x2 = 0, x3 = 1, x4 = 0, y1 = 1, y2 = 0

• Set the input and output pairs
• Weight updation formula:

        w_ij(new) = w_ij(old) + x_i·y_j
UPDATED WEIGHTS
x1 = 1, x2 = 0, x3 = 1, x4 = 0, y1 = 1, y2 = 0

w11(new) = w11(old) + x1·y1 = 0 + 1×1 = 1
w12(new) = w12(old) + x1·y2 = 0 + 1×0 = 0
w21(new) = w21(old) + x2·y1 = 0 + 0×1 = 0
w22(new) = w22(old) + x2·y2 = 0 + 0×0 = 0
w31(new) = w31(old) + x3·y1 = 0 + 1×1 = 1
w32(new) = w32(old) + x3·y2 = 0 + 1×0 = 0
w41(new) = w41(old) + x4·y1 = 0 + 0×1 = 0
w42(new) = w42(old) + x4·y2 = 0 + 0×0 = 0
UPDATED WEIGHTS
• For the second vector:
  x1 = 1, x2 = 0, x3 = 0, x4 = 1, y1 = 1, y2 = 0
𝑤11 𝑛𝑒𝑤 = 𝑤11 𝑜𝑙𝑑 + 𝑥1 . 𝑦1 = 1 + 1x1 = 2
𝑤12 𝑛𝑒𝑤 = 𝑤12 𝑜𝑙𝑑 + 𝑥1 . 𝑦2 = 0 + 1x0 = 0
𝑤21 𝑛𝑒𝑤 = 𝑤21 𝑜𝑙𝑑 + 𝑥2 . 𝑦1 = 0 + 0x1 = 0
𝑤22 𝑛𝑒𝑤 = 𝑤22 𝑜𝑙𝑑 + 𝑥2 . 𝑦2 = 0 + 0x0 = 0
𝑤31 𝑛𝑒𝑤 = 𝑤31 𝑜𝑙𝑑 + 𝑥3 . 𝑦1 = 1 + 0x1 = 1
𝑤32 𝑛𝑒𝑤 = 𝑤32 𝑜𝑙𝑑 + 𝑥3 . 𝑦2 = 0 + 0x0 = 0
𝑤41 𝑛𝑒𝑤 = 𝑤41 𝑜𝑙𝑑 + 𝑥4 . 𝑦1 = 0 + 1x1 = 1
𝑤42 𝑛𝑒𝑤 = 𝑤42 𝑜𝑙𝑑 + 𝑥4 . 𝑦2 = 0 + 1x0 = 0
COMPUTATIONS

• For the third pair
• The initial weights are the outputs from the above updation
• The input-output vector pair is

        x1 = 1, x2 = 1, x3 = 0, x4 = 0, y1 = 0, y2 = 1

• Set the input and output pairs
• The weight updation formula is the same as above
• The weights change in only two out of the 8 cases:

        w12(new) = w12(old) + x1·y2 = 0 + 1×1 = 1
        w22(new) = w22(old) + x2·y2 = 0 + 1×1 = 1
UPDATED WEIGHTS- Third input vector

𝑤11 (𝑛𝑒𝑤) = 𝑤11 (𝑜𝑙𝑑) + 𝑥1 . 𝑦1 = 2


𝑤12 (𝑛𝑒𝑤) = 𝑤12 (𝑜𝑙𝑑) + 𝑥1 . 𝑦2 = 1
𝑤21 (𝑛𝑒𝑤) = 𝑤21 (𝑜𝑙𝑑) + 𝑥2 . 𝑦1 = 0
𝑤22 (𝑛𝑒𝑤) = 𝑤22 (𝑜𝑙𝑑) + 𝑥2 . 𝑦2 = 1
𝑤31 (𝑛𝑒𝑤) = 𝑤31 (𝑜𝑙𝑑) + 𝑥3 . 𝑦1 = 1
𝑤32 (𝑛𝑒𝑤) = 𝑤32 (𝑜𝑙𝑑) + 𝑥3 . 𝑦2 = 0
𝑤41 (𝑛𝑒𝑤) = 𝑤41 (𝑜𝑙𝑑) + 𝑥4 . 𝑦1 = 1
𝑤42 (𝑛𝑒𝑤) = 𝑤42 (𝑜𝑙𝑑) + 𝑥4 . 𝑦2 = 0
COMPUTATIONS
• For the fourth pair
• The initial weights are the outputs from the above updation
• The input-output vector pair is

        x1 = 0, x2 = 0, x3 = 1, x4 = 1, y1 = 0, y2 = 1

• Set the input and output pairs
• The weight updation formula is the same as above
• The weights change in only two of the 8 cases:

        w32(new) = w32(old) + x3·y2 = 0 + 1×1 = 1
        w42(new) = w42(old) + x4·y2 = 0 + 1×1 = 1
COMPUTATIONS

• The final weights after all the input/output vector pairs are used are

        W = [ w11 w12 ]   [ 2 1 ]
            [ w21 w22 ] = [ 0 1 ]
            [ w31 w32 ]   [ 1 1 ]
            [ w41 w42 ]   [ 1 1 ]
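As a quick check (not part of the slides), accumulating the outer products of the four training pairs reproduces the same final weight matrix:

import numpy as np

S = np.array([[1, 0, 1, 0],
              [1, 0, 0, 1],
              [1, 1, 0, 0],
              [0, 0, 1, 1]])
T = np.array([[1, 0],
              [1, 0],
              [0, 1],
              [0, 1]])

W = np.zeros((4, 2), dtype=int)
for x, y in zip(S, T):
    W += np.outer(x, y)        # w_ij(new) = w_ij(old) + x_i * y_j
print(W)
# [[2 1]
#  [0 1]
#  [1 1]
#  [1 1]]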
Train the heteroassociative using outer product
rule – Another method

For the 1st pair, the input and output vectors are s = [1 0 1 0], t = [1 0]; the
outer product s^T·t contributes

        [ 1 0 ]
        [ 0 0 ]
        [ 1 0 ]
        [ 0 0 ]
Train the heteroassociative using outer product
rule – Another method

Similarly for the 2nd, 3rd and 4th pairs. Summing the four outer products gives

        W = [ 2 1 ]
            [ 0 1 ]
            [ 1 1 ]
            [ 1 1 ]

which is the same weight matrix as obtained with the Hebb rule.
EXAMPLE
Train a heteroassociative memory network to store the
input vectors s = (s1, s2, s3, s4) to the output vectors
t = (t1, t2). The vector pairs are given in the table below. Also
test the performance of the network using its training
inputs as testing inputs.
Input and targets s1 s2 s3 s4 t1 t2
1st 1 0 0 0 0 1
2nd 1 1 0 0 0 1
3rd 0 0 0 1 1 0
4th 0 0 1 1 1 0
• The outer product rule is used to determine the weight matrix:

        W = Σ s^T(p)·t(p) = [ 0 2 ]
                            [ 0 1 ]
                            [ 1 0 ]
                            [ 2 0 ]

Testing the network

First input x = [1 0 0 0]; compute the net input:
        (y_in) = x·W = [0 2]
Applying the activation function over the net input to calculate the output:
𝑦1 = 𝑓 𝑦𝑖𝑛1 = 𝑓 0 = 0
𝑦2 = 𝑓 𝑦𝑖𝑛2 = 𝑓 2 = 1
The output is [0,1] which is correct response for
the first input pattern.
Similarly check the second , third and fourth
input pattern
Second input x = [1 1 0 0]; compute the net input:
        (y_in) = x·W = [0 3]
Applying the activation function over the net input to calculate the output:
𝑦1 = 𝑓 𝑦𝑖𝑛1 = 𝑓 0 = 0
𝑦2 = 𝑓 𝑦𝑖𝑛2 = 𝑓 3 = 1
The output is [0,1] which is correct response for
the second input pattern.
Third input x = [0 0 0 1]; compute the net input:
        (y_in) = x·W = [2 0]
Applying the activation function over the net input to calculate the output:
𝑦1 = 𝑓 𝑦𝑖𝑛1 = 𝑓 2 =1
𝑦2 = 𝑓 𝑦𝑖𝑛2 = 𝑓 0 =0
The output is [1,0] which is correct response for the
third input pattern.
Fourth input x = [0 0 1 1]; compute the net input:
        (y_in) = x·W = [3 0]
Applying the activation function over the net input to calculate the output:
𝑦1 = 𝑓 𝑦𝑖𝑛1 = 𝑓 3 =1
𝑦2 = 𝑓 𝑦𝑖𝑛2 = 𝑓 0 =0
The output is [1,0] which is correct response for
the fourth input pattern.
Problem
Train a heteroassociative network to store the given bipolar input
vectors s = (s1, s2, s3, s4) to the output vectors t = (t1, t2). Test the
performance of the network with missing and mistaken data values.

Input and targets s1 s2 s3 s4 t1 t2


1st 1 -1 -1 -1 -1 1
2nd 1 1 -1 -1 -1 1
3rd -1 -1 -1 1 1 -1
4th -1 -1 1 1 1 -1
BIDIRECTIONAL ASSOCIATIVE MEMORY(BAM)

• It was developed by Kosko in 1988


• It performs both forward and backward searches
• It uses Hebb rule
• It associates patterns from set X to set Y and vice versa
• It responds for input from both layers
• It is a hetero-associative pattern matching network that
encodes binary or bipolar patterns using Hebbian learning
rule
TYPES OF BAM

• There are two types of BAMs


 Discrete
 Continuous
• The characterization depends upon the activation functions
used
• Otherwise, the architecture is same
• Discrete BAM: Binary or Bipolar activation functions are used
• Continuous BAM: Binary sigmoid or Bipolar sigmoid functions
are used
THE BAM ARCHITECTURE
[Figure: BAM with Layer X (units X_1, ..., X_n) and Layer Y (units Y_1, ..., Y_m); the X-to-Y connections carry the weight matrix W and the Y-to-X connections carry W^T]
BAM ARCHITECTURE

• Consists of two layers of neurons


• These layers are connected by directed weighted path
interconnections
• The network dynamics involves two layers of interaction
• The signals are sent back and forth between the two layers
until all neurons reach equilibrium
• The weights associated are bidirectional
• It can respond to inputs in either layer
BAM ARCHITECTURE CONTD…

• The weight matrix from one direction (say X to Y) is W


• The weight matrix from the other direction (Y to X) is W^T

• The weight matrix is calculated in both directions


DISCRETE BIDIRECTIONAL ASSOCIATIVE
MEMORY

• When the memory neurons are being activated by putting an


initial vector at the input of a layer, the network evolves to a two-
pattern stable state, with each pattern held at the output of one layer
• The network involves two layers of interaction between each
other
• The two bivalent forms (Binary and Bipolar) of BAM are related
to each other
• The weights in both the cases are found as the sum of the
outer products of the bipolar form of the given training vector
pairs
DETERMINATION OF WEIGHTS

• Let the ‘P’ number of input/target pairs of vectors be denoted


by (s(p), t(p)), p = 1, 2, ..., P
• Then the weight matrix to store the set of input/target vectors,
  p = 1, 2, ..., P, with

        s(p) = (s_1(p), ..., s_i(p), ..., s_n(p))
        t(p) = (t_1(p), ..., t_j(p), ..., t_m(p))

  can be determined by the Hebb rule for associative networks
• In the case of binary input vectors:

        w_ij = Σ_{p=1..P} [2s_i(p) - 1]·[2t_j(p) - 1]

• In the case of bipolar input vectors:

        w_ij = Σ_{p=1..P} s_i(p)·t_j(p)
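A hedged sketch of the two weight formulas; bam_weights is my own helper name, and the binary flag simply maps {0, 1} patterns to bipolar form with 2v - 1 before the outer products are summed.

import numpy as np

def bam_weights(S, T, binary=False):
    """S: P x n input patterns, T: P x m targets; returns the n x m BAM matrix."""
    if binary:
        S, T = 2 * S - 1, 2 * T - 1     # map {0, 1} patterns to {-1, +1}
    return sum(np.outer(s, t) for s, t in zip(S, T))

# Tiny bipolar example with a single pair
print(bam_weights(np.array([[1, -1]]), np.array([[1, 1]])))
# [[ 1  1]
#  [-1 -1]]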
ACTIVATION FUNCTIONS FOR BAM

• The activation function is based on whether the input/target vector
  pairs are binary or bipolar

• ACTIVATION FUNCTION FOR THE 'Y' LAYER
• 1. If both the layers have binary input vectors:

        y_j = { 1,    if (y_in)_j > 0
              { y_j,  if (y_in)_j = 0
              { 0,    if (y_in)_j < 0

• 2. When the input vectors are bipolar:

        y_j = { 1,    if (y_in)_j > θ_j
              { y_j,  if (y_in)_j = θ_j
              { -1,   if (y_in)_j < θ_j
ACTIVATION FUNCTIONS FOR BAM

ACTIVATION FUNCTION FOR THE 'X' LAYER

1. When the input vectors are binary:

        x_i = { 1,    if (x_in)_i > 0
              { x_i,  if (x_in)_i = 0
              { 0,    if (x_in)_i < 0

2. When the input vectors are bipolar:

        x_i = { 1,    if (x_in)_i > θ_i
              { x_i,  if (x_in)_i = θ_i
              { -1,   if (x_in)_i < θ_i
TESTING ALGORITHM FOR DISCRETE BAM

• This algorithm is used to test the noisy patterns entering into


the network
• Basing upon the training algorithm, weights are determined
• Using the weights the net input is calculated for the given test
pattern
• Activations are applied over it to recognize the test patterns
• ALGORITHM:
• STEP 0: Initialize the weights to store ‘p’ vectors
• STEP 1: Perform Steps 2 to 6 for each testing input
• STEP 2: (i) Set the activations of the X layer to the current
input pattern, that is presenting the input pattern x to X.
TESTING ALGORITHM FOR DISCRETE BAM
CONTD…
• (ii) Set the activations of the ‘Y’ layer similarly by presenting
the input pattern ‘y’.
• (iii) Though it is bidirectional memory, at one time step,
signals can be sent from only one layer. So, one of the input
patterns may be zero vector
• STEP 3: Perform steps 4 to 6 when the activations are not
converging
• STEP 4: Update the activations of the units in the Y layer. Calculate the
  net input by

        (y_in)_j = Σ_{i=1..n} x_i·w_ij
TESTING ALGORITHM FOR DISCRETE BAM
CONTD…
• Applying activations, we obtain y_j = f((y_in)_j)
• Send this signal to the X layer
• STEP 5: Update the activations of the units in the X layer
• Calculate the net input

        (x_in)_i = Σ_{j=1..m} y_j·w_ij

• Apply the activations over the net input: x_i = f((x_in)_i)
• Send this signal to the Y layer
• STEP 6: Test for convergence of the net. The convergence
occurs if the activation vectors x and y reach equilibrium.
• If this occurs stop, otherwise continue.
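A simplified sketch of the bidirectional retrieval loop described above. For brevity the activation here resolves a zero net input to -1 instead of keeping the previous activation, which departs slightly from the rule given earlier.

import numpy as np

def bam_recall(W, x, max_steps=25):
    """Send signals back and forth between the X and Y layers until equilibrium."""
    sgn = lambda v: np.where(v > 0, 1, -1)
    y = sgn(x @ W)                       # first pass X -> Y through W
    while max_steps > 0:
        x_new = sgn(y @ W.T)             # Y -> X through W^T
        y_new = sgn(x_new @ W)           # X -> Y again through W
        if np.array_equal(x_new, x) and np.array_equal(y_new, y):
            break                        # both layers stopped changing
        x, y, max_steps = x_new, y_new, max_steps - 1
    return x, y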
CONTINUOUS BAM

• It transforms the input smoothly and continuously in the


range 0-1 using logistic sigmoid functions as the activation
functions for all units
• This activation function may be binary sigmoidal function or
it may be bipolar sigmoidal function
CONTINUOUS BAM CONTD…

• In the case of binary inputs (s(p), t(p)), p = 1, 2, ..., P, the weights are
  determined by the formula

        w_ij = Σ_{p=1..P} [2s_i(p) - 1]·[2t_j(p) - 1]

• The activation function is the logistic sigmoidal function
• If it is the binary logistic function, then the activation function is
  given by

        f((y_in)_j) = 1 / (1 + e^(-(y_in)_j))
CONTINUOUS BAM CONTD…

• If the activation function used is bipolar, then it is given by

        f((y_in)_j) = 2 / (1 + e^(-(y_in)_j)) - 1
                    = (1 - e^(-(y_in)_j)) / (1 + e^(-(y_in)_j))

• These activations are applied over the net input to calculate the
  output. The net input can be calculated with a bias as

        (y_in)_j = b_j + Σ_{i=1..n} x_i·w_ij

• All these formulae are applicable to the X layer also
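A short sketch of the two continuous-BAM activations and the biased net input (the function names are mine):

import numpy as np

def binary_sigmoid(y_in):
    return 1.0 / (1.0 + np.exp(-y_in))          # output in (0, 1)

def bipolar_sigmoid(y_in):
    return 2.0 / (1.0 + np.exp(-y_in)) - 1.0    # output in (-1, 1)

def net_input(x, W, b=0.0):
    return b + x @ W                            # (y_in)_j = b_j + sum_i x_i w_ij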
EXAMPLE-5
• Construct and test a BAM network to associate letters E and F
with single bipolar input-output vectors. The target output for
E is [-1, 1] and F is [1, 1]. The display matrix size is 5 x 3. The
input patterns are:

  [Figure: the letters E and F drawn on a 5 x 3 display matrix]

• Target outputs are [-1, 1] and [1, 1]


COMPUTATIONS

• The input patterns are

E 1 1 1 1 -1 -1 1 1 1 1 -1 -1 1 1 1
F 1 1 1 1 1 1 1 -1 -1 1 -1 -1 1 -1 -1

• The output and weights are

E -1 1 W1
F 1 1 W2
COMPUTATIONS

• Since we are considering bipolar inputs and outputs, the weight matrix
  is computed by using the formula:

        W = Σ s^T(p)·t(p)

• There are two weight components, W1 and W2:

        W1 = [1 1 1 1 -1 -1 1 1 1 1 -1 -1 1 1 1]^T · [-1 1]
        W2 = [1 1 1 1 1 1 1 -1 -1 1 -1 -1 1 -1 -1]^T · [1 1]
COMPUTATIONS
0 2
0 2
• The total weight matrix is  
0 2
 
 0 2 
2 0
 
2 0
0 2
 
W  W1  W2   2 0 
 
 2 0 
 0 +2 
 
 0 2 
 0 2 
 
0 2
 2 0 
 
 2 0 
TESTING THE NETWORK

• We test the network with test vectors E and F


• Test pattern E 0 2
0 2 

0 2
 
0 2
2 0
 
 2 0
0 2
 
yin  [1111  1  11111  1  1111]  2 0   [12 18]
 
 2 0
0 2
 
0 2 
• Applying the activations, we get 0 2 
 
• Y = [-1 1], which is the correct response 0 2
 2 0
 
 2 0 
TESTING THE NETWORK
0 2
• Test pattern F 0 2
 
0 2
 
 0 2 
2 0
 
 2 0 
0 2
 
yin  [1111111  1  11  1  11  1  1]  2 0   [12 18]
 
 2 0 
0 2
 
 0 2 
• Applying activations over the net input  0 2 
 
we get y = [1 1], which is correct 0 2
 2 0 
 
 2 0 
BACKWARD MOVEMENT

• The Y vector is taken as the input
• The weight matrix here is the transpose of the original
• So,

        W^T = [ 0 0 0 0 2 2 0 -2 -2 0  0  0 0 -2 -2 ]
              [ 2 2 2 2 0 0 2  0  0 2 -2 -2 2  0  0 ]
TESTING THE NETWORK

• For the test pattern E, the input is its target vector [-1 1]

        x_in = y·W^T = [-1 1]·W^T = [2 2 2 2 -2 -2 2 2 2 2 -2 -2 2 2 2]

• Applying the activation functions, we get

        x = [1 1 1 1 -1 -1 1 1 1 1 -1 -1 1 1 1]

• which is the correct response (the stored E pattern)


TESTING THE NETWORK

• For the pattern F:
• Here, the input is [1 1]

        x_in = y·W^T = [1 1]·W^T = [2 2 2 2 2 2 2 -2 -2 2 -2 -2 2 -2 -2]

• Applying the activation function, we get

        x = [1 1 1 1 1 1 1 -1 -1 1 -1 -1 1 -1 -1]
• This is the correct response
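As a quick check (not from the slides), the whole E/F example, forward and backward, can be verified in a few lines; zero net inputs do not occur here, so a plain sign activation suffices.

import numpy as np

E  = np.array([1, 1, 1, 1, -1, -1, 1, 1, 1, 1, -1, -1, 1, 1, 1])
F  = np.array([1, 1, 1, 1, 1, 1, 1, -1, -1, 1, -1, -1, 1, -1, -1])
tE, tF = np.array([-1, 1]), np.array([1, 1])

W = np.outer(E, tE) + np.outer(F, tF)            # 15 x 2 BAM weight matrix
sgn = lambda v: np.where(v > 0, 1, -1)

print(E @ W, sgn(E @ W))      # [-12  18] -> [-1  1]  (forward, letter E)
print(F @ W, sgn(F @ W))      # [ 12  18] -> [ 1  1]  (forward, letter F)
print(np.array_equal(sgn(tE @ W.T), E))   # True (backward, recovers E)
print(np.array_equal(sgn(tF @ W.T), F))   # True (backward, recovers F)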
DISCRETE HOPFIELD NETWORK

• It is a network which is:


• Auto-associative
• Fully interconnected
• Single layer
• Feed back
• It is called as recurrent network
• In it each processing element has two outputs
• The output from each processing element is fed back to the
input of other processing elements but not to itself
DISCRETE HOPFIELD NETWORK

• It takes two-valued inputs: {0, 1} or {-1, +1}


• The net uses symmetrical weights with no self-connections
• That is 𝑤𝑖𝑗 = 𝑤𝑗𝑖 and 𝑤𝑖𝑖 = 0, ∀𝑖
• Key Points:
• Only one unit updates its activation at a time
• Each unit is found to continuously receive an external signal along
with the signals it receives from the other units in the net
• When a single-layer recurrent network performs sequential updating, an
  input pattern is first applied to the network and the output is initialized
  accordingly. The initializing pattern is then removed, and the initialized
  output pattern becomes the new input through the feedback connections
ARCHITECTURE OF A DISCRETE HOPFIELD NET
[Figure: fully interconnected single-layer net; each unit Y_i receives the external input x_i and the fed-back outputs y_j of all other units through symmetric weights w_ij (no self-connections)]
TRAINING ALGORITHM FOR DISCRETE
HOPFIELD NETWORK
• STEP 0: Initialize the weights to store patterns, i. e. weights
obtained from training algorithm using Hebb rule
• STEP 1: When the activations of the net are not converged, perform
steps 2 - 8
• STEP 2: Perform steps 3 -7 for each input vector X
• STEP 3: Make the initial activations of the net equal to the external
  input vector X (i.e. y_i = x_i, i = 1, 2, ..., n)
• STEP 4: Perform steps 5 - 7 for each unit Y_i,
  updating the activations of the units in random order
• STEP 5: Calculate the net input of the network:

        (y_in)_i = x_i + Σ_j y_j·w_ji
TRAINING ALGORITHM FOR DISCRETE
HOPFIELD NETWORK
• STEP 6: Apply the activations over the net input to calculate
  the output:

        y_i = { 1,    if (y_in)_i > θ_i
              { y_i,  if (y_in)_i = θ_i
              { 0,    if (y_in)_i < θ_i

• where θ_i is the threshold and is normally taken as zero

• STEP 7: Now feed back the obtained output 𝑦𝑖 to all other
units. Thus the activation vectors are updated
• STEP 8: Finally test the network for convergence
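A hedged sketch of steps 4-7: units are visited one at a time, each new activation is fed back immediately, and sweeps continue until nothing changes. The visiting order is passed in explicitly here rather than chosen at random, purely so the run is reproducible.

import numpy as np

def hopfield_recall(W, x, order, theta=0.0, max_sweeps=10):
    y = x.astype(int).copy()
    for _ in range(max_sweeps):
        changed = False
        for i in order:                              # asynchronous updates
            y_in = x[i] + y @ W[:, i]                # step 5: x_i + sum_j y_j w_ji
            if y_in > theta:
                new = 1
            elif y_in < theta:
                new = 0
            else:
                new = y[i]                           # keep activation at threshold
            changed = changed or (new != y[i])
            y[i] = new                               # step 7: feed back immediately
        if not changed:                              # step 8: converged
            break
    return y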
Example

Construct an autoassociative discrete Hopfield


network with input vector[1 1 1 -1]. Test the
discrete Hopfield network with missing entries
in first and second component of the stored
vector.
• The input vector is x = [1 1 1 -1]. The weight matrix is given by

        W = x^T·x = [  1  1  1 -1 ]
                    [  1  1  1 -1 ]
                    [  1  1  1 -1 ]
                    [ -1 -1 -1  1 ]

• With no self-connection (diagonal set to zero),

        W = [  0  1  1 -1 ]
            [  1  0  1 -1 ]
            [  1  1  0 -1 ]
            [ -1 -1 -1  0 ]

The binary representation of the given input is [1 1 1 0].
We carry out asynchronous updation of the units here, in the order Y1, Y4, Y3, Y2.
Test the discrete Hopfield network with missing entries in the first and second
components of the stored vector: x = [0 0 1 0]

Choosing unit Y1 for updating its activation:
        (y_in)_1 = x_1 + Σ_j y_j·w_j1 = 0 + 1 = 1
Applying the activation function: (y_in)_1 > 0, so y_1 = 1 and y = [1 0 1 0] ->
not yet converged
Choosing unit Y4 for updating its activation:
        (y_in)_4 = x_4 + Σ_j y_j·w_j4 = 0 + (-2) = -2
Applying the activation function: (y_in)_4 < 0, so y_4 = 0 and y = [1 0 1 0] ->
not yet converged
Choosing unit Y3 for updating its activation:
        (y_in)_3 = x_3 + Σ_j y_j·w_j3 = 1 + 1 = 2
Applying the activation function: (y_in)_3 > 0, so y_3 = 1 and y = [1 0 1 0] ->
not yet converged
Choosing unit Y2 for updating its activation:
        (y_in)_2 = x_2 + Σ_j y_j·w_j2 = 0 + 2 = 2
Applying the activation function: (y_in)_2 > 0, so y_2 = 1 and y = [1 1 1 0] ->
convergence with the vector x
Thus, the output y has converged with the vector x in this iteration itself.
However, one more iteration can be performed to check whether the
activations change any further.
• Iteration 2

Similarly, the units Y1, Y4, Y3 and Y2 can be checked again for updating their
activations; no further changes occur, which confirms convergence.
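As a quick check (not from the slides), the whole recall above can be reproduced in a few lines, using the same update order Y1, Y4, Y3, Y2:

import numpy as np

s = np.array([1, 1, 1, -1])          # stored bipolar vector
W = np.outer(s, s)
np.fill_diagonal(W, 0)               # no self-connections

x = np.array([0, 0, 1, 0])           # binary input with two missing entries
y = x.copy()
for i in [0, 3, 2, 1]:               # units Y1, Y4, Y3, Y2
    y_in = x[i] + y @ W[:, i]        # net input including the external signal
    y[i] = 1 if y_in > 0 else 0      # threshold 0; no ties occur in this example
    print("unit Y%d: net input %d, y = %s" % (i + 1, y_in, y))
# final y = [1 1 1 0], the stored pattern in binary form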