
UNIT - II

2 Artificial Neural Networks

Scope of the Syllabus

Back propagation Neural Networks - Kohonen Neural Network - Learning Vector Quantization - Hamming Neural Network - Hopfield Neural Network - Bi-directional Associative Memory - Adaptive Resonance Theory Neural Networks - Support Vector Machines - Spike Neuron Models.

2.1 Back Propagation Neural Networks
 The Backpropagation algorithm looks for the minimum value of the error function in weight space using a technique called the delta rule or gradient descent. The weights that minimize the error function are then considered to be a solution to the learning problem.
 Back propagation is a systematic method for training multilayer ANNs. It is a generalization of the Widrow-Hoff error correction rule. About 80 % of ANN applications use back propagation.
 Fig. 2.1.1 shows a backpropagation network.

Fig. 2.1.1 : Backpropagation network


 Consider a simple neuron :
a. A neuron has a summing junction and an activation function.
b. Any non-linear function that is differentiable everywhere and increases everywhere with the sum can be used as the activation function.


c. Examples : Logistic function, arc tangent function, hyperbolic tangent activation function.
 These activation functions give a multilayer network greater representational power than a single-layer network only when non-linearity is introduced.
 The input to the activation function is the sum, which is defined by the following equation :

sum = I1 W1 + I2 W2 + ... + In Wn = Σ (j = 1 to n) Ij Wj + b

 Activation Function : Logistic Function

f(sum) = 1 / (1 + e^(– s · sum)) = (1 + e^(– s · sum))^(– 1)

 The logistic function monotonically increases from a lower limit (0 or – 1) to an upper limit (+ 1) as the sum increases. For the logistic function the values vary between 0 and 1, with a value of 0.5 when the sum is zero.
 Activation Function : Arc Tangent

f(sum) = (2 / π) · tan^(– 1) (s · sum)


 Activation Function : Hyperbolic Tangent

f(sum) = tanh (s · sum) = (e^(s · sum) – e^(– s · sum)) / (e^(s · sum) + e^(– s · sum))
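 The summing junction and the three activation functions above can be combined in a few lines of code. The following is a minimal sketch in Python; the steepness parameter s, the example inputs, weights and bias are illustrative assumptions, not values from the text.

```python
import numpy as np

# Sketch of the simple neuron described above : a summing junction followed
# by one of the three activation functions. The parameter s controls the
# steepness ; all example values below are illustrative.

def summing_junction(inputs, weights, bias):
    # sum = I1*W1 + I2*W2 + ... + In*Wn + b
    return np.dot(inputs, weights) + bias

def logistic(total, s=1.0):
    return 1.0 / (1.0 + np.exp(-s * total))

def arc_tangent(total, s=1.0):
    return (2.0 / np.pi) * np.arctan(s * total)

def hyperbolic_tangent(total, s=1.0):
    return np.tanh(s * total)

inputs = np.array([0.5, -0.2, 0.8])
weights = np.array([0.4, 0.7, -0.1])
total = summing_junction(inputs, weights, bias=0.1)
print(logistic(total), arc_tangent(total), hyperbolic_tangent(total))
```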
 Need of hidden layers :
1. A network with only two layers (input and output) can only represent the input with whatever representation already exists in the input data.
2. If the data are discontinuous or non-linearly separable, the innate representation is inconsistent, and the mapping cannot be learned using only two layers (input and output).
3. Therefore, hidden layer(s) are used between the input and output layers.
 Weights connect units (neurons) in one layer only to those in the next higher layer. The output of a unit is scaled by the value of the connecting weight, and it is fed forward to provide a portion of the activation for the units in the next higher layer.
 Back propagation can be applied to an artificial neural network with any number of hidden layers. The training objective is to adjust the weights so that the application of a set of inputs produces the desired outputs.
 Training procedure : The network is usually trained with a large number of input-output pairs. A code sketch of this loop follows the list.
1. Initialize the weights to small random values (both positive and negative) to ensure that the network is not saturated by large values of weights.
2. Choose a training pair from the training set.
3. Apply the input vector to the network input.
4. Calculate the network output.
5. Calculate the error, the difference between the network output and the desired output.
6. Adjust the weights of the network in a way that minimizes this error.
7. Repeat steps 2 - 6 for each input-output pair in the training set until the error for the entire system is acceptably low.
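 The procedure above can be sketched directly in code. The following minimal example trains a small 2-4-1 logistic network on XOR; the layer sizes, learning rate and epoch count are illustrative assumptions, not values from the text.

```python
import numpy as np

# Minimal sketch of the training procedure above : a small 2-4-1 network with
# logistic activations trained on XOR. Layer sizes, learning rate and epoch
# count are illustrative assumptions.

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)

rng = np.random.default_rng(1)
W1 = rng.uniform(-0.5, 0.5, size=(3, 4))     # step 1 : small random weights (bias row included)
W2 = rng.uniform(-0.5, 0.5, size=(5, 1))
eta = 0.5

for epoch in range(5000):
    for x, t in zip(X, T):                   # steps 2 - 3 : apply a training pair
        x1 = np.append(x, 1.0)               # bias input is always +1
        h = logistic(x1 @ W1)                # step 4 : forward pass through the hidden layer
        h1 = np.append(h, 1.0)
        y = logistic(h1 @ W2)

        delta_out = (t - y) * y * (1 - y)    # step 5 : error term at the output
        delta_hid = h * (1 - h) * (W2[:-1] @ delta_out)
        W2 += eta * np.outer(h1, delta_out)  # step 6 : delta-rule weight adjustments
        W1 += eta * np.outer(x1, delta_hid)  # step 7 : the outer loops repeat until the error is low

for x in X:                                  # check the learned mapping
    h1 = np.append(logistic(np.append(x, 1.0) @ W1), 1.0)
    print(x, logistic(h1 @ W2)[0])
```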


 Forward pass and backward pass :


 Back propagation neural network training involves two passes.
1. In the forward pass, the input signals move forward from the network input to the output.
2. In the backward pass, the calculated error signals propagate backward through the
network, where they are used to adjust the weights.
3. In the forward pass, the calculation of the output is carried out, layer by layer, in the
forward direction. The output of one layer is the input to the next layer.
 In the reverse pass,
a. The weights of the output neuron layer are adjusted first, since the target value of each output neuron is available to guide the adjustment of the associated weights, using the delta rule (written out below).
b. Next, we adjust the weights of the middle layers. As the middle-layer neurons have no target values, this makes the problem more complex.
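 As a compact sketch of the delta rule referred to above, written with η for the learning rate, tk for the target, ok for the actual output and f ′ for the derivative of the activation function (this notation is chosen here for illustration, not taken from the text) :

For an output unit k fed by hidden unit j : δk = (tk – ok) · f ′(sumk) and Δwjk = η · δk · oj
For a hidden unit j fed by input unit i : δj = f ′(sumj) · Σk δk wjk and Δwij = η · δj · oi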

 Selection of number of hidden units : The number of hidden units depends on the number of input units.
1. Never choose h to be more than twice the number of input units.
2. You can load p patterns of I elements into log2 p hidden units.
3. Ensure that we have at least 1/e times as many training examples.
4. Feature extraction requires fewer hidden units than inputs.
5. Learning many examples of disjointed inputs requires more hidden units than inputs.
6. The number of hidden units required for a classification task increases with the number of classes in the task. Large networks require longer training times.
 Factors influencing back propagation training
 The training time can be reduced by using :
1. Bias : Networks with biases can represent relationships between inputs and outputs more easily than networks without biases. Adding a bias to each neuron is usually desirable to offset the origin of the activation function. The bias weight is trainable like any other weight, except that its input is always + 1.
2. Momentum : The use of momentum enhances the stability of the training process. Momentum keeps the training process moving in the same general direction, analogous to the momentum of a moving object. In back propagation with momentum, the weight change is a combination of the current gradient and the previous weight change (written out below).
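 A common way to write the momentum update mentioned above, with learning rate η and momentum coefficient α (the symbols are chosen here for illustration) :

Δw(t) = η · δ · o + α · Δw(t – 1)

so that the weight change at step t blends the current delta-rule term with the previous weight change.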
 Advantages of backpropagation:
1. It is simple, fast and easy to program
2. There are no parameters to tune apart from the number of inputs

3. It requires no prior knowledge about the network
4. It is flexible
5. It is a standard approach and works efficiently
6. It does not require the user to learn special functions
 Disadvantages of backpropagation :
1. Backpropagation can be sensitive to noisy data and irregularities
2. Its performance is highly reliant on the input data
3. It needs excessive time for training
4. It needs a matrix-based approach for backpropagation instead of a mini-batch approach

2.2 Kohonen Neural Network


 Kohonen self-organizing networks, also called Kohonen feature maps or topology-preserving maps, are competition-based networks used for data clustering.
 The Kohonen model provides a topological mapping. It places a fixed number of input patterns from the input layer into a higher-dimensional output or Kohonen layer.
 Training in the Kohonen network begins with the winner's neighbourhood of a fairly large size. Then, as training proceeds, the neighbourhood size gradually decreases.

 Fig. 2.2.1 shows a simple Kohonen self organizing network with 2 inputs and 49 outputs.
The learning feature map is similar to that of competitive learning networks.


Fig. 2.2.1 : Simple Kohonen self organizing network


 A similarity measure is selected and the winning unit is considered to be the one with the largest activation. For Kohonen feature maps, all the weights in a neighborhood around the winning unit are also updated. The neighborhood's size generally decreases slowly with each iteration.

 The steps to train a Kohonen self organizing network are as follows. A code sketch of this loop is given after the list.

For n-dimensional input space and m output neurons :
1. Choose a random weight vector wi for each neuron i, i = 1, ..., m
2. Choose a random input x
3. Determine the winner neuron k : || wk – x || = mini || wi – x || (Euclidean distance)
4. Update the weight vectors of all neurons i in the neighborhood of neuron k : wi := wi + η · h(i, k) · (x – wi) (wi is shifted towards x)
5. If the convergence criterion is met, STOP. Otherwise, narrow the neighborhood function and the learning parameter η and go to (2).
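 The loop above can be written compactly as follows. This is a minimal sketch for a 10 × 10 output grid with two-dimensional inputs; the Gaussian neighbourhood function and the decay schedules are illustrative assumptions, not prescribed by the text.

```python
import numpy as np

# Minimal sketch of the Kohonen training loop above for a 10 x 10 output
# grid and 2-dimensional inputs. The Gaussian neighbourhood and the decay
# schedules are illustrative assumptions.

rng = np.random.default_rng(0)
grid = np.array([(r, c) for r in range(10) for c in range(10)])  # neuron positions
W = rng.uniform(-1, 1, size=(100, 2))                            # step 1 : random weights
X = rng.uniform(-1, 1, size=(1000, 2))                           # training vectors

eta, sigma = 0.1, 5.0
for t in range(10000):
    x = X[rng.integers(len(X))]                    # step 2 : random input
    k = np.argmin(np.linalg.norm(W - x, axis=1))   # step 3 : winner by Euclidean distance
    d2 = np.sum((grid - grid[k]) ** 2, axis=1)     # grid distance to the winner
    h = np.exp(-d2 / (2 * sigma ** 2))             # neighbourhood function h(i, k)
    W += eta * h[:, None] * (x - W)                # step 4 : shift weights towards x
    eta *= 0.9995                                  # step 5 : shrink the learning rate ...
    sigma = max(0.5, sigma * 0.9995)               # ... and the neighbourhood size
```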
 Competitive learning in the Kohonen network
 To illustrate competitive learning, consider a Kohonen network with 100 neurons arranged in the form of a two-dimensional lattice with 10 rows and 10 columns. The network is required to classify two-dimensional input vectors - each neuron in the network should respond only to the input vectors occurring in its region.
 The network is trained with 1000 two-dimensional input vectors generated randomly in a square region in the interval between – 1 and + 1. The learning rate parameter α is equal to 0.1.
 1. Initial random weights
Fig. 2.2.2

 2. Network after 10,000 iterations

Fig. 2.2.3
 2.3 Learning Vector Quantization
 Learning Vector Quantization (LVQ) is an adaptive data classification method. It is based on training data with desired class information.
 LVQ uses unsupervised data clustering techniques to preprocess the data set and obtain cluster centers.
 Fig. 2.3.1 shows the network representation of LVQ.
 Here the input dimension is 2 and the input space is divided into six clusters. The first two clusters belong to class 1, while the other four clusters belong to class 2.
 The LVQ learning algorithm involves two steps :
1. An unsupervised data clustering method is used to locate several cluster centers without using the class information.
2. The class information is used to fine-tune the cluster centers to minimize the number of misclassified cases.
 The number of clusters can either be specified a priori or determined via a clustering technique capable of adaptively adding new clusters when necessary. Once the clusters are obtained, their classes must be labeled before moving to the second step. Such labeling is achieved by a voting method.

Fig. 2.3.1 : LVQ

Learning method :
 The weight vector (w) that is closest to the input vector (x) must be found. If x belongs to the same class as w, we move w towards x; otherwise we move w away from the input vector x. A code sketch of this update appears below.
 Step 1 : Initialize the cluster centers by a clustering method.
 Step 2 : Label each cluster by the voting method.
 Step 3 : Randomly select a training input vector x and find k such that || x – wk || is a minimum.
 Step 4 : If x and wk belong to the same class, update wk by
Δwk = η (x – wk)
Otherwise update wk by
Δwk = – η (x – wk)
 Step 5 : If the maximum number of iterations is reached, stop. Otherwise return to Step 3.
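 A minimal sketch of the LVQ1 update in Steps 3 - 4 above; the learning rate, the iteration count and the assumption that the prototypes and their labels come from Steps 1 - 2 are illustrative.

```python
import numpy as np

# Minimal sketch of the LVQ1 update in Steps 3 - 4 above. The prototypes,
# their class labels, the learning rate and the iteration count are
# illustrative assumptions (the prototypes would come from Steps 1 - 2).

def lvq1(X, y, prototypes, proto_labels, eta=0.05, iterations=5000, seed=0):
    rng = np.random.default_rng(seed)
    W = prototypes.copy()
    for _ in range(iterations):
        i = rng.integers(len(X))                       # Step 3 : random training vector
        x, cls = X[i], y[i]
        k = np.argmin(np.linalg.norm(W - x, axis=1))   # nearest prototype wk
        if proto_labels[k] == cls:                     # Step 4 : same class -> move towards x
            W[k] += eta * (x - W[k])
        else:                                          # different class -> move away from x
            W[k] -= eta * (x - W[k])
    return W
```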
 2.4 Hamming Neural Network
 Lippmann modelled a two-layer bipolar network called the Hamming neural network. The first layer is the Hamming net and the second layer is the MAXNET.
 The first layer is a feed forward type network which classifies the input patterns based on
minimum Hamming distance. The Hamming distance (HD) between any two vectors is the
number of components in which the vectors differ.
 The Hamming net uses MAXNET in the second layer as a subnet to find the unit with the
largest net input.
 The second layer operates as a recurrent recall network which suppresses all the outputs except the initially obtained maximum output of the first layer.
 Fig. 2.4.1 shows the structure of the Hamming neural network.

Fig. 2.4.1 : Structure of the Hamming neural network


 It can be divided into two basic sections :
1. Input layer : a layer built with neurons, all of which are connected to all of the network inputs;
2. Output layer, which is called the MaxNet layer; the output of each neuron of this layer is connected to the input of each neuron of this layer, and in addition every neuron of this layer is connected to exactly one neuron of the input layer.
 Input layer neurons are programmed to identify a fixed number of patterns; the number of neurons in this layer matches the number of those patterns (M neurons - M patterns). The outputs of these neurons realise a function which "measures" the similarity of an input signal to a given pattern.
 The output layer is responsible for choosing the pattern which is the most similar to the testing signal. In this layer, the neuron with the strongest response suppresses the other neurons' responses.
 Let I = (1, – 1, 1, 1, 1, 1) and S = (1, 1, – 1, – 1, 1, 1) be two fixed-length bipolar vectors. The Hamming distance HD (I, S) is equal to 3.
 The scalar product of I and S is I^t S = [n – HD (I, S)] – HD (I, S)
 If n is the number of components in the vectors, then [n – HD (I, S)] is the number of components in which the vectors agree. Hence
I^t S = n – 2 HD (I, S)
 Let I be the input vector and S be the vector that represents the pattern placed on a cluster. For a two-layer classifier of bipolar vectors, the strongest response of a neuron indicates that the minimum HD exists between the two vectors I and S.
 For setting up the weights and bias, the above equation is written as : HD (I, S) = – I^t S / 2 + n / 2
 If the weights are fixed to one half of the standard vector, S/2, and the bias to n/2, then the network will be able to find the input vector I closest to the standard vector S. This is done by finding the output unit with the largest net input.
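 As a small illustration of this weight and bias setup, the following sketch scores bipolar inputs against stored exemplar vectors; the exemplars and the test input are illustrative, and the MAXNET winner-take-all stage is approximated here by a simple argmax.

```python
import numpy as np

# Sketch of the Hamming net first layer : weights = S/2, bias = n/2, so the
# net input of each output unit equals n - HD(I, S), i.e. the number of
# agreeing components. Exemplars are illustrative ; MAXNET is approximated
# by argmax.

exemplars = np.array([[ 1, -1,  1,  1,  1,  1],
                      [ 1,  1, -1, -1,  1,  1]])   # stored patterns S
n = exemplars.shape[1]
weights = exemplars / 2.0
bias = n / 2.0

I = np.array([1, -1, 1, 1, 1, -1])                 # test input
net = weights @ I + bias                           # = n - HD(I, S) for each unit
print(net)                                         # e.g. [5. 2.]
print("winner :", np.argmax(net))                  # unit with the largest net input
```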


 2.5 Associative Memory


 One of the primary functions of the brain is associative memory. Learning can be
considered as a process of forming associations between related patterns. The associative
memory is composed of a cluster of units which represent a simple model of a real
biological neuron.
 An associative memory, also known as Content-Addressable Memory (CAM), can be searched for a value in a single memory cycle rather than using a software loop.
 Associative memories can be implemented using networks with or without feedback. Such
associative neural networks are used to associate one set of vectors with another set of
vectors, say input and output patterns.

 The aim of an associative memory is to produce the associated output pattern whenever one of the input patterns is applied to the neural network. The input pattern may be applied to the network either as input or as initial state, and the output pattern is observed at the outputs of some neurons constituting the network.
 Associative memories belong to a class of neural networks that learn according to a certain recording algorithm. They require a priori information and their connectivity matrices most often need to be formed in advance. Writing into memory produces changes in the neural interconnections. Reading of the stored information from memory, named recall, is a transformation of the input signals by the network.
 All memory information is spatially distributed throughout the network. Associative memory enables a parallel search within the stored data. The purpose of the search is to output one or all stored items that match the search argument and retrieve them entirely or partially.
 The Fig. 2.5.1 shows a block diagram of an associative memory.
Fig. 2.5.1 : Block diagram of an associative memory
 In the initialization phase of the associative memory no information is stored; because the information is represented in the weights, they are all set to zero.
 The advantage of neural associative memories over other pattern storage algorithms, like lookup tables or hash codes, is that the memory access can be fault tolerant with respect to variation of the input pattern.
 In associative memories many associations can be stored at the same time. There are
different schemes of superposition of the memory traces formed by the different
associations. The superposition can be simple linear addition of the synaptic changes
required for each association (like in the Hopfield model) or nonlinear.


 The performance of neural associative memories is usually measured by a quantity called information capacity, that is, the information content that can be learned and retrieved, divided by the number of synapses required.
 An associative memory is a content-addressable structure that maps specific input
representations to specific output representations. It is a system that “associates” two
patterns (X, Y) such that when one is encountered, the other can be recalled.
 Associative network memory can be static or dynamic.
 Static : networks recall an output response after an input has been applied in one feed-
forward pass and theoretically without delay. They were termed instantaneous.
 Dynamic : memory networks produce recall as a result of output/input feedback interaction, which requires time.
 There are two classes of associative memory : auto-associative and hetero-associative.

 Whether auto- or hetero-associative, the net can associate not only the exact pattern pairs
used in training, but is also able to obtain associations if the input is similar to one on which
it has been trained.

 2.5.1 Auto-associative Memory
 Auto-associative networks are a special subset of the hetero-associative networks, in which each vector is associated with itself, i.e. y^i = x^i for i = 1, ..., m. The function of such networks is to correct noisy input vectors.
 Fig. 2.5.2 shows auto-associative memory.
 Auto-associative memories are content-based memories which can recall a stored sequence when they are presented with a fragment or a noisy version of it. They are very effective in de-noising the input or removing interference from the input, which makes them a promising first step in solving the cocktail party problem.
 The simplest version of auto-associative memory is the linear associator, which is a two-layer feed-forward fully connected neural network where the output is constructed in a single feed-forward computation.

Fig. 2.5.2 : Auto-associative memory


 Artificial neural networks can be used as associative memories. One of the simplest artificial neural associative memories is the linear associator. The Hopfield model and Bidirectional Associative Memory (BAM) models are some of the other popular artificial neural network models used as associative memories.
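 A minimal sketch of the linear auto-associator just mentioned : bipolar patterns are stored with the Hebbian (outer-product) rule and recalled by thresholding; the stored patterns and the noisy probe are illustrative assumptions.

```python
import numpy as np

# Sketch of a linear auto-associator : bipolar patterns are stored with
# the Hebbian outer-product rule and recalled by thresholding W @ x.
# The two stored patterns and the noisy probe are illustrative assumptions.

patterns = np.array([[ 1, -1,  1, -1,  1, -1],
                     [ 1,  1, -1, -1,  1,  1]])

W = sum(np.outer(p, p) for p in patterns)   # Hebbian (outer-product) storage

noisy = np.array([1, -1, 1, -1, -1, -1])    # corrupted version of patterns[0]
recalled = np.sign(W @ noisy)
print(recalled)                             # recovers patterns[0]
```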
 2.5.2 Hetero-associative Memory Network
 Hetero-associative networks map "m" input vectors X^1, X^2, ..., X^m in n-dimensional space to m output vectors y^1, y^2, ..., y^m in k-dimensional space, so that X^i -> y^i.

 If || x̃ – x^i ||_2 < ε then x̃ -> y^i. This should be achieved by the learning algorithm, but becomes very hard when the number m of vectors to be learned is too high.
 Fig. 2.5.3 shows block diagram of hetero-associative network.

Fig. 2.5.3 : Block diagram of a hetero-associative network


 Fig. 2.5.4 shows the structure of a hetero-associative network without feedback.

Fig. 2.5.4 : Hetero-associative network without feedback


Fig. 2.5.5 : Hetero-associative network without feedback

 2.5.3 The Hopfield Network


 The Hopfield model is a single-layered recurrent network. Like the associative memory, it
is usually initialized with appropriate weights instead of being trained.

 Hopfield Neural Network (HNN) is a model of auto-associative memory. It is a single-layer neural network with feedbacks. Fig. 2.5.6 shows a Hopfield network of three units. The Hopfield network is created by supplying input data vectors, or pattern vectors, corresponding to the different classes. These patterns are called class patterns.

Fig. 2.5.6 : Hopfield network of three units


 The Hopfield model consists of a single layer of processing elements where each unit is connected to every other unit in the network other than itself.
 The output of each neuron is a binary number in {– 1, 1}. The output vector is the state vector. Starting from an initial state (given as the input vector), the state of the network changes from one state to another like an automaton. If the state converges, the point to which it converges is called the attractor.
 In its simplest form, the output function is the sign function, which yields 1 for arguments ≥ 0 and – 1 otherwise.

 The connection weight matrix W of this type of network is square and symmetric. The units in the Hopfield model act as both input and output units.
 A Hopfield network consists of "n" totally coupled units. Each unit is connected to all other units except itself. The network is symmetric because the weight wij for the connection between unit i and unit j is equal to the weight wji of the connection from unit j to unit i. The absence of a connection from each unit to itself avoids a permanent feedback of its own state value.
 Hopfield networks are typically used for classification problems with binary pattern vectors.
 Hopfield model is classified into two categories :
1. Discrete Hopfield Model
2. Continuous Hopfield Model
 In both the discrete and continuous Hopfield networks, the weights are trained in a one-shot fashion and not trained incrementally as was done in the case of the Perceptron and the MLP.
 In the discrete Hopfield model, the units use a slightly modified bipolar output function
where the states of the units, i.e., the output of the units remain the same if the current state
is equal to some threshold value.
 The continuous Hopfield model is just a generalization of the discrete case. Here, the units use a continuous output function such as the sigmoid or hyperbolic tangent function. In the continuous Hopfield model, each unit has an associated capacitance Ci and resistance ri that model the capacitance and resistance of a real neuron's cell membrane, respectively.
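 A minimal sketch of a discrete Hopfield network as described above : one-shot Hebbian weights with no self-connections and asynchronous bipolar updates; the stored patterns, the noisy probe and the number of update sweeps are illustrative assumptions.

```python
import numpy as np

# Sketch of a discrete Hopfield network : one-shot Hebbian weights with a
# zero diagonal (no self-connections), then asynchronous updates with the
# bipolar output rule from the text (the state is kept unchanged when the
# net input is exactly at the threshold 0). All values are illustrative.

patterns = np.array([[ 1,  1,  1,  1, -1, -1],
                     [ 1,  1, -1, -1,  1, -1]])

W = sum(np.outer(p, p) for p in patterns).astype(float)
np.fill_diagonal(W, 0)                        # no unit is connected to itself

state = np.array([1, 1, -1, 1, -1, -1])       # noisy version of patterns[0]
rng = np.random.default_rng(0)
for _ in range(5):                            # a few asynchronous update sweeps
    for i in rng.permutation(len(state)):
        h = W[i] @ state
        if h != 0:
            state[i] = 1 if h > 0 else -1     # bipolar output ; keep state if h == 0
print(state)                                  # settles on the attractor patterns[0]
```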


 2.5.4 Bidirectional Associative Memory (BAM)


 BAM consists of two layers, x and y. Signals are sent back and forth between both layers
until an equilibrium is reached. Equilibrium is reached if the x and y vectors no longer
change. Given an x vector the BAM is able to produce the y vector and vice versa.
 BAM consists of bi-directional edges so that information can flow in either direction. Since
the BAM network has bidirectional edges, propagation moves in both directions, first from
one layer to another, and then back to the first layer. Propagation continues until the nodes
are no longer changing values.
 Fig. 2.5.7 shows the BAM network.
 Since the BAM also uses the traditional Hebb's learning rule to build the connection weight matrix to store the associated pattern pairs, it too has a severely low memory capacity.

Fig. 2.5.7 : BAM network
 BAM can be classified into two categories :
1. Discrete BAM : The network propagates an input pattern X to the Y layer where the units in the Y layer will compute their net input.
2. Continuous BAM : The units use the sigmoid or hyperbolic tangent output function. The units in the X layer have an extra external input Ii, while the units in the Y layer have an extra external input Jj for i = 1, 2, ..., m and j = 1, 2, ..., n. These extra external inputs lead to a modification in the computation of the net input to the units.
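 A minimal sketch of a discrete BAM as described above : the weight matrix is built with the Hebbian rule from the stored (x, y) pairs, and recall bounces between the two layers until equilibrium; the pattern pairs and the noisy probe are illustrative assumptions.

```python
import numpy as np

# Sketch of a discrete BAM : the weight matrix is the Hebbian sum of outer
# products over the stored (x, y) pairs ; recall bounces between the X and
# Y layers until neither vector changes. The pattern pairs are illustrative.

x_pats = np.array([[ 1,  1,  1,  1, -1, -1],
                   [ 1, -1,  1, -1,  1, -1]])
y_pats = np.array([[ 1, -1,  1, -1],
                   [ 1,  1, -1, -1]])

W = sum(np.outer(x, y) for x, y in zip(x_pats, y_pats))   # shape (6, 4)

def bipolar(net, previous):
    # 1 if net > 0, -1 if net < 0, keep the previous state when net == 0
    return np.where(net > 0, 1, np.where(net < 0, -1, previous))

x = np.array([1, 1, 1, 1, -1, 1])     # noisy version of x_pats[0]
y = np.zeros(4, dtype=int)
for _ in range(10):                   # forward / backward passes until equilibrium
    y_new = bipolar(x @ W, y)
    x_new = bipolar(W @ y_new, x)
    if np.array_equal(x_new, x) and np.array_equal(y_new, y):
        break
    x, y = x_new, y_new
print(x, y)                           # recalls the stored pair (x_pats[0], y_pats[0])
```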

 2.5.5 Difference Between Auto-associative Memory and Hetero-associative Memory
1. Auto-associative memory : the input and output vectors s and t are the same. Hetero-associative memory : the input and output vectors s and t are different.
2. Auto-associative memory recalls a memory of the same modality as the one that evoked it. Hetero-associative memory recalls a memory that is different in character from the input.
3. Auto-associative : a picture of a favorite object might evoke a mental image of that object in vivid detail. Hetero-associative : a particular smell or sound, for example, might evoke a visual memory of some past event.
4. An auto-associative memory retrieves the same pattern. A hetero-associative memory retrieves the stored pattern.
5. Auto-associative examples : color correction, color constancy. Hetero-associative examples : 1. Space transforms : Fourier, 2. Dimensionality reduction : PCA.

 2.6 Adaptive Resonance Theory Neural Networks


 Gail Carpenter and Stephen Grossberg (Boston University) developed the Adaptive Resonance learning model to address the question : how can a system retain its previously learned knowledge while incorporating new information ?
 Adaptive resonance architectures are artificial neural networks that are capable of stable categorization of an arbitrary sequence of unlabeled input patterns in real time. These architectures are capable of continuous training with non-stationary inputs.
 Some models of Adaptive Resonance Theory are :
1. ART1 - Discrete input.
2. ART2 - Continuous input.
3. ARTMAP - Using two input vectors, transforms the unsupervised ART model into a supervised one.
 Various others : Fuzzy ART, Fuzzy ARTMAP (FARTMAP), etc.
 The primary intuition behind the ART model is that object identification and recognition generally occur as a result of the interaction of 'top-down' observer expectations with 'bottom-up' sensory information.
 The basic ART system is an unsupervised learning model. It typically consists of a comparison field and a recognition field composed of neurons, a vigilance parameter, and a reset module. However, ART networks are able to grow additional neurons if a new input cannot be categorized appropriately with the existing neurons.
 ART networks tackle the stability-plasticity dilemma :
1. Plasticity : They can always adapt to unknown inputs if the given input cannot be
classified by existing clusters.
2. Stability : Existing clusters are not deleted by the introduction of new inputs.
3. Problem : Clusters are of fixed size, depending on the vigilance parameter ρ.
 Fig. 2.6.1 shows ART-1 Network.
 ART-1 networks receive binary input vectors. Bottom-up weights are used to determine output-layer candidates that may best match the current input.
 Top-down weights represent the "prototype" for the cluster defined by each output neuron. A close match between input and prototype is necessary for categorizing the input.
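 A minimal sketch of the ART-1 style match (vigilance) test implied here : a binary input is compared with a candidate prototype and the category is accepted only if the relative overlap reaches the vigilance parameter ρ; the vectors and the value of ρ are illustrative assumptions.

```python
import numpy as np

# Sketch of the ART-1 vigilance (match) test : a binary input is compared
# with a candidate top-down prototype ; the category is accepted only if
# |input AND prototype| / |input| >= rho. Vectors and rho are illustrative.

def vigilance_test(x, prototype, rho):
    overlap = np.sum(np.logical_and(x, prototype))
    return overlap / np.sum(x) >= rho

x = np.array([1, 1, 0, 1, 0, 1])
prototype = np.array([1, 1, 0, 0, 0, 1])
print(vigilance_test(x, prototype, rho=0.7))   # 3/4 = 0.75 -> resonance (True)
print(vigilance_test(x, prototype, rho=0.8))   # 0.75 < 0.8 -> reset (False)
```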


Fig. 2.6.1 : ART 1 network


 Finding this match can require multiple signal exchanges between the two layers in both directions until "resonance" is established or a new neuron is added.
 The basic ART model, ART1, is comprised of the following components :
1. The short term memory layer : F1 - Short term memory.
2. The recognition layer : F2 - Contains the long term memory of the system.
3. Vigilance parameter : ρ - A parameter that controls the generality of the memory. A larger ρ means more detailed memories, a smaller ρ produces more general memories.
 Types of ART :
1. ART 1 : It is the simplest variety of ART networks, accepting only binary inputs.
2. ART 2 : Extends network capabilities to support continuous inputs.
3. ART 3 : ART 3 builds on ART-2 by simulating rudimentary neurotransmitter regulation of synaptic activity by incorporating simulated sodium (Na+) and calcium (Ca2+) ion concentrations into the system's equations, which results in a more physiologically realistic means of partially inhibiting categories that trigger mismatch resets.
4. Fuzzy ART : Fuzzy ART implements fuzzy logic into ART's pattern recognition, thus enhancing generalizability.
5. ARTMAP : Also known as Predictive ART, it combines two slightly modified ART-1 or ART-2 units into a supervised learning structure where the first unit takes the input data and the second unit takes the correct output data, which is then used to make the minimum possible adjustment of the vigilance parameter in the first unit in order to make the correct classification.
6. Fuzzy ARTMAP : Fuzzy ARTMAP is merely ARTMAP using fuzzy ART units, resulting in a corresponding increase in efficiency.


 2.7 Support Vector Machines


 Support Vector Machines (SVMs) are a set of supervised learning methods which learn from the dataset and are used for classification. SVM is a classifier derived from statistical learning theory by Vapnik and Chervonenkis.
 An SVM is a kind of large-margin classifier: it is a vector space based machine learning
method where the goal is to find a decision boundary between two classes that is maximally
far from any point in the training data.
 Given a set of training examples, each marked as belonging to one of two classes, an SVM algorithm builds a model that predicts whether a new example falls into one class or the other. Simply speaking, we can think of an SVM model as representing the examples as points in space, mapped so that the examples of the separate classes are divided by a gap that is as wide as possible.
 New examples are then mapped into the same space and classified as belonging to a class based on which side of the gap they fall on.
 Two class problems
 Many decision boundaries can separate these two classes. Which one should we choose ?
 The Perceptron learning rule can be used to find any decision boundary between class 1 and class 2.
 The line that maximizes the minimum margin is a good bet. The model class of "hyperplanes with a margin of m" has a low VC dimension if m is big.
 This maximum-margin separator is determined by a subset of the data points. Data points in this subset are called "support vectors". It will be useful computationally if only a small fraction of the data points are support vectors, because we use the support vectors to decide which side of the separator a test case is on.


Fig. 2.7.1

 Example of bad decision boundaries


 SVMs are primarily two-class classifiers with the distinct characteristic that they aim to find the optimal hyperplane such that the expected generalization error is minimized. Instead of directly minimizing the empirical risk calculated from the training data, SVMs perform structural risk minimization to achieve good generalization.

Fig. 2.7.2 : Bad decision boundary of SVM

Fig. 2.7.3 : Empirical risk
 The empirical risk is the average loss of an estimator for a finite set of data drawn from P. The idea of risk minimization is not only to measure the performance of an estimator by its risk, but to actually search for the estimator that minimizes risk over the distribution P. Because we do not know the distribution P, we instead minimize the empirical risk over a training dataset drawn from P. This general learning technique is called empirical risk minimization.
 Fig. 2.7.3 shows empirical risk.
 Good decision boundary : margin should be large
 The decision boundary should be as far away from the data of both classes as possible. If
data points lie very close to the boundary, the classifier may be consistent but is more
“likely” to make errors on new instances from the distribution. Hence, we prefer classifiers
that maximize the minimal distance of data points to the separator.
1. Margin (m) : The gap between data points and the classifier boundary. The margin is the minimum distance of any sample to the decision boundary. If the hyperplane is in canonical form, the margin can be measured from the length of the weight vector : the margin of the separator is the distance between the support vectors, given by the projection of the distance between two boundary points onto the direction perpendicular to the hyperplane.

Margin (m) = 2 / || w ||

2. Maximal margin classifier : A classifier in the family F that maximizes the margin.
Maximizing the margin is good according to intuition and PAC theory. It implies that only the support vectors matter; other training examples are ignorable. A short sketch using a standard SVM library follows.
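 To make the margin and the support vectors concrete, here is a small sketch using scikit-learn's SVC (assuming the library is available; the toy data points below are illustrative assumptions).

```python
import numpy as np
from sklearn.svm import SVC

# Sketch : fit a linear maximum-margin classifier on two linearly separable
# point clouds and inspect the support vectors and the margin 2 / ||w||.
# The toy data below are illustrative assumptions.

X = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.5],    # class 0
              [5.0, 5.0], [5.5, 4.0], [6.0, 5.5]])   # class 1
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

w = clf.coef_[0]
print("support vectors :", clf.support_vectors_)
print("margin 2/||w|| :", 2.0 / np.linalg.norm(w))
print("prediction for (3, 3) :", clf.predict([[3.0, 3.0]]))
```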

Fig. 2.7.4 : Good decision boundary

Example 2.7.1 : For the following Fig. 2.7.5 find a linear hyperplane (decision boundary) that
will separate the data.

Fig. 2.7.5
 Solution :
1. Define what an optimal hyperplane is : maximize margin
2. Extend the above definition for non-linearly separable problems : have a penalty term
for misclassifications
3. Map data to high dimensional space where it is easier to classify with linear decision
surfaces : reformulate problem so that data is mapped implicitly to this space


Fig. 2.7.6
 2.7.1 Key Properties of Support Vector Machines
1. Use a single hyperplane which subdivides the space into two half-spaces, one which is
occupied by Class 1 and the other by Class 2
2. They maximize the margin of the decision boundary using quadratic optimization
techniques which find the optimal hyperplane.
3. Ability to handle large feature spaces.
4. Overfitting can be controlled by the soft margin approach.
5. When used in practice, SVM approaches frequently map the examples to a higher dimensional space and find margin-maximal hyperplanes in the mapped space, obtaining decision boundaries which are not hyperplanes in the original space.
6. The most popular versions of SVMs use non-linear kernel functions and map the attribute space into a higher dimensional space to facilitate finding "good" linear decision boundaries in the modified space.


 2.7.2 SVM Applications


 SVM has been used successfully in many real-world problems
1. text (and hypertext) categorization
2. image classification
3. bioinformatics (Protein classification, Cancer classification)
4. hand-written character recognition
5. Determination of SPAM email

 2.7.3 Limitations of SVM


1. It is sensitive to noise.
2. The biggest limitation of SVM lies in the choice of the kernel.

3. Another limitation is speed and size.
4. The optimal design for multiclass SVM classifiers is also a research area.

 2.7.4 Soft Margin SVM syE
 For the very high dimensional problems common in text classification, sometimes the data are linearly separable. But in the general case they are not, and even if they are, we might prefer a solution that better separates the bulk of the data while ignoring a few weird noise documents.
 What if the training set is not linearly separable ? Slack variables can be added to allow misclassification of difficult or noisy examples; the resulting margin is called soft.

 A soft-margin allows a few variables to cross into the margin or over the hyperplane,
allowing misclassification.
 We penalize the crossover by looking at the number and distance of the misclassifications. This is a trade-off between the hyperplane violations and the margin size. The slack variables are bounded by some set cost. The farther they are from the soft margin, the less influence they have on the prediction.
 All observations have an associated slack variable :
1. Slack variable = 0 : the point lies on or outside the margin (correctly classified).
2. Slack variable > 0 : the point lies in the margin or on the wrong side of the hyperplane.
3. C is the trade-off between the slack variable penalty and the margin. The corresponding optimization problem is sketched below.
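 A compact way to write the soft-margin optimization problem described above, with slack variables ξi and cost parameter C (standard notation, used here for illustration) :

minimize (1/2) || w ||² + C Σi ξi
subject to yi (w · xi + b) ≥ 1 – ξi and ξi ≥ 0 for all i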
 2.7.5 Comparison of SVM and Neural Networks
1. SVM : the kernel maps to a very high dimensional space. NN : hidden layers map to lower dimensional spaces.
2. SVM : the search space has a unique minimum. NN : the search space has multiple local minima.
3. SVM : classification is extremely efficient. NN : classification is extremely efficient.
4. SVM : very good accuracy in typical domains. NN : very good accuracy in typical domains.
5. SVM : the kernel and the cost are the two parameters to select. NN : requires choosing the number of hidden units and layers.
6. SVM : training is extremely efficient. NN : training is expensive.

 2.8 Spike Neuron Models


 Spiking neural networks (SNN) represent a special class of artificial neural networks, where
neuron models communicate by sequences of spikes.
 Networks composed of spiking neurons are able to process substantial amounts of data using a relatively small number of spikes. Due to their functional similarity to biological neurons, spiking models provide powerful tools for the analysis of elementary processes in the brain, including neural information processing, plasticity and learning.
 At the same time spiking networks offer solutions to a broad range of specific problems in applied engineering, such as fast signal processing, event detection, classification, speech recognition, spatial navigation or motor control.
 Biological neurons communicate by generating and propagating electrical pulses called action potentials or spikes. This feature of real neurons became a central paradigm of the theory of spiking neural models.
 All spiking models share the following common properties with their biological counterparts :
1) They process information coming from many inputs and produce single spiking output signals;
2) Their probability of firing (generating a spike) is increased by excitatory inputs and decreased by inhibitory inputs;
3) Their dynamics is characterized by at least one state variable; when the internal variables of the model reach a certain state, the model is supposed to generate one or more spikes.
 The most common spiking neuron models are the Integrate-and-Fire (IF) and Leaky-Integrate-and-Fire (LIF) units. Both models treat biological neurons as point dynamical systems.
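 A minimal sketch of a Leaky Integrate-and-Fire unit treated as a point dynamical system : the membrane potential leaks towards rest, integrates the input current, and a spike is emitted (and the potential reset) when a threshold is crossed. All constants below are illustrative assumptions.

```python
import numpy as np

# Sketch of a Leaky Integrate-and-Fire (LIF) unit : the membrane potential v
# leaks towards rest, integrates an input current, and emits a spike (and is
# reset) when it crosses a threshold. All constants are illustrative.

dt, tau, v_rest, v_reset, v_thresh, R = 1.0, 20.0, 0.0, 0.0, 1.0, 1.0
T = 200                                  # number of time steps
I = 1.5 * np.ones(T)                     # constant input current
v = v_rest
spike_times = []

for t in range(T):
    dv = (-(v - v_rest) + R * I[t]) * (dt / tau)   # leaky integration (Euler step)
    v += dv
    if v >= v_thresh:                    # threshold crossed -> spike and reset
        spike_times.append(t)
        v = v_reset

print("spikes at steps :", spike_times)
```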
 Spiking network topologies can be classified into three general categories :
1. Feedforward networks : This is where the data flow from input to output units is strictly one-directional; the data processing can extend over multiple layers of neurons, but no feedback connections are present.
2. Recurrent networks : Here individual neurons or populations of neurons interact through reciprocal (feedback) connections. Feedback connections result in an internal state of the network which allows it to exhibit dynamic temporal behavior.
3. Hybrid networks : This group encompasses networks in which some subpopulations may be strictly feedforward, while others have recurrent topologies. Interactions between the subpopulations may be one-directional or reciprocal.


 2.9 Two Marks Questions with Answers


Q. 1 Define the term Hebbian Learning.
Ans. : According to Hebbian learning, when two neurons are simultaneously active, the
connection between them must be strengthened; when one of them is active, while the other is
inactive, the connection strength must be weakened
Q. 2 What is associative network ?
Ans. : An associative network is a single-layer net in which the weights are determined in
such a way that the net can store a set of pattern associations.
Q. 3 Define an interpolative.

ww
Ans. : If the input is not an exact match of an association input, then the output is altered
based on the distance from the input.

w.E
Q. 4 What is recall ?
Ans. : If the input vectors are uncorrelated, the Hebb rule will produce the correct weights,

a
and the response of the net when tested with one of the training vectors will be perfect recall.
Q. 5 What do you mean cross talk ?
syE
Ans. : If the input vectors are not orthogonal, the response will include a portion of each of
their target values.
This is commonly called cross talk ngi
Q. 6 Explain spurious stable state. nee
rin
Ans. : If the input vector is an “unknown” vector, the activation vectors produced as the net
iterates will converge to an activation vector that is not one of the stored patterns; such a
pattern is called a spurious stable state.
Q. 7 What is back propagation algorithm ? g.n
Ans. : The back-prop algorithm is an iterative gradient algorithm designed to minimize the
mean square error between the actual output of a multilayer feed-forward perceptron and the
desired output. It requires continuous differentiable non-linearities.
et
Q. 8 What is winner takes all ?
Ans. : In competitive learning, neurons compete among themselves to be activated. While in
Hebbian learning, several output neurons can be activated simultaneously, in competitive
learning, only a single output neuron is active at any time. The output neuron that wins the
“competition” is called the winner-takes-all neuron
Q. 9 Explain Learning Vector Quantization.
Ans. : LVQ is adaptive data classification method. It is based on training data with desired
class information. LVQ uses unsupervised data clustering techniques to preprocesses the data
set and obtain cluster centers.


Q. 10 Which are the rules used in Hebb's law ?


Ans. : Rules :
1. If two neurons on either side of a connection are activated synchronously, then the
weight of that connection is increased.
2. If two neurons on either side of a connection are activated asynchronously, then the
weight of that connection is decreased.
Q. 11 What do you mean counter propagation network ?
Ans. : Counter propagation networks are multilayer networks based on a combination of input, clustering and output layers. This network can be used to compress data, to approximate functions or to associate patterns.
Q. 12 What is Hopfield model ?
Ans. : The Hopfield model is a single-layered recurrent network. Like the associative memory, it is usually initialized with appropriate weights instead of being trained.

Q. 13 What is McCulloch Pitts model ?

Ans. : McCulloch and Pitts describe a neuron as a logical threshold element with two possible states. Such a threshold element has "N" input channels and one output channel. An input channel is either active (input 1) or silent (input 0). A directed weighted graph is used for connecting neurons.
Q. 14 What are the problems with McCulloch-Pitts neurons ?
Ans. : The problems with McCulloch-Pitts neurons are :
1. Weights and thresholds are analytically determined. They cannot be learned.
2. It is very difficult to minimize the size of a network.
Q. 15 Describe the term Perceptron.
Ans. : An arrangement of one input layer of McCulloch-Pitts neurons feeding forward to one output layer of McCulloch-Pitts neurons is known as a Perceptron.
 2.10 Multiple Choice Questions with Answers
Q. 1 Backpropagation is a supervised learning algorithm, for training ________ perceptrons.
(a) single layer (b) multilayer (c) any form of (d) none
Ans. : (b) multilayer
Q. 2 Learning Vector Quantization network is a ________ neural network where the input
vectors are trained for a specific class or group already mapped in the training set.
(a) supervised (b) unsupervised (c) semisupervised (d) none
Ans. : (a) supervised
Q. 3 Hopfield neural networks are an example of ________ neural networks.
(a) supervised (b) hybrid (c) associative memory (d) none
Ans. : (c) associative memory
Downloaded by M RAMESH KUMAR (mrameshmail4pec@gmail.com)

TECHNICAL PUBLICATIONS ® - An up thrust for knowledge


lOMoARcPSD|57969650

Downloaded From: www.EasyEngineering.net


Soft Computing (2 - 25) Artificial Neural Networks

Q. 4 What is backpropagation ?
(a) It is another name given to the curvy function in the perceptron.
(b) It is the transmission of error back through the network to adjust the inputs.
(c) It is the transmission of error back through the network to allow weights to be
adjusted so that the network can learn.
(d) None of the above.
Ans. :(c) It is the transmission of error back through the network to allow weights to be
adjusted so that the network can learn.
Q. 5 The BAM (Bidirectional Associative Memory) significantly belongs to _______.

(a) auto associative (b) fuzzy associative
(c) genetic associative (d) hetero associative
Ans. : (d) hetero associative
Q. 6 The weights of the BAM are initialized based on the ________ rule.
(a) Hebb (b) ART (c) ANN (d) all of these
Ans. : (a) Hebb

Q. 7 What is the shape of dendrites like ________.
(a) oval (b) round (c) tree (d) rectangular
Ans. : (c) tree
Q. 8 What is the objective of BAM?
(a) To store pattern pairs.
(b) To recall pattern pairs.
(c) To store a set of pattern pairs so that they can be recalled by giving either pattern as input.
(d) None of the mentioned.
Ans. : (c) To store a set of pattern pairs so that they can be recalled by giving either pattern as input.
Q. 9 Hetero-associative memory is also known as?
(a) Unidirectional memory. (b) Bidirectional memory.
(c) Multidirectional associative memory. (d) Temporal associative memory.
Ans. : (b) Bidirectional memory.
Q. 10 What are the general tasks that are performed with backpropagation algorithm?
(a) Pattern mapping (b) Function approximation
(c) Prediction (d) All of the above
Ans. : (d) All of the above


Q. 11 Adaline which stands for ________.


(a) Adaptive Linear Neuron (b) Address Linear Neuron
(c) Adaptive Linear Network (d) Adaptive Neural Neuron
Ans. : (a) Adaptive Linear Neuron
Q. 12 BAM stands for ________.
(a) Bidirectional Adaptive Memory
(b) Backpropagation Associative Memory
(c) Bidirectional Associative Memory
(d) Bidirectional Associative Machine

Ans. : (c) Bidirectional Associative Memory
Q. 13 The Hopfield network consists of a set of neurons forming a multiple loop ________
system.
(a) unidirectional (b) parallel (c) feedback (d) feedforward
Ans. : (c) feedback
Q. 14 A ________ Hopfield net can be used to determine whether an input vector is a known
vector or an unknown vector.
(a) binary (b) discrete (c) autoassociative (d) All
Ans. : (a) binary

Q. 15 Kohonen network is trained in an ________ mode.
(a) supervised (b) unsupervised (c) semi-supervised (d) All of these
Ans. : (b) unsupervised

Q. 16 What is an activation value?


(a) Weighted sum of inputs (b) Threshold value
(c) Main input to neuron (d) None of the mentioned
Ans. : (a) Weighted sum of inputs
Q. 17 What is Hebb's rule of learning ?
(a) The system learns from its past mistakes.
(b) The system recalls previous reference inputs & respective ideal outputs.
(c) The strength of neural connection get modified accordingly.
(d) None of the mentioned.
Ans. : (c) The strength of neural connection get modified accordingly.
Q. 18 Why can’t we design a perfect neural network?
(a) Full operation is still not known of biological neurons.
(b) Number of neuron is itself not precisely known.
(c) Number of interconnection is very large & is very complex.
(d) All of these.
Ans. : (d) All of these

Q. 19 What was the main point of difference between the Adaline & perceptron model?
(a) Weights are compared with output.
(b) Sensory units result is compared with output.
(c) Analog activation value is compared with output.
(d) All of the mentioned.
Ans. : (c) Analog activation value is compared with output.
Q. 20 Heteroassociative memory can be an example of which type of network?
(a) Group of instars. (b) Group of outstar.
(c) Either group of instars or outstars. (d) Both group of instars or outstars.
ww
Ans. : (c) Either group of instars or outstars

Q. 21 Hidden units are composed of ________, computing an affine transformation and
element-wise nonlinear function.
(a) output vector (b) input vector
(c) activation function (d) all of these
Ans. : (b) input vector

Q. 22 The backpropagation algorithm is used to find a local minimum of the ________.
(a) neural network (b) activation function
(c) error function (d) none of these
Ans. : (c) error function
