Soft Computing Unit 2
UNIT - II
Back Propagation Neural Networks - Kohonen Self Organizing Maps - Learning Vector Quantization - Hamming Neural Network - Hopfield Neural Network - Bi-directional Associative Memory - Adaptive Resonance Theory Neural Networks - Support Vector Machines - Spike Neuron Models.
2.1 Back Propagation Neural Networks
The backpropagation algorithm looks for the minimum of the error function in weight space using a technique called the delta rule or gradient descent. The weights that minimize the error function are then considered to be a solution to the learning problem.
Back propagation is a systematic method for training multiple-layer ANNs. It is a generalization of the Widrow-Hoff error correction rule. About 80 % of ANN applications use back propagation.
Fig. 2.1.1 shows a backpropagation network.
Activation Function: Logistic (Sigmoid)
f(sum) = (1 + e^(– s * sum))^(– 1)
The logistic function increases monotonically from a lower limit (0 or – 1) to an upper limit (+ 1) as sum increases. Its values vary between 0 and 1, taking the value 0.5 when sum is zero.
Activation Function: Arc Tangent
f(sum) = (2 / π) tan^(– 1)(s * sum)
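Both activation functions can be written directly in code. A minimal sketch follows (NumPy is assumed; s is the steepness factor used in the formulas above):

```python
import numpy as np

def logistic(total, s=1.0):
    # Logistic (sigmoid) activation: rises from 0 to 1, with value 0.5 at total = 0.
    return 1.0 / (1.0 + np.exp(-s * total))

def arc_tangent(total, s=1.0):
    # Arc tangent activation scaled to the range (-1, +1).
    return (2.0 / np.pi) * np.arctan(s * total)
```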
Need of hidden layers :
1. A network with only two layers (input and output) can only represent the input with whatever representation already exists in the input data.
2. If the data are discontinuous or non-linearly separable, the innate representation is inconsistent, and the mapping cannot be learned using only two layers (input and output).
3. Therefore, hidden layer(s) are used between the input and output layers.
Weights connect units (neurons) in one layer only to those in the next higher layer. The output of a unit is scaled by the value of the connecting weight and fed forward to provide a portion of the activation for the units in the next higher layer.
Back propagation can be applied to an artificial neural network with any number of hidden layers. The training objective is to adjust the weights so that the application of a set of inputs produces the desired outputs.
Training procedure : The network is usually trained with a large number of input-output
pairs.
1. Generate weights randomly to small random values (both positive and negative) to
ensure that the network is not saturated by large values of weights.
2. Choose a training pair from the training set.
3. Apply the input vector to network input.
4. Calculate the network output.
5. Calculate the error, the difference between the network output and the desired output.
6. Adjust the weights of the network in a way that minimizes this error.
7. Repeat steps 2 - 6 for each input-output pair in the training set until the error for the entire system is acceptably low.
a. The weights of the output neuron layer are adjusted first, since the target value of each output neuron is available to guide the adjustment of the associated weights using the delta rule.
b. Next, the weights of the middle (hidden) layers are adjusted. As the middle-layer neurons have no target values, the error must be propagated back from the output layer, which makes the problem more complex (see the sketch below).
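A minimal sketch of this training procedure for a single-hidden-layer network is given below. NumPy is assumed, and the hidden-layer size, learning rate, epoch count and logistic activation are illustrative choices rather than values prescribed by the text:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_backprop(X, T, n_hidden=4, eta=0.5, epochs=1000):
    """X : inputs (n_samples, n_in); T : desired outputs (n_samples, n_out)."""
    rng = np.random.default_rng(0)
    # Step 1 : small random weights (positive and negative) so the net is not saturated.
    W1 = rng.uniform(-0.5, 0.5, (X.shape[1], n_hidden))
    W2 = rng.uniform(-0.5, 0.5, (n_hidden, T.shape[1]))
    for _ in range(epochs):
        for x, t in zip(X, T):                        # Steps 2-3 : apply a training pair.
            h = sigmoid(x @ W1)                       # Step 4 : forward pass.
            y = sigmoid(h @ W2)
            err = t - y                               # Step 5 : output error.
            delta_out = err * y * (1 - y)             # Delta rule at the output layer.
            delta_hid = (delta_out @ W2.T) * h * (1 - h)   # Error propagated to the hidden layer.
            W2 += eta * np.outer(h, delta_out)        # Step 6 : adjust the weights.
            W1 += eta * np.outer(x, delta_hid)
    return W1, W2
```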
Selection of number of hidden units : The number of hidden units depends on the number
of input units.
1. Never choose h (the number of hidden units) to be more than twice the number of input units.
2. You can load p patterns of I elements into log2 p hidden units.
3. Ensure that there are at least 1/e times as many training examples as there are weights in the network.
4. Feature extraction requires fewer hidden units than inputs.
5. Learning many examples of disjointed inputs requires more hidden units than inputs.
6. The number of hidden units required for a classification task increases with the number of classes in the task. Large networks require longer training times.
Factors influencing back propagation training
The training time can be reduced by using
1. Bias : Networks with biases can represent relationships between inputs and outputs more easily than networks without biases. Adding a bias to each neuron is usually desirable to offset the origin of the activation function. The bias weight is trainable just like the other weights, except that its input is always + 1.
2. Momentum : The use of momentum enhances the stability of the training process. Momentum is used to keep the training process going in the same general direction, analogous to the way that the momentum of a moving object behaves. In back propagation with momentum, the weight change is a combination of the current gradient step and the previous weight change, as sketched below.
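A brief sketch of the momentum update (the learning rate eta and momentum coefficient alpha below are illustrative values, not fixed by the text):

```python
def momentum_update(w, grad, prev_dw, eta=0.1, alpha=0.9):
    # New weight change = current gradient step plus a fraction (alpha) of the previous change.
    dw = -eta * grad + alpha * prev_dw
    return w + dw, dw   # updated weight and the change to reuse at the next step
```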
Advantages of backpropagation:
1. It is simple, fast and easy to program
2. It has no parameters to tune apart from the number of inputs.
Disadvantages of backpropagation :
4. The need for a matrix-based method for backpropagation instead of mini-batch processing.
2.2 Kohonen Self-Organizing Feature Maps
Training in the Kohonen network begins with the winner's neighbourhood of a fairly large size. Then, as training proceeds, the neighbourhood size gradually decreases.
Fig. 2.2.1 shows a simple Kohonen self organizing network with 2 inputs and 49 outputs.
Learning in the feature map is similar to that in competitive learning networks.
The network is trained with 1000 two-dimensional input vectors generated randomly in a square region in the interval between – 1 and + 1. The learning rate parameter α is equal to 0.1. A minimal code sketch of this training loop is given after the figures below.
1. Initial random weights
Fig. 2.2.2
Fig. 2.2.3
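A minimal sketch of the training loop just described: a 7 x 7 output grid (49 units), 1000 random two-dimensional inputs in [-1, +1] and learning rate 0.1. The Gaussian neighbourhood function and its decay schedule are illustrative assumptions, not taken from the text:

```python
import numpy as np

rng = np.random.default_rng(0)
grid = 7                                         # 7 x 7 = 49 output units
W = rng.uniform(-0.1, 0.1, (grid, grid, 2))      # small random initial weights (Fig. 2.2.2)
coords = np.dstack(np.meshgrid(np.arange(grid), np.arange(grid), indexing="ij"))

alpha, sigma = 0.1, 3.0                          # learning rate and neighbourhood radius
for step in range(1000):
    x = rng.uniform(-1, 1, 2)                    # random 2-D input in the square region
    d = np.linalg.norm(W - x, axis=2)
    win = np.unravel_index(np.argmin(d), d.shape)        # winning (closest) unit
    dist2 = np.sum((coords - np.array(win)) ** 2, axis=2)
    h = np.exp(-dist2 / (2 * sigma ** 2))        # neighbourhood centred on the winner
    W += alpha * h[..., None] * (x - W)          # move neighbourhood weights towards x
    sigma = max(0.5, sigma * 0.997)              # neighbourhood size gradually decreases
```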
2.3 Learning Vector Quantization
Learning Vector Quantization (LVQ) is an adaptive data classification method. It is based on training data with desired class information. LVQ uses unsupervised data clustering techniques to preprocess the data set and obtain cluster centers.
Fig. 2.3.1 shows the network representation of LVQ.
Here the input dimension is 2 and the input space is divided into six clusters. The first two clusters belong to class 1, while the other four clusters belong to class 2.
The LVQ learning algorithm involves two steps :
1. An unsupervised data clustering method is used to locate several cluster centers without using the class information.
2. The class information is used to fine-tune the cluster centers to minimize the number of misclassified cases.
The number of clusters can either be specified a priori or determined via a clustering technique capable of adaptively adding new clusters when necessary. Once the clusters are obtained, their classes must be labeled before moving to the second step. Such labeling is achieved by a voting method.
In the second step, if the input vector x and the winning cluster center w belong to the same class, we move w towards x; otherwise we move w away from the input vector x.
Step 1 : Initialize the cluster centers by a clustering method.
Step 2 : Label each cluster by the voting method.
Step 3 : Randomly select a training input vector x and find k such that || x – wk || is a minimum.
Step 4 : If x and wk belong to the same class, update wk by
wk = wk + η (x – wk)
Otherwise update wk by
wk = wk – η (x – wk)
where η is the learning rate. (A code sketch of this update follows the algorithm.)
Step 5 : If the maximum number of iterations is reached, stop. Otherwise return to
step 3.
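A minimal sketch of Steps 3 and 4 (NumPy assumed; eta is the learning rate denoted η above):

```python
import numpy as np

def lvq1_step(x, x_class, W, W_classes, eta=0.05):
    """One LVQ iteration. W : cluster centers (n_clusters, dim); W_classes : their labels."""
    k = np.argmin(np.linalg.norm(W - x, axis=1))   # Step 3 : nearest cluster center
    if W_classes[k] == x_class:                    # Step 4 : same class -> move towards x
        W[k] += eta * (x - W[k])
    else:                                          # different class -> move away from x
        W[k] -= eta * (x - W[k])
    return W
```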
2.4 Hamming Neural Network
Lippmann modelled a two-layer bipolar network called the Hamming neural network. The first layer is the Hamming net and the second layer is the MAXNET.
The first layer is a feed forward type network which classifies the input patterns based on
minimum Hamming distance. The Hamming distance (HD) between any two vectors is the
number of components in which the vectors differ.
The Hamming net uses MAXNET in the second layer as a subnet to find the unit with the
largest net input.
The second layer operates as a recurrent recall network which suppresses all the outputs except the initially obtained maximum output of the first layer.
Fig. 2.4.1 shows the structure of a Hamming neural network.
1. Input layer : a layer built with neurons, each of which is connected to all of the network inputs;
2. Output layer, called the MaxNet layer : the output of each neuron of this layer is connected to the input of every neuron of this layer; in addition, every neuron of this layer is connected to exactly one neuron of the input layer.
Input layer neurons are programmed to identify a fixed number of patterns; the number of neurons in this layer matches the number of those patterns (M neurons for M patterns).
The output layer is responsible for choosing the pattern that is most similar to the test signal. In this layer, the neuron with the strongest response suppresses the responses of the other neurons.
Let I = (1, – 1, 1, 1, 1, 1) and S = (1, 1, – 1, – 1, 1, 1) be two fixed-length bipolar vectors. The Hamming distance HD(I, S) is equal to 3.
The scalar product of I and S is
I^t S = [n – HD(I, S)] – HD(I, S)
If n is the number of components in the vectors, then [n – HD(I, S)] is the number of components in which the vectors agree. Therefore
I^t S = n – 2 HD(I, S)
Let I be the input vector and S be the vector that represents the pattern placed on a cluster. For a two-layer classifier of bipolar vectors, the strongest response of a neuron indicates that the minimum HD exists between the two vectors I and S.
For setting up the weights and bias, the above equation is rewritten as
n – HD(I, S) = (I^t S) / 2 + n / 2
If the weights are fixed to one half of the standard vector (S/2) and the bias to n/2, then the network will be able to find the input vector I closest to the standard vector S. This is done by finding the output unit with the largest net input.
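A minimal sketch of this computation follows; the exemplar vectors are assumed to be stored as rows of S, and a simple argmax stands in for the iterative MAXNET competition:

```python
import numpy as np

def hamming_net(I, S):
    """I : bipolar input vector; S : matrix of stored exemplar vectors (one per row)."""
    n = S.shape[1]
    net = I @ (S.T / 2.0) + n / 2.0        # weights S/2, bias n/2  ->  n - HD(I, S)
    hd = n - net                           # recovered Hamming distances
    winner = int(np.argmax(net))           # MAXNET would suppress all units but this one
    return winner, hd

# Example from the text: HD(I, S) = 3
I = np.array([1, -1, 1, 1, 1, 1])
S = np.array([[1, 1, -1, -1, 1, 1]])
print(hamming_net(I, S))                   # -> (0, array([3.]))
```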
2.5 Associative Memory
The aim of an associative memory is to produce the associated output pattern whenever one of the input patterns is applied to the neural network. The input pattern may be applied to the network either as an input or as an initial state, and the output pattern is observed at the outputs of some neurons constituting the network.
Associative memories belong to a class of neural networks that learn according to a certain recording algorithm. They require a priori information, and their connectivity matrices most often need to be formed in advance. Writing into memory produces changes in the neural interconnections. Reading the stored information from memory, called recall, is a transformation of the input signals by the network.
All memory information is spatially distributed throughout the network. Associative memory enables a parallel search within the stored data. The purpose of the search is to output one or all stored items that match the search argument and retrieve them entirely or partially.
Fig. 2.5.1 shows a block diagram of an associative memory.
Fig. 2.5.1 : Block diagram of an associative memory
In the initialization phase of the associative memory no information is stored; because the information is represented in the weights, they are all initially set to zero.
The advantage of neural associative memories over other pattern storage algorithms, such as lookup tables or hash codes, is that memory access can be fault tolerant with respect to variations of the input pattern.
In associative memories many associations can be stored at the same time. There are
different schemes of superposition of the memory traces formed by the different
associations. The superposition can be simple linear addition of the synaptic changes
required for each association (like in the Hopfield model) or nonlinear.
which requires time.
There are two classes of associative memory : auto-associative and hetero-associative. Whether auto- or hetero-associative, the net can associate not only the exact pattern pairs used in training, but is also able to produce associations if the input is similar to one on which it has been trained.
2.5.1 Auto-associative Memory
Auto-associative networks are a special subset of the hetero-associative networks, in which each vector is associated with itself, i.e. y^i = x^i for i = 1, ..., m. The function of such networks is to correct noisy input vectors.
Fig. 2.5.2 shows an auto-associative memory.
Auto-associative memories are content-based memories which can recall a stored sequence when they are presented with a fragment or a noisy version of it. They are very effective in de-noising the input or removing interference from it, which makes them a promising first step in solving the cocktail party problem.
The simplest version of an auto-associative memory is the linear associator, which is a two-layer feed-forward fully connected neural network where the output is constructed in a single feed-forward computation.
If || x̃ – x^i ||2 < ε, the network should still map the noisy input x̃ to the stored output y^i. This should be achieved by the learning algorithm, but becomes very hard when the number m of vectors to be learned is too high.
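As a sketch, a simple auto-associator can be built with Hebbian (outer-product) learning over bipolar patterns; recall passes a possibly noisy input once through the weight matrix and thresholds the result. The zero diagonal and sign threshold are the usual conventions, assumed here:

```python
import numpy as np

def store(patterns):
    """Hebbian storage : sum of outer products of each bipolar pattern with itself."""
    W = sum(np.outer(p, p) for p in patterns).astype(float)
    np.fill_diagonal(W, 0)                 # no self-connections
    return W

def recall(W, x):
    """Single feed-forward pass followed by a sign threshold."""
    return np.where(W @ x >= 0, 1, -1)
```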
Fig. 2.5.3 shows a block diagram of a hetero-associative network.
Fig. 2.5.4 : Hetero-associative network without feedback
In a Hopfield network, each unit is connected to every other unit in the network other than itself.
The output of each neuron is a binary number in {– 1, 1}. The output vector is the state vector. Starting from an initial state (given as the input vector), the state of the network changes from one state to another like an automaton. If the state converges, the point to which it converges is called an attractor.
In its simplest form, the output function is the sign function, which yields 1 for arguments ≥ 0 and – 1 otherwise.
The connection weight matrix W of this type of network is square and symmetric. The units
in the Hopfield model act as both input and output units.
A Hopfield network consists of "n" totally coupled units. Each unit is connected to all other units except itself. The network is symmetric because the weight wij for the connection between unit i and unit j is equal to the weight wji of the connection from unit j to unit i. The absence of a connection from each unit to itself avoids a permanent feedback of its own state value.
Hopfield networks are typically used for classification problems with binary pattern
vectors.
Hopfield model is classified into two categories :
1. Discrete Hopfield Model
2. Continuous Hopfield Model
In both the discrete and continuous Hopfield networks, the weights are trained in a one-shot fashion, not incrementally as was done in the case of the Perceptron and MLP.
In the discrete Hopfield model, the units use a slightly modified bipolar output function in which the state (output) of a unit remains the same if its net input is equal to the threshold value.
The continuous Hopfield model is just a generalization of the discrete case. Here, the units use a continuous output function such as the sigmoid or hyperbolic tangent function. In the continuous Hopfield model, each unit has an associated capacitor Ci and resistance ri that model the capacitance and resistance of a real neuron's cell membrane, respectively.
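A minimal sketch of discrete Hopfield recall is given below. The weight matrix W is assumed to have been formed in one shot (for example by the Hebbian rule) with zero diagonal; the units are updated asynchronously with the sign output function until the state stops changing:

```python
import numpy as np

def hopfield_recall(W, state, max_sweeps=20):
    """W : symmetric weights, zero diagonal; state : initial bipolar state vector."""
    s = state.copy()
    for _ in range(max_sweeps):
        changed = False
        for i in np.random.permutation(len(s)):     # asynchronous unit updates
            new = 1 if W[i] @ s >= 0 else -1        # sign output function
            if new != s[i]:
                s[i], changed = new, True
        if not changed:                             # state converged to an attractor
            break
    return s
```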
Fig. 2.5.7 : BAM network
BAM can be classified into two categories :
1. Discrete BAM : The network propagates an input pattern X to the Y layer, where the units in the Y layer compute their net input.
2. Continuous BAM : The units use the sigmoid or hyperbolic tangent output function.
The units in the X layer have an extra external input Ii, while the units in the Y layer
have an extra external input Jj for i = 1, 2, ..., m and j = 1, 2, ..., n.
These extra external inputs lead to a modification in the computation of the net input to the units.
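A minimal sketch of a discrete BAM follows, assuming the weights are formed by the Hebb (outer-product) rule over bipolar pattern pairs and ignoring the extra external inputs; recall alternates X-to-Y and Y-to-X passes until the pair stabilizes:

```python
import numpy as np

def bam_weights(X_pats, Y_pats):
    """Hebb rule : W = sum over stored pairs of outer(x, y)."""
    return sum(np.outer(x, y) for x, y in zip(X_pats, Y_pats)).astype(float)

def bam_recall(W, x, steps=10):
    y = np.where(x @ W >= 0, 1, -1)           # propagate the input pattern X to the Y layer
    for _ in range(steps):
        x_new = np.where(W @ y >= 0, 1, -1)   # propagate back to the X layer
        y_new = np.where(x_new @ W >= 0, 1, -1)
        if np.array_equal(x_new, x) and np.array_equal(y_new, y):
            break                             # the pair has stabilized
        x, y = x_new, y_new
    return x, y
```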
2.6 Adaptive Resonance Theory Neural Networks
Adaptive Resonance Theory (ART) addresses the stability-plasticity dilemma : how a network can preserve previously learned knowledge while incorporating new information.
Adaptive resonance architectures are artificial neural networks that are capable of stable
categorization of an arbitrary sequence of unlabeled input patterns in real time. These
architectures are capable of continuous training with non-stationary inputs.
Some models of Adaptive Resonance Theory are :
1. ART1 - Discrete input.
2. ART2 - Continuous input.
3. ARTMAP - Using two input vectors, transforms the unsupervised ART model into a
supervised one.
Various others : Fuzzy ART, Fuzzy ARTMAP (FARTMAP), etc…
The primary intuition behind the ART model is that object identification and recognition
generally occur as a result of the interaction of 'top-down' observer expectations with
'bottom-up' sensory information.
The basic ART system is an unsupervised learning model. It typically consists of a
comparison field and a recognition field composed of neurons, a vigilance parameter, and a
reset module. However, ART networks are able to grow additional neurons if a new input
cannot be categorized appropriately with the existing neurons.
ART networks tackle the stability-plasticity dilemma :
1. Plasticity : They can always adapt to unknown inputs if the given input cannot be
classified by existing clusters.
2. Stability : Existing clusters are not deleted by the introduction of new inputs.
3. Problem : Clusters are of fixed size, depending on the vigilance parameter ρ.
Fig. 2.6.1 shows ART-1 Network.
ART-1 networks receive binary input vectors. Bottom-up weights are used to determine output-layer candidates that may best match the current input.
Top-down weights represent the “prototype” for the cluster defined by each output neuron.
A close match between input and prototype is necessary for categorizing the input.
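A minimal sketch of the ART-1 category choice and vigilance test for a binary input is given below; the choice parameter beta, the fast-learning update and the list representation of prototypes are illustrative assumptions:

```python
import numpy as np

def art1_step(x, prototypes, rho=0.7, beta=1e-6):
    """x : binary input vector; prototypes : list of binary top-down weight vectors."""
    # Bottom-up choice: rank existing categories by match strength.
    order = np.argsort([-np.sum(np.minimum(x, p)) / (beta + p.sum()) for p in prototypes])
    for j in order:
        match = np.sum(np.minimum(x, prototypes[j])) / x.sum()   # vigilance test
        if match >= rho:
            prototypes[j] = np.minimum(x, prototypes[j])         # resonance: update prototype
            return j
    prototypes.append(x.copy())        # no existing category passes: grow a new neuron
    return len(prototypes) - 1
```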
The basic ART model, ART1, is comprised of the following components :
1. The short term memory layer : F1 - Short term memory.
2. The recognition layer : F2 - Contains the long term memory of the system.
3. Vigilance parameter : ρ - A parameter that controls the generality of the memory. Larger ρ means more detailed memories, smaller ρ produces more general memories.
Types of ART :
ART 1 : It is the simplest variety of ART networks, accepting only binary inputs.
ART 2 : Extends network capabilities to support continuous inputs.
ART 3 : Builds on ART-2 by simulating rudimentary neurotransmitter regulation of synaptic activity, incorporating simulated sodium (Na+) and calcium (Ca2+) ion concentrations into the system's equations, which results in a more physiologically realistic means of partially inhibiting categories that trigger mismatch resets.
Fuzzy ART : Implements fuzzy logic into ART's pattern recognition, thus enhancing generalizability.
ARTMAP : Also known as Predictive ART; combines two slightly modified ART-1 or ART-2 units into a supervised learning structure, where the first unit takes the input data and the second unit takes the correct output data, which is then used to make the minimum possible adjustment of the vigilance parameter in the first unit in order to make the correct classification.
Fuzzy ARTMAP : Fuzzy ARTMAP is merely ARTMAP using fuzzy ART units, resulting in a corresponding increase in efficiency.
2.7 Support Vector Machines
An SVM model is a representation of the examples as points in space, mapped so that the examples of the separate classes are divided by a gap that is as wide as possible.
New examples are then mapped into the same space and classified according to which side of the gap they fall on.
Two class problems
Many decision boundaries can separate these two classes. Which one should we choose ?
Perceptron learning rule can be used to find any decision boundary between class 1 and
class 2.
The line that maximizes the minimum margin is a good bet. The model class of “hyper-
planes with a margin of m” has a low VC dimension if m is big.
This maximum-margin separator is determined by a subset of the data points. Data points in
this subset are called “support vectors”. It will be useful computationally if only a small
fraction of the data points are support vectors, because we use the support vectors to decide
which side of the separator a test case is on.
Fig. 2.7.1
Fig. 2.7.3 : Empirical risk
The empirical risk is the average loss of an estimator for a finite set of data drawn from P.
The idea of risk minimization is not only to measure the performance of an estimator by its risk, but to actually search for the estimator that minimizes risk over the distribution P. Because we do not know the distribution P, we instead minimize the empirical risk over a training dataset drawn from P. This general learning technique is called empirical risk minimization.
Fig. 2.7.3 shows empirical risk.
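In symbols (a standard formulation, stated here for reference) : for a loss function L and a training set {(x_i, y_i)}, i = 1, ..., n, drawn from P, the empirical risk of an estimator f is R_emp(f) = (1/n) Σ_{i=1}^{n} L(f(x_i), y_i), whereas the true risk is the expectation of the loss over the whole distribution P.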
Good decision boundary : margin should be large
The decision boundary should be as far away from the data of both classes as possible. If
data points lie very close to the boundary, the classifier may be consistent but is more
“likely” to make errors on new instances from the distribution. Hence, we prefer classifiers
that maximize the minimal distance of data points to the separator.
1. Margin (m) : The gap between the data points and the classifier boundary. The margin is the minimum distance of any sample to the decision boundary. If the hyperplane is in canonical form, the margin can be measured by the length of the weight vector. The margin is given by the projection of the distance between two support vectors from opposite classes onto the direction perpendicular to the hyperplane; the margin of the separator is the distance between the support vectors :
Margin (m) = 2 / || w ||
(This relation is illustrated in the code sketch after this list.)
2. Maximal margin classifier : A classifier in the family F that maximizes the margin. Maximizing the margin is good according to intuition and PAC theory. This implies that only the support vectors matter; other training examples are ignorable.
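As a quick illustration of the margin formula, a linear SVM fitted with scikit-learn (an assumed dependency, with toy data chosen for this example) exposes the weight vector w, from which the margin 2 / || w || can be computed:

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[1, 1], [2, 2], [2, 0], [0, 0], [1, 0], [0, 1]])
y = np.array([1, 1, 1, -1, -1, -1])            # two linearly separable classes

clf = SVC(kernel="linear", C=1e6).fit(X, y)    # large C approximates a hard margin
w = clf.coef_[0]
print("support vectors:", clf.support_vectors_)
print("margin 2/||w|| =", 2.0 / np.linalg.norm(w))
```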
Fig. 2.7.4 : Good decision boundary
Example 2.7.1 : For the following Fig. 2.7.5 find a linear hyperplane (decision boundary) that
will separate the data.
Fig. 2.7.5
Solution :
1. Define what an optimal hyperplane is : maximize margin
2. Extend the above definition for non-linearly separable problems : have a penalty term
for misclassifications
3. Map data to high dimensional space where it is easier to classify with linear decision
surfaces : reformulate problem so that data is mapped implicitly to this space
Fig. 2.7.6
2.7.1 Key Properties of Support Vector Machines
1. Use a single hyperplane which subdivides the space into two half-spaces, one which is
occupied by Class 1 and the other by Class 2
2. They maximize the margin of the decision boundary using quadratic optimization
techniques which find the optimal hyperplane.
3. Ability to handle large feature spaces.
4. Overfitting can be controlled by soft margin approach
5. When used in practice, SVM approaches frequently map the examples to a higher
dimensional space and find margin maximal hyperplanes in the mapped space, obtaining
decision boundaries which are not hyperplanes in the original space.
6. The most popular versions of SVMs use non-linear kernel functions and map the attribute
space into a higher dimensional space to facilitate finding “good” linear decision
boundaries in the modified space.
Limitations of SVM :
3. Another limitation is speed and size.
4. The optimal design for multiclass SVM classifiers is also a research area.
2.7.4 Soft Margin SVM
For the very high dimensional problems common in text classification, sometimes the data are linearly separable. But in the general case they are not, and even if they are, we might prefer a solution that better separates the bulk of the data while ignoring a few weird noise documents.
What if the training set is not linearly separable ? Slack variables can be added to allow misclassification of difficult or noisy examples; the resulting margin is called a soft margin.
A soft margin allows a few variables to cross into the margin or over the hyperplane, allowing misclassification. We penalize the crossover by looking at the number and distance of the misclassifications. This is a trade-off between the hyperplane violations and the margin size. The slack variables are bounded by some set cost. The farther they are from the soft margin, the less influence they have on the prediction.
All observations have an associated slack variable :
1. Slack variable = 0 : the point lies on or outside the margin (correctly classified).
2. Slack variable > 0 : the point lies inside the margin or on the wrong side of the hyperplane.
3. C is the trade-off between the slack-variable penalty and the margin size (see the sketch below).
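A brief sketch of the C trade-off using scikit-learn (an assumed dependency, on synthetic data): a small C gives a wider, softer margin that tolerates violations, while a large C penalizes slack heavily:

```python
from sklearn.svm import SVC
from sklearn.datasets import make_blobs

X, y = make_blobs(n_samples=100, centers=2, cluster_std=2.5, random_state=0)

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    print(f"C={C:>6}: support vectors = {len(clf.support_)}, "
          f"training accuracy = {clf.score(X, y):.2f}")
```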
2.7.5 Comparison of SVM and Neural Networks
Support Vector Machine vs. Neural Network :
1. SVM : the kernel maps to a very high-dimensional space. NN : hidden layers map to lower-dimensional spaces.
2. SVM : the search space has a unique minimum. NN : the search space has multiple local minima.
3. SVM : classification is extremely efficient. NN : classification is extremely efficient.
2.8 Spike Neuron Models
Spiking neural networks make it possible to model various aspects of biological neural circuits, including neural information processing, plasticity and learning.
At the same time, spiking networks offer solutions to a broad range of specific problems in applied engineering, such as fast signal processing, event detection, classification, speech recognition, spatial navigation or motor control.
Biological neurons communicate by generating and propagating electrical pulses called action potentials or spikes. This feature of real neurons became a central paradigm of the theory of spiking neural models.
All spiking models share the following common properties with their biological counterparts :
1) They process information coming from many inputs and produce single spiking output signals;
2) Their probability of firing (generating a spike) is increased by excitatory inputs and decreased by inhibitory inputs;
3) Their dynamics is characterized by at least one state variable; when the internal variables of the model reach a certain state, the model is supposed to generate one or more spikes.
The most common spiking neuron models are the Integrate-and-Fire (IF) and Leaky-Integrate-and-Fire (LIF) units. Both models treat biological neurons as point dynamical systems.
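A minimal sketch of a leaky integrate-and-fire (LIF) unit treated as a point dynamical system is shown below; the time step, membrane time constant, threshold and reset value are illustrative choices:

```python
import numpy as np

def lif_simulate(I_input, dt=1.0, tau=20.0, v_rest=0.0, v_thresh=1.0, v_reset=0.0):
    """Leaky integrate-and-fire : the state variable v integrates the input and leaks
    back towards rest; a spike is emitted and v is reset when v crosses the threshold."""
    v, spikes = v_rest, []
    for t, I in enumerate(I_input):
        v += dt * (-(v - v_rest) + I) / tau     # leaky integration of the input current
        if v >= v_thresh:                       # threshold crossed : fire a spike
            spikes.append(t * dt)
            v = v_reset
    return spikes

# A constant supra-threshold input produces a regular spike train.
print(lif_simulate(np.full(200, 1.5)))
```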
Spiking network topologies can be classified into three general categories :
1. Feedforward networks : This is where the data flow from input to output units is strictly one-directional; the data processing can extend over multiple layers of neurons, but no feedback connections are present.
2. Recurrent networks : Here individual neurons or populations of neurons interact through reciprocal (feedback) connections. Feedback connections result in an internal state of the network which allows it to exhibit dynamic temporal behavior.
3. Hybrid networks : This group encompasses networks in which some subpopulations may be strictly feedforward, while others have recurrent topologies. Interactions between the subpopulations may be one-directional or reciprocal.
Ans. : If the input is not an exact match of an association input, then the output is altered
based on the distance from the input.
Q. 4 What is recall ?
Ans. : If the input vectors are uncorrelated, the Hebb rule will produce the correct weights,
and the response of the net when tested with one of the training vectors will be perfect recall.
Q. 5 What do you mean by cross talk ?
Ans. : If the input vectors are not orthogonal, the response will include a portion of each of
their target values.
This is commonly called cross talk.
Q. 6 Explain spurious stable state.
Ans. : If the input vector is an “unknown” vector, the activation vectors produced as the net
iterates will converge to an activation vector that is not one of the stored patterns; such a
pattern is called a spurious stable state.
Q. 7 What is back propagation algorithm ?
Ans. : The back-prop algorithm is an iterative gradient algorithm designed to minimize the
mean square error between the actual output of a multilayer feed-forward perceptron and the
desired output. It requires continuous differentiable non-linearities.
Q. 8 What is winner takes all ?
Ans. : In competitive learning, neurons compete among themselves to be activated. While in
Hebbian learning, several output neurons can be activated simultaneously, in competitive
learning, only a single output neuron is active at any time. The output neuron that wins the
“competition” is called the winner-takes-all neuron.
Q. 9 Explain Learning Vector Quantization.
Ans. : LVQ is an adaptive data classification method. It is based on training data with desired class information. LVQ uses unsupervised data clustering techniques to preprocess the data set and obtain cluster centers.
Q. 13 What is McCulloch Pitts model ?
Ans. : McCulloch and Pitts describe a neuron as a logical threshold element with two possible states. Such a threshold element has "N" input channels and one output channel. An input channel is either active (input 1) or silent (input 0). A directed weighted graph is used for connecting neurons.
Q. 14 What are the problems with McCulloch-Pitts neurons ?
Ans. : The problems with McCulloch-Pitts neurons are :
1. Weights and thresholds are analytically determined; they cannot be learned.
2. It is very difficult to minimize the size of a network.
Q. 15 Describe the term Perceptron.
Ans. : An arrangement of one input layer of McCulloch-Pitts neurons feeding forward to one
output layer of McCulloch-Pitts neurons is known as a Perceptron.
2.10 Multiple Choice Questions with Answers
Q. 1 Backpropagation is a supervised learning algorithm, for training ________ perceptrons.
(a) single layer (b) multilayer (c) any form of (d) none
Ans. : (b) multilayer
Q. 2 Learning Vector Quantization network is a ________ neural network where the input
vectors are trained for a specific class or group already mapped in the training set.
(a) supervised (b) unsupervised (c) semisupervised (d) none
Ans. : (a) supervised
Q. 3 Hopfield neural networks is an example of neural networks.
(a) supervised (b) hybrid (c) associative memory (d) none
Ans. : (c) associative memory
Q. 4 What is backpropagation ?
(a) It is another name given to the curvy function in the perceptron.
(b) It is the transmission of error back through the network to adjust the inputs.
(c) It is the transmission of error back through the network to allow weights to be
adjusted so that the network can learn.
(d) None of the above.
Ans. :(c) It is the transmission of error back through the network to allow weights to be
adjusted so that the network can learn.
Q. 5 The BAM (Bidirectional Associative Memory) significantly belongs to _______.
Ans. : (d) hetero associative
Q. 6 The weights of the BAM are initialized based on the ________ rule.
(a) Hebb (b) ART (c) ANN (d) all of these
Ans. : (a) Hebb
Ans. : (c) Bidirectional Associative Memory
Q. 13 The Hopfield network consists of a set of neurons forming a multiple loop ________
system.
(a) unidirectional (b) parallel (c) feedback (d) feedforward
Ans. : (c) feedback
Q. 14 A ________ Hopfield net can be used to determine whether an input vector is a known
vector or an unknown vector.
(a) binary (b) discrete (c) autoassociative (d) All
Ans. : (a) binary
Q. 19 What was the main point of difference between the Adaline & perceptron model?
(a) Weights are compared with output.
(b) Sensory units result is compared with output.
(c) Analog activation value is compared with output.
(d) All of the mentioned.
Ans. : (c) Analog activation value is compared with output.
Q. 20 Heteroassociative memory can be an example of which type of network?
(a) Group of instars. (b) Group of outstar.
(c) Either group of instars or outstars. (d) Both group of instars or outstars.
Ans. : (c) Either group of instars or outstars
Q. 21 Hidden units are composed of ________, computing an affine transformation and
element-wise nonlinear function.
(a) output vector (b) input vector (c) activation function (d) all of these
Ans. : (b) input vector