Unit-II
Artificial Neural Networks
Introduction
Why Artificial Neural Networks?
There are two basic reasons why we are interested in
building artificial neural networks (ANNs):
• Technical viewpoint: Some problems such as
character recognition or the prediction of future
states of a system require massively parallel and
adaptive processing.
• Biological viewpoint: ANNs can be used to
replicate and simulate components of the human
(or animal) brain, thereby giving us insight into
natural information processing.
• Artificial Neural Networks (ANNs) are algorithms inspired by the functioning of the brain and are
used to model complex patterns and solve prediction problems.
• The development of ANN was the result of an attempt to replicate the workings of
the human brain. The workings of ANN are extremely similar to those of
biological neural networks, although they are not identical.
• An artificial neural network consists of a pool of simple processing units which
communicate by sending signals to each other over a large number of weighted
connections.
• Artificial Neural Networks are a robust method for approximating real-valued,
discrete-valued, and vector-valued target functions.
• They are most effective when used against real-world sensor data.
• They stem from a biological metaphor.
What is Artificial Neural Network?
• An Artificial Neural Network (ANN) is a computational model inspired by the
human brain’s neural structure. It consists of interconnected nodes (neurons)
organized into layers. Information flows through these nodes, and the network
adjusts the connection strengths (weights) during training to learn from data,
enabling it to recognize patterns, make predictions, and solve various tasks in
machine learning and artificial intelligence.
Artificial Neural Networks
• The “building blocks” of neural networks are the
neurons.
• In technical systems, we also refer to them as units or nodes.
• Basically, each neuron
– receives input from many other neurons,
– changes its internal state (activation) based on the current input, and
– sends one output signal to many other neurons, possibly including its input neurons (recurrent network).
How do ANNs work?
An artificial neural network (ANN) is either a hardware
implementation or a computer program which strives to
simulate the information processing capabilities of its biological
exemplar. ANNs are typically composed of a great number of
interconnected artificial neurons. The artificial neurons are
simplified models of their biological counterparts.
ANN is a technique for solving problems by constructing software
that works like our brains.
How do our brains work?
The brain is a massively parallel information processing system.
• It has about 10¹¹ neurons, each connected to about 10⁴ other neurons.
• Each neuron has inputs which are called dendrites and one output
which is called the axon.
• The switching time is about 0.001 second. It takes around 0.1
second for the brain to recognize an image, which implies about
100 inference steps.
• So, the brain must do some parallel processing on highly
distributed data.
How do our brains work?
• A neuron is connected to other
neurons through about 10,000
synapses
• A neuron receives input from other
neurons. Inputs are combined.
• Once input exceeds a critical level,
the neuron discharges a spike ‐ an
electrical pulse that travels from the
body, down the axon, to the next
neuron(s)
• The axon endings almost touch the
dendrites or cell body of the next
neuron.
• Transmission of an electrical signal from one neuron to the next is
effected by neurotransmitters
• Neurotransmitters are chemicals which are released from the first
neuron and which bind to the second neuron.
• This link is called a synapse. The strength of the signal that reaches the
next neuron depends on factors such as the amount of neurotransmitter
available
• Dendrites: Input
• Cell body: Processor
• Synapse: Link
• Axon: Output
How do ANNs work?
An artificial neuron is an imitation of a human neuron
How do ANNs work?
• Now, let us have a look at the model of an artificial neuron.
How do ANNs work?
[Figure: model of a simple artificial neuron. Inputs x1, x2, …, xm feed a summing (processing) unit ∑, which produces the output y.]
y = x1 + x2 + … + xm
How do ANNs work?
Not all inputs are equal
[Figure: the same neuron with weights. Each input xi is first multiplied by its weight wi before the summation.]
y = x1w1 + x2w2 + … + xmwm
How do ANNs work?
The signal is not passed down to the next neuron verbatim.
[Figure: the weighted sum is passed through a transfer function f(vk), also called the activation function, before it becomes the output.]
vk = x1w1 + x2w2 + … + xmwm
y = f(vk)
The output is thus a function of the inputs, the weights, and the transfer (activation) function.
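This computation can be written directly in code. Below is a minimal Python sketch of a single artificial neuron; the step transfer function, the function names, and the example values are illustrative assumptions, not part of the original notes.

# A minimal artificial neuron: weighted sum of inputs passed through a transfer function.
def step(v, threshold=0.0):
    """Step transfer (activation) function: fire (1) if v exceeds the threshold, else 0."""
    return 1 if v > threshold else 0

def neuron_output(inputs, weights, transfer=step):
    """Compute y = f(x1*w1 + x2*w2 + ... + xm*wm)."""
    v = sum(x * w for x, w in zip(inputs, weights))  # weighted sum (processing step)
    return transfer(v)                               # transfer/activation step

# Example: three inputs with different weights
print(neuron_output([1, 0, 1], [0.4, 0.9, 0.2]))  # weighted sum = 0.6 -> output 1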
APPROPRIATE PROBLEMS FOR NEURAL NETWORK LEARNING
• Instances are represented by many attribute-value pairs. These input attributes may be highly
correlated or independent of one another. Input values can be any real values.
• The target function output may be discrete-valued, real-valued, or a vector of several real- or
discrete-valued attributes.
• The training examples may contain errors. ANN learning methods are quite robust to noise in the
training data.
• Long training times are acceptable. Network training algorithms typically require longer training
times than, say, decision tree learning algorithms.
• Fast evaluation of the learned target function may be required. Although ANN learning times are
relatively long, evaluating the learned network, in order to apply it to a subsequent instance, is
typically very fast.
• The ability of humans to understand the learned target function is not important. The weights
learned by neural networks are often difficult for humans to interpret. Learned neural networks are
less easily communicated to humans than learned rules.
PERCEPTRON
• A Perceptron is an Artificial Neuron. It is the simplest possible Neural Network.
• It was introduced by Frank Rosenblatt in 1957.
• The brain cells (neurons) receive input from our senses as electrical signals. The neurons, in turn,
use electrical signals to store information and to make decisions based on previous input.
Rosenblatt's idea was that the Perceptron could simulate these brain principles, with the ability to
learn and make decisions.
• It is the simplest type of feedforward neural network, consisting of a single layer of input nodes
that are fully connected to a layer of output nodes. The original Perceptron was designed to take a
number of binary inputs, and produce one binary output (0 or 1).
• The idea was to use different weights to represent the importance of each input, and that the sum of
the values should be greater than a threshold value before making a decision like yes or no (true or
false) (0 or 1).
Definition
• A perceptron takes a vector of real-valued inputs, calculates a linear combination
of these, and outputs 1 if the result is greater than some threshold and -1 otherwise.
Perceptron decision function
• A decision function φ(z) of the Perceptron is defined to take a linear combination of the input vector x and the weight vector w.
• The value z in the decision function is given by:
z = w1x1 + w2x2 + … + wmxm = wᵀx
• The decision function is +1 if z is greater than a threshold θ, and it is -1 otherwise:
φ(z) = +1 if z ≥ θ, and -1 otherwise.
Bias Unit
• For simplicity, the threshold θ can be brought to the left-hand side and represented as w0x0, where w0 = -θ and x0 = 1.
• The value w0 is called the bias unit.
• The decision function then becomes:
φ(z) = +1 if z ≥ 0, and -1 otherwise, where z = w0x0 + w1x1 + … + wmxm = wᵀx.
Output
• The figure shows how the decision function squashes wᵀx to either +1 or -1, and how it can be used to discriminate between two linearly separable classes.
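As a concrete illustration, here is a small Python sketch of the decision function with a bias unit; the function name and the example weights are illustrative assumptions.

# Perceptron decision function with a bias unit w0 (x0 = 1 is implicit).
def decision(x, w, w0):
    """Return +1 if w0 + w.x >= 0, else -1."""
    z = w0 + sum(wi * xi for wi, xi in zip(w, x))  # z = w0*x0 + w1*x1 + ... + wm*xm
    return 1 if z >= 0 else -1

# Example: with w0 = -0.5 the neuron fires only when the weighted inputs reach 0.5
print(decision([1, 1], [0.3, 0.3], w0=-0.5))  # z = 0.1  -> +1
print(decision([1, 0], [0.3, 0.3], w0=-0.5))  # z = -0.2 -> -1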
Types of Perceptron
• Single-Layer Perceptron: This type of perceptron is limited to learning linearly separable patterns; it is
effective for tasks where the data can be divided into distinct categories by a straight line.
• Multilayer Perceptron: Multilayer perceptrons possess enhanced processing capabilities as they
consist of two or more layers, adept at handling more complex patterns and relationships within the
data.
Basic Components of Perceptron
• Input Features: The perceptron takes multiple input features, each input feature represents a characteristic or
attribute of the input data.
• Weights: Each input feature is associated with a weight, determining the significance of each input feature in
influencing the perceptron’s output. During training, these weights are adjusted to learn the optimal values.
• Summation Function: The perceptron calculates the weighted sum of its inputs using the summation function.
The summation function combines the inputs with their respective weights to produce a weighted sum.
• Activation Function: The weighted sum is then passed through an activation function. The perceptron uses the
Heaviside step function, which takes the summed value as input, compares it with the threshold, and produces an
output of 0 or 1.
• Output: The final output of the perceptron is determined by the activation function’s result. For example, in
binary classification problems, the output might represent a predicted class (0 or 1).
• Bias: A bias term is often included in the perceptron model. The bias allows the model to make adjustments
that are independent of the input. It is an additional parameter that is learned during training.
• Learning Algorithm (Weight Update Rule): During training, the perceptron learns by adjusting its weights and
bias based on a learning algorithm. A common approach is the perceptron learning algorithm, which updates
weights based on the difference between the predicted output and the true output.
Activation function
• The activation function applies a step rule (converting the numerical output into +1 or -1) to check whether the output of the weighting function is greater than zero.
Example
• For example:
If ∑wixi> 0 => then final output “o” = 1 (issue bank loan)
Else, final output “o” = -1 (deny bank loan)
• The step function gets triggered above a certain value of the
neuron output; otherwise it outputs zero. The sign function outputs
+1 or -1 depending on whether the neuron output is greater
than zero. The sigmoid is an S-shaped curve that outputs a
value between 0 and 1.
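For reference, the three activation functions mentioned above can be sketched in Python as follows; the threshold value and function names are illustrative assumptions.

import math

def step(v, threshold=0.0):
    """Step function: 1 above the threshold, 0 otherwise."""
    return 1 if v > threshold else 0

def sign(v):
    """Sign function: +1 if the neuron output is greater than zero, -1 otherwise."""
    return 1 if v > 0 else -1

def sigmoid(v):
    """Sigmoid (S-curve): squashes any real value into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))

print(step(0.7), sign(-0.2), round(sigmoid(0.0), 2))  # -> 1 -1 0.5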
How does Perceptron work?
1. All the inputs x are multiplied with their weights w.
2. Add all the multiplied values and call them Weighted Sum.
3. Apply the activation function to that weighted sum.
Perceptron training rule:
• 1. Evaluate the network according to the equation: Σ (i = 0 … n) wi·xi + b.
• 2. If the result of step 1 is greater than zero, output O = 1; if it is less than zero, O = 0.
• 3. If the current output O is already equal to the desired output t, repeat step 1 with a
different set of inputs. If the current output is different from the desired output, proceed
to step 4.
• 4. Adjust the current weights according to:
wi ← wi + Δwi, where Δwi = η (t - o) xi
• Here t is the target output for the current training example, o is the output generated by
the perceptron, and η is a positive constant called the learning rate. The role of the
learning rate is to moderate the degree to which weights are changed at each step. It is
usually set to some small value (e.g., 0.1) and is sometimes made to decay as the number
of weight-tuning iterations increases.
• 5. Repeat the algorithm from step 1 until O = t for every vector pair.
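A minimal Python sketch of this training rule follows; the dataset, learning rate, epoch limit, and function names are illustrative assumptions, while the update Δwi = η(t - o)xi matches the rule above.

def train_perceptron(examples, n_inputs, eta=0.1, epochs=100):
    """Perceptron training rule: w_i <- w_i + eta * (t - o) * x_i, and likewise for the bias."""
    w = [0.0] * n_inputs
    b = 0.0
    for _ in range(epochs):
        converged = True
        for x, t in examples:                       # t is the desired (target) output
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            o = 1 if z > 0 else 0                   # current output O
            if o != t:                              # adjust weights only on a mistake
                converged = False
                w = [wi + eta * (t - o) * xi for wi, xi in zip(w, x)]
                b = b + eta * (t - o)
        if converged:                               # O = t for every vector pair
            break
    return w, b

# Example: learning the logical AND function (linearly separable)
data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
print(train_perceptron(data, n_inputs=2))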
Imagine a perceptron (in your brain). The perceptron tries to decide if you should go
to a concert. Is the artist good? Is the weather good? What weights should these facts
have?
• Criteria            Input          Weight
• Artist is Good      x1 = 0 or 1    w1 = 0.7
• Weather is Good     x2 = 0 or 1    w2 = 0.6
• Friend will Come    x3 = 0 or 1    w3 = 0.5
• Food is Served      x4 = 0 or 1    w4 = 0.3
• Drinks are Served   x5 = 0 or 1    w5 = 0.4
• The Perceptron Algorithm:
• 1. Set a threshold value: Threshold = 1.5
• 2. Multiply all inputs by their weights:
• x1 * w1 = 1 * 0.7 = 0.7
• x2 * w2 = 0 * 0.6 = 0
• x3 * w3 = 1 * 0.5 = 0.5
• x4 * w4 = 0 * 0.3 = 0
• x5 * w5 = 1 * 0.4 = 0.4
• 3. Sum all the results:
• 0.7 + 0 + 0.5 + 0 + 0.4 = 1.6 (The Weighted Sum)
• 4. Activate the Output:
• Return true if the sum > 1.5 ("Yes I will go to the Concert")
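The same worked example in Python, using exactly the values listed above; only the function name is an illustrative assumption.

def will_go_to_concert(inputs, weights, threshold=1.5):
    """Return True if the weighted sum of the criteria exceeds the threshold."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return weighted_sum > threshold

inputs  = [1, 0, 1, 0, 1]                    # artist good, weather bad, friend comes, no food, drinks served
weights = [0.7, 0.6, 0.5, 0.3, 0.4]
print(will_go_to_concert(inputs, weights))   # weighted sum = 1.6 > 1.5 -> True ("Yes I will go to the Concert")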
Examples
1. AND
• If both inputs are TRUE (+1), the output of the Perceptron is positive, which amounts to
TRUE.
• x1= 1 (TRUE), x2= 1 (TRUE)
• w0 = -.8, w1 = 0.5, w2 = 0.5
• => o(x1, x2) => -.8 + 0.5*1 + 0.5*1 = 0.2 > 0
2. OR
• If either of the two inputs is TRUE (+1), the output of the Perceptron is positive, which
amounts to TRUE.
• x1 = 1 (TRUE), x2 = 0 (FALSE)
• w0 = -.3, w1 = 0.5, w2 = 0.5
• => o(x1, x2) => -.3 + 0.5*1 + 0.5*0 = 0.2 > 0
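The AND and OR examples can be checked with a short Python sketch using the weights given above; the function name is an illustrative assumption.

def perceptron(x1, x2, w0, w1, w2):
    """Output +1 (TRUE) if w0 + w1*x1 + w2*x2 > 0, else -1 (FALSE)."""
    return 1 if w0 + w1 * x1 + w2 * x2 > 0 else -1

# AND: fires only when both inputs are 1 (the weighted sum must exceed 0.8)
print(perceptron(1, 1, w0=-0.8, w1=0.5, w2=0.5))  # -0.8 + 0.5 + 0.5 = 0.2 > 0 -> +1
# OR: fires when at least one input is 1 (the weighted sum must exceed 0.3)
print(perceptron(1, 0, w0=-0.3, w1=0.5, w2=0.5))  # -0.3 + 0.5 + 0.0 = 0.2 > 0 -> +1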
Limitations of perceptron training rule
• It can be proven that this procedure will converge in finite
time if the training examples are linearly separable and the
learning rate is sufficiently small.
• If the data are not linearly separable then convergence is not
assured.
Perceptron at a Glance
• Perceptron has the following characteristics:
– Perceptron is an algorithm for supervised learning of a
single-layer binary linear classifier.
– Optimal weight coefficients are automatically learned.
– Weights are multiplied with the input features, and a decision
is made as to whether the neuron fires or not.
– Activation function applies a step rule to check if the
output of the weighting function is greater than zero.
– Linear decision boundary is drawn enabling the distinction
between the two linearly separable classes +1 and -1.
– If the sum of the input signals exceeds a certain threshold,
it outputs a signal; otherwise, there is no output.
Gradient descent and delta rule
Limitation of gradient descent
• If the learning rate is too large, gradient descent is likely to overshoot the minimum or may fail to
converge at all.
• If it is too small, convergence will take much longer.
• If the number of inputs is large, this becomes even more problematic. Finally,
gradient descent might never find the global minimum.
Stochastic gradient descent
• One common variation on gradient descent intended to alleviate these difficulties is called
incremental gradient descent, or alternatively stochastic gradient descent.
• Whereas the gradient descent training rule presented in Equation (4.7) computes weight updates
after summing over all the training examples in D, the idea behind stochastic gradient descent is to
approximate this gradient descent search by updating the weights incrementally, following the
calculation of the error for each individual training example.
Gradient descent vs. stochastic gradient descent:
• Gradient descent sums the error over all training examples before updating the weights; stochastic gradient descent updates the weights upon examining each individual training example.
• Gradient descent requires more computation per weight update; stochastic gradient descent comparatively less.
• Gradient descent uses a larger step size per weight update; stochastic gradient descent a comparatively smaller one.
• Gradient descent may get stuck in a local minimum; stochastic gradient descent can sometimes avoid falling into local minima.
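To make the difference concrete, here is a small Python sketch contrasting batch gradient descent with stochastic (incremental) gradient descent for a single linear unit trained with the delta rule; the data, learning rate, and function names are illustrative assumptions.

def batch_gradient_descent(examples, eta=0.05, epochs=100):
    """Delta rule, batch form: sum the error gradient over all examples, then update once."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        dw, db = 0.0, 0.0
        for x, t in examples:
            o = w * x + b                  # linear unit output
            dw += eta * (t - o) * x        # accumulate updates over the whole training set D
            db += eta * (t - o)
        w, b = w + dw, b + db              # one weight update per pass over D
    return w, b

def stochastic_gradient_descent(examples, eta=0.05, epochs=100):
    """Delta rule, incremental form: update the weights after each individual example."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, t in examples:
            o = w * x + b
            w += eta * (t - o) * x         # immediate update per example
            b += eta * (t - o)
    return w, b

data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]   # target function t = 2x + 1
print(batch_gradient_descent(data))
print(stochastic_gradient_descent(data))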
Limitations of Perceptron:
The perceptron model has some limitations that can make it unsuitable for certain
types of problems:
• Limited to linearly separable problems (illustrated with XOR in the sketch below).
• Convergence issues with non-separable data
• Requires labelled data
• Sensitivity to input scaling
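As an illustration of the first limitation, no single perceptron can represent XOR, because XOR is not linearly separable. The brute-force search below, whose grid resolution and helper names are illustrative assumptions, confirms that no weight setting classifies all four XOR cases correctly.

import itertools

# XOR truth table: not linearly separable, so no single perceptron can represent it.
xor_data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]

def classify(x, w, b):
    """Threshold unit: output 1 if w1*x1 + w2*x2 + b > 0, else 0."""
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0

# Try a grid of weights and biases: no combination gets all four XOR cases right.
grid = [i / 10 for i in range(-20, 21)]
solutions = [
    (w1, w2, b)
    for w1, w2, b in itertools.product(grid, grid, grid)
    if all(classify(x, (w1, w2), b) == t for x, t in xor_data)
]
print(len(solutions))  # -> 0: no linear decision boundary separates the XOR classes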