Biological Neuron:
The human brain consists of a very large number of neural cells (more than a billion) that process information. Each cell works like a simple processor. It is the massive interaction between all these cells and their parallel processing that makes the brain's abilities possible. The figure below represents a human biological nervous unit, with the various parts of the biological neural network (BNN) marked.
Dendrites are branching fibres that extend from the cell body.
The cell body of a neuron contains the nucleus and other structures that support chemical processing and the production of neurotransmitters.
The axon is a single fibre that carries information away from the cell body toward the dendrites of other neurons, muscles, or glands.
Axon hillock is the site of summation for incoming information. At any moment,
the collective influence of all neurons that conduct impulses to a given neuron
will determine whether or not an action potential will be initiated at the axon
hillock and propagated along the axon.
A synapse is the point of connection between two neurons, or between a neuron and a muscle or gland. Electrochemical communication between neurons takes place
at these junctions.
Terminal buttons of a neuron are the small knobs at the end of an axon that
release chemicals called neurotransmitters.
Computational Unit:
In Deep Learning, a computational unit is like a tiny brain cell that does a small
calculation. Many of these units are connected together to form a neural
network, which can learn patterns from data (like recognizing faces,
understanding language, etc.).
A computational unit (also called a neuron or node) in a neural network:
Receives input (numbers)
Applies weights to those inputs (how important each input is)
Adds a bias (a fixed value to shift the result)
Applies an activation function (to introduce non-linearity)
Outputs a value (which goes to the next layer of neurons)
Example:
Let's say a unit has 2 inputs:
Input x1=0.5, weight w1=0.8
Input x2=1.0, weight w2=0.3
Bias b=0.1
The computation is:
z=(x1⋅w1)+(x2⋅w2)+b=(0.5*0.8)+(1.0*0.3)+0.1=0.4+0.3+0.1=0.8
z = 0.8
Then we apply an activation function, like ReLU:
f(z)= max(0,z)= max(0,0.8)=0.8
The unit outputs: a= 0.8
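As a minimal sketch, the same computation can be written in Python (the numbers below are just the illustrative values from this example):

# Minimal sketch of a single computational unit
inputs  = [0.5, 1.0]   # x1, x2
weights = [0.8, 0.3]   # w1, w2
bias    = 0.1          # b

# Weighted sum: z = x1*w1 + x2*w2 + b
z = sum(x * w for x, w in zip(inputs, weights)) + bias

# ReLU activation: f(z) = max(0, z)
a = max(0.0, z)

print(z, a)   # both approximately 0.8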
Importance:
These units are the building blocks of neural networks. A single unit is very
simple, but when you connect millions of them, they can learn to:
Detect objects in images
Translate languages
Predict stock prices
Recognize speech
McCulloch–Pitts Unit – In Depth & Simple
The McCulloch–Pitts unit is the first model of an artificial neuron: a mathematical function inspired by biological neurons. It laid the foundation for neural networks and, later, deep learning.
It receives inputs, adds them up, checks if they are enough (above a threshold),
and if so, outputs 1; otherwise, outputs 0.
Formula:
For inputs x1, x2, ..., xn and a threshold θ:
Output y= {1 if ∑xi≥θ
0 otherwise }
Inputs: Only 0 or 1 (binary)
Weights: Fixed at 1 (no learning)
Activation: Step function (0 or 1)
Example:
A student is allowed into the exam hall only when both conditions are true:
1. has a hall ticket
2. has an ID card
Step-by-step
Inputs:
x1 = 1 if the hall ticket is present, 0 otherwise
x2 = 1 if the ID card is present, 0 otherwise
Weights:
w1 = 1, w2 = 1
Total input:
z = x1w1 + x2w2 = x1 + x2
Input A   Input B   Sum   A AND B
0         0         0     0
0         1         1     0
1         0         1     0
1         1         2     1
McCulloch–Pitts Neuron Setup
Inputs:x1,x2
Weights: w1=1,w2=1 (default in M–P model)
Threshold: θ=2
Activation Function: Step function
y= { 1 if (x1+x2) ≥2
0 otherwise}
Solving Each Row:
1 Inputs: 0, 0
x1+x2= 0+0= 0<2⇒ y=0
2 Inputs: 0, 1
x1+x2= 0+1 =1<2⇒ y=0
3 Inputs: 1, 0
x1+x2=1+0=1<2⇒y=0
4 Inputs: 1, 1
x1+x2=1+1=2≥2⇒y=1
Output y = { 1 if ∑xi ≥ θ (neuron fires)
0 otherwise (neuron does not fire) }
We successfully implemented the AND logic gate using a McCulloch–Pitts
neuron with:
Equal weights = 1
Threshold = 2
Binary inputs
Step activation function
Can simulate logic gates:
Gate   Inputs                        Threshold   Output Logic
AND    x1, x2                        2           1 if both inputs are 1
OR     x1, x2                        1           1 if any input is 1
NOT    single input (special unit)               inverted logic
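The gates above can be sketched directly in Python; this is a minimal illustration of the McCulloch–Pitts rule with fixed weights, where modeling NOT with a weight of −1 and threshold 0 is one common choice (an assumption here, since the text only calls it a special unit):

# McCulloch–Pitts units for basic logic gates: binary inputs, fixed weights, step activation
def mp_unit(inputs, weights, threshold):
    # Fire (return 1) when the weighted sum reaches the threshold
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total >= threshold else 0

def AND(x1, x2):
    return mp_unit([x1, x2], [1, 1], threshold=2)

def OR(x1, x2):
    return mp_unit([x1, x2], [1, 1], threshold=1)

def NOT(x):
    # Assumed modeling choice: inhibitory weight -1, threshold 0
    return mp_unit([x], [-1], threshold=0)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "AND:", AND(a, b), "OR:", OR(a, b))
print("NOT 0:", NOT(0), "NOT 1:", NOT(1))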
Thresholding Logic – In Depth
Meaning:
Thresholding logic is decision-making using a limit (threshold):
"Activate (output = 1) only if enough input signals are strong enough."
This introduces binary decision-making:
If signal is strong enough → output is ON (1)
If not → output is OFF (0)
Importance
Thresholding is a non-linear decision function (a simple step function), which allows basic logic rules (like AND/OR) to be combined to build complex computations.
Visual Diagram of M–P Neuron:
McCulloch–Pitts unit is a very simple artificial neuron using binary inputs
and threshold logic.
It fires (outputs 1) only when total input meets/exceeds a threshold.
It cannot learn, but it inspired the development of the perceptron, then
multi-layer perceptrons, and eventually deep neural networks.
Thresholding logic in M–P is replaced in deep learning by activation
functions and learnable weights.
Linear perceptron
A Linear Perceptron is one of the most basic types of artificial neural
networks. It was introduced by Frank Rosenblatt in 1958.
A Perceptron is a computational unit (or a "neuron") that:
1. Takes inputs (features of data),
2. Multiplies each input with a weight,
3. Adds a bias (optional)
4. Applies an activation function (in this case, a simple threshold function),
5. Produces an output (0 or 1).
Linear Perceptron Formula:
output = { 1 if w1x1 + w2x2 + ... + wnxn + b >= 0
0 otherwise }
Where:
x1, x2, ..., xn = inputs (features),
w1, w2, ..., wn = weights,
b = bias.
Goal of Linear Perceptron
To learn the right weights and bias so it can correctly classify inputs into
one of two classes (e.g., YES or NO, 1 or 0).
Learning in Perceptron
It adjusts weights based on the error between actual output and expected
output.
This is called the Perceptron Learning Algorithm.
When It Works Well
It works only when data is linearly separable — i.e., when a straight line
(in 2D) or hyperplane (in higher dimensions) can separate the two classes
Limitation
It cannot solve non-linear problems like XOR (exclusive OR).
Simple Example:
Suppose you want to predict if someone will pass (1) or fail (0) an exam
based on:
Study hours,
Sleep hours.
A linear perceptron might learn:
output = {1 if (2 *study hours) + (1 *sleep hours) - 5 >=0
0 otherwise}
Step 1: Inputs
Assume two students:
Student   Study Hours   Sleep Hours   Expected Output
A         4             3             1 (Pass)
B         1             1             0 (Fail)
So:
x1 = study hours
x2 = sleep hours
Step 2: Initialize Weights and Bias
Let’s assume:
Weight for study: w1=2
Weight for sleep: w2=1
Bias: b=-5
Step 3: Apply Perceptron Formula
The perceptron computes a value using this formula:
Sum= w1 *x1 + w2*x2 + b
Then, it applies the activation rule:
If Sum>=0=> Output = 1
If Sum<0 =>Output = 0
Example: Student A
Study = 4, Sleep = 3
Sum = (2 × 4) + (1 × 3) − 5 = 8 + 3 − 5 = 6
Since 6 ≥ 0 → Output = 1 (Pass)
✔ Correct prediction!
Example: Student B
Study = 1, Sleep = 1
Sum = (2 × 1) + (1 × 1) − 5 = 2 + 1 − 5 = −2
Since −2 < 0 → Output = 0 (Fail)
✔ Also correct prediction!
The perceptron is drawing a decision boundary (a line) between students who
pass and those who fail, based on how much they study and sleep.
If their total (after multiplying by weights and adding bias) is high enough, we
say they pass.
If not, we say they fail.
Higher weight for study hours (2) means studying is more important for passing.
Sleep has less impact (weight = 1).
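A minimal Python sketch of this pass/fail perceptron, using the same assumed weights (w1 = 2 for study, w2 = 1 for sleep) and bias (b = −5):

# Pass/fail linear perceptron from the example above (weights are assumed, not learned)
def perceptron_predict(study_hours, sleep_hours, w1=2, w2=1, b=-5):
    # Output 1 (pass) if the weighted sum is >= 0, else 0 (fail)
    total = w1 * study_hours + w2 * sleep_hours + b
    return 1 if total >= 0 else 0

print(perceptron_predict(4, 3))   # Student A -> 1 (pass): 8 + 3 - 5 = 6 >= 0
print(perceptron_predict(1, 1))   # Student B -> 0 (fail): 2 + 1 - 5 = -2 < 0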
Perceptron Learning Algorithm
The Perceptron Learning Algorithm is a method used to train a single-layer perceptron: it helps the model learn the correct weights and bias to make correct predictions.
Goal: adjust the weights and bias so that the perceptron gives the correct output (0 or 1) for each input.
Steps of the Perceptron Learning Algorithm:
Let’s go step-by-step:
🟢 Step 1: Initialize
Start with random or zero values for all weights and bias.
🟢 Step 2: For Each Training Example
Do the following:
a) Take Inputs:
Let the inputs be x1, x2, ..., xn
b) Calculate Output:
ŷ = { 1 if (w1x1 + w2x2 + ... + wnxn + b) >= 0
0 otherwise }
c) Compare With Actual Output
If prediction ŷ is correct, do nothing
If wrong, update weights and bias:
wi = wi + η(y − ŷ)xi (for each weight)
b = b + η(y − ŷ) (for the bias)
Where:
η is the learning rate (small value like 0.1)
Step 3: Repeat
Do this for all training examples, and repeat for multiple rounds (epochs) until
the perceptron makes no more mistakes or reaches a maximum number of steps.
Example:
Let’s say:
Inputs: x1 = 1, x2 = 0
Weights: w1 = 0.5, w2 = −0.5
Bias: b = −1
Actual output y = 1, predicted output ŷ = 0 (since 0.5·1 + (−0.5)·0 − 1 = −0.5 < 0)
Learning rate η = 1
Then update:
w1 = 0.5 + 1(1 − 0)(1) = 1.5
w2 = −0.5 + 1(1 − 0)(0) = −0.5
b = −1 + 1(1 − 0) = 0
Summary:
Step Description
1⃣ Initialize weights and bias
2⃣ For each example: calculate output
3⃣ If wrong, update weights and bias
4⃣ Repeat until accurate
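A minimal Python sketch of this learning loop, assuming 0/1 targets, a threshold at 0, and a fixed learning rate (the AND-gate data used here is only an illustrative training set):

# Perceptron Learning Algorithm (0/1 targets); data and hyperparameters are illustrative
def train_perceptron(data, n_inputs, lr=0.1, max_epochs=100):
    weights = [0.0] * n_inputs            # Step 1: initialize weights and bias
    bias = 0.0
    for _ in range(max_epochs):           # Step 3: repeat over epochs
        errors = 0
        for x, y in data:                 # Step 2: for each training example
            z = sum(w * xi for w, xi in zip(weights, x)) + bias
            y_hat = 1 if z >= 0 else 0    # step activation
            if y_hat != y:                # wrong prediction -> update weights and bias
                weights = [w + lr * (y - y_hat) * xi for w, xi in zip(weights, x)]
                bias += lr * (y - y_hat)
                errors += 1
        if errors == 0:                   # stop once every example is classified correctly
            break
    return weights, bias

# Example: learning the AND gate
and_data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w, b = train_perceptron(and_data, n_inputs=2, lr=1.0)
print(w, b)   # one separating solution (e.g. weights [2.0, 1.0] and bias -3.0 with these settings)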
Importance:
Works only for linearly separable data.
Doesn’t work for complex patterns (like XOR).
It’s the foundation of modern neural networks!
Linear separability
Linear separability means you can draw a straight line (in 2D), a plane (in 3D), or
a hyperplane (in higher dimensions) that separates the data points of different
classes perfectly.
In classification problems, we often want to separate classes like "cat" vs "dog",
or "yes" vs "no".
If the data is linearly separable, a simple model like a perceptron or linear
classifier can solve it.
If it's not linearly separable, we need more complex models (like neural
networks with hidden layers) to solve it.
If such a boundary can separate all data points of different classes without
error, the data is called linearly separable.
In early models like the Perceptron, linear separability was crucial because:
A single-layer perceptron can only solve linearly separable problems.
If the data is not linearly separable, the perceptron fails to converge.
Deep learning solves this by adding hidden layers and non-linear activation
functions like ReLU or sigmoid, which allow the model to learn non-linear
decision boundaries.
✅ If you can draw a single straight line between red and blue dots, they are
linearly separable.
If no straight line can separate them, they are not linearly separable.
AND gate: linearly separable (you can separate output 1 and 0 with a line)
XOR gate: not linearly separable (you need a non-linear model to solve)
For example:
If apples are on the left and oranges on the right, you can place a ruler
(line) in between them to separate.
But if they’re mixed in a checkerboard pattern (like XOR), no straight line
can separate them — you’d need a curved or more complex separator.
✅ Linearly Separable Example – AND Gate
A B Output
0 0 0
0 1 0
1 0 0
1 1 1
You can draw a straight line that separates the output = 1 from output =
0.
A single-layer perceptron can solve this.
❌ Not Linearly Separable – XOR Gate
A B Output
0 0 0
0 1 1
1 0 1
1 1 0
No straight line can separate output 1s from output 0s.
You need at least a 2-layer network with non-linearity to solve it.
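As a small sketch of why depth helps here, the following Python code computes XOR with two layers of threshold units; the weights are hand-picked for illustration (an assumption, not learned values), with one hidden unit acting as OR and the other as AND:

# XOR via two layers of threshold units (hand-picked weights, not learned)
def step(z):
    return 1 if z >= 0 else 0

def xor(x1, x2):
    h1 = step(x1 + x2 - 1)     # hidden unit 1: OR  (threshold 1)
    h2 = step(x1 + x2 - 2)     # hidden unit 2: AND (threshold 2)
    return step(h1 - h2 - 1)   # output fires only when OR is 1 and AND is 0

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", xor(a, b))   # prints 0, 1, 1, 0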
Shallow Models like Logistic Regression or Perceptron:
o Only work when data is linearly separable (or close to it).
Deep Neural Networks:
o Use multiple layers and non-linear activations to transform data
into a linearly separable form in some high-dimensional space.
o For example, in CNNs and RNNs, the deeper layers help to learn
complex patterns even when the input data is not linearly
separable.
Convergence theorem perceptron learning algorithm
The Perceptron Convergence Theorem is a fundamental result in the theory of
neural networks. This theorem provides a guarantee about the performance of
the perceptron algorithm under certain conditions.
Statement: The Perceptron Convergence Theorem states that if there exists a
linear separation (a hyperplane) that can perfectly classify a given set of training
examples, then the perceptron algorithm will converge to a solution that
correctly classifies all the training examples.
An important feature of the Perceptron Convergence Theorem is that it assures us that if the data can be clearly divided into two groups (like distinguishing cats from dogs based on their features), the perceptron will eventually learn a way to separate them accurately.
Working of the Perceptron Convergence Algorithm:
The step-by-step working of this algorithm is as follows:
1. Initialization: Start by setting the weights and bias to small random values.
These weights determine the importance of each input feature.
2. Input: Feed an input data point (a vector of features) into the perceptron.
3. Weighted Sum: Calculate the weighted sum of the input features plus the bias. This is done by multiplying each input feature by its corresponding weight and adding them all up, along with the bias:
weighted sum = (w1 ∗ x1) + (w2 ∗ x2) + … + (wn ∗ xn) + b
4. Activation Function: Apply the activation function to the weighted sum to get
the output. In a simple perceptron, this is usually a step function that outputs 1
if the weighted sum is positive, and -1 (or 0) if it's negative.
5. Prediction: Compare the perceptron's output (prediction) to the actual label
of the input data point (either 1 or -1).
6. Update Weights: If the prediction is incorrect, adjust the weights and bias.
The weights are updated to reduce the error, moving the decision boundary
closer to correctly classifying the input. The update rule is:
wi = wi + Δwi
b = b + Δb
where
Δwi = η ∗ (y − y′) ∗ xi
Δb = η ∗ (y − y′)
η is learning rate parameter
y is the actual label
y' is the predicted label
7. Repeat: Repeat steps 2-6 for all data points in the training set. This process is
called an epoch. Multiple epochs are performed until the perceptron correctly
classifies all training data points or a maximum number of epochs is reached.
8. Convergence: If the data is linearly separable, the algorithm will eventually
find the correct weights and bias to perfectly classify the training data. This
means the perceptron has "converged."
Example:
Dataset (linearly separable)
Input x1 Input x2 Target y
0 0 -1
0 1 -1
1 0 -1
1 1 +1
We use y∈{−1,+1} format for perceptron logic.
Initialize the Perceptron:
Initial weights: w=[0,0]
Bias: b=0
Learning rate: η=1
Training Steps (Epoch 1):
1. Sample 1: x=[0,0],y=−1
Prediction:
w⋅x + b = 0⋅0 + 0⋅0 + 0 = 0 ⇒ sign(0) = 0 ≠ −1
→ Incorrect, so update:
w = w + ηyx = [0,0] + 1⋅(−1)⋅[0,0] = [0,0]
b = b + ηy = 0 + (−1) = −1
2. Sample 2: x=[0,1],y=−1
Prediction:
w⋅x + b = 0 + 0 − 1 = −1 ⇒ sign(−1) = −1 ⇒ Correct
3. Sample 3: x=[1,0], y=−1
Prediction:
w⋅x + b = 0 + 0 − 1 = −1 ⇒ sign(−1) = −1 ⇒ Correct
4. Sample 4: x=[1,1],y=+1
Prediction:
0+0−1=−1≠+1⇒Incorrect
Update:
w=[0,0]+1⋅(+1)⋅[1,1]=[1,1]
b=−1+1=0
Epoch 2:
Check all samples again with w=[1,1],b=0
x=[0,0]⇒0⇒sign(0)=0≠−1 → Wrong
Update:
w=[1,1]+(−1)⋅[0,0]=[1,1]
b=0−1=−1
x=[0,1]⇒1⋅1+0−1=0⇒Wrong
Update:
w=[1,1]+(−1)⋅[0,1]=[1,0]
b=−1−1=−2
Continue this way...
Eventually...
After a few more updates, the perceptron settles on weights and a bias that separate the data. One such separating solution (used for the check below) is:
w = [1,1], b = −1.5
(With η = 1 and binary inputs the algorithm itself reaches integer-valued weights and bias; w = [1,1], b = −1.5 is shown here simply as one hyperplane that classifies every sample correctly.)
Which gives correct predictions:
Input     Prediction
[0, 0]    0 + 0 − 1.5 = −1.5 ⇒ −1 ✅
[0, 1]    0 + 1 − 1.5 = −0.5 ⇒ −1 ✅
[1, 0]    1 + 0 − 1.5 = −0.5 ⇒ −1 ✅
[1, 1]    1 + 1 − 1.5 = 0.5 ⇒ +1 ✅
Final Output:
The perceptron converges in a finite number of steps.
This confirms that the Perceptron Convergence Theorem holds (since AND is linearly separable).
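A minimal Python sketch of this training trace, assuming ±1 labels, η = 1, and (as in the hand-worked steps) that a prediction of 0 counts as a mistake; the final weights can differ from the hand-worked values, but the data is classified the same way:

# Perceptron with +/-1 labels on the AND dataset above
def predict(w, b, x):
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 0 if z == 0 else (1 if z > 0 else -1)   # sign(z), with sign(0) treated as 0

def train(data, eta=1, max_epochs=100):
    w, b = [0, 0], 0                               # same initialization as the example
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in data:
            if predict(w, b, x) != y:              # misclassified -> update
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
                b += eta * y
                mistakes += 1
        if mistakes == 0:                          # converged: every sample is correct
            break
    return w, b

and_data = [([0, 0], -1), ([0, 1], -1), ([1, 0], -1), ([1, 1], +1)]
w, b = train(and_data)
print(w, b)                                        # a separating solution, e.g. [3, 2] -4
print([predict(w, b, x) for x, _ in and_data])     # [-1, -1, -1, 1]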