Unit-1 Deep Learning

This unit explains the structure and function of biological neurons and their computational counterparts in artificial neural networks. It covers how neurons process information, the McCulloch-Pitts model, and the perceptron learning algorithm, emphasizing their roles in decision-making and problem-solving. It also discusses the importance of linear separability in classification tasks and the convergence theorem of the perceptron algorithm.

Biological Neuron:

The human brain consists of an enormous number of neural cells (many billions) that process information. Each cell works like a simple processor. Only the massive interaction between all these cells and their parallel processing makes the brain's abilities possible. The figure below shows a human biological neural unit, with the various parts of the biological neural network (BNN) labelled.

Dendrites are branching fibres that extend from the cell body.

The cell body of a neuron contains the nucleus and other structures that support chemical processing and the production of neurotransmitters.

The axon is a single fibre that carries information away from the cell body toward the dendrites of other neurons, muscles, or glands.

Axon hillock is the site of summation for incoming information. At any moment,
the collective influence of all neurons that conduct impulses to a given neuron
will determine whether or not an action potential will be initiated at the axon
hillock and propagated along the axon.
A synapse is the point of connection between two neurons, or between a neuron and a muscle or a gland. Electrochemical communication between neurons takes place at these junctions.

Terminal buttons of a neuron are the small knobs at the end of an axon that
release chemicals called neurotransmitters.

Computational Unit:
In deep learning, a computational unit is like a tiny brain cell that does a small calculation. Many of these units are connected together to form a neural network, which can learn patterns from data (such as recognizing faces or understanding language).

A computational unit (also called a neuron or node) in a neural network:

 Receives input (numbers)

 Applies weights to those inputs (how important each input is)

 Adds a bias (a fixed value to shift the result)

 Applies an activation function (to introduce non-linearity)

 Outputs a value (which goes to the next layer of neurons)


Example:

Let's say a unit has 2 inputs:

 Input x1=0.5, weight w1=0.8

 Input x2=1.0, weight w2=0.3

 Bias b=0.1

The computation is:

z=(x1⋅w1)+(x2⋅w2)+b=(0.5*0.8)+(1.0*0.3)+0.1=0.4+0.3+0.1=0.8

z = 0.8

Then we apply an activation function, like ReLU:

f(z)= max(0,z)= max(0,0.8)=0.8

The unit outputs: a= 0.8
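The same calculation can be written as a small Python sketch (the variable names simply mirror the numbers in the example above):

```python
def relu(z):
    # ReLU activation: keep positive values, clamp negatives to 0
    return max(0.0, z)

# Inputs, weights, and bias from the example above
x1, w1 = 0.5, 0.8
x2, w2 = 1.0, 0.3
b = 0.1

z = x1 * w1 + x2 * w2 + b   # weighted sum plus bias = 0.8
a = relu(z)                 # output of the unit

print(round(z, 2), round(a, 2))   # 0.8 0.8
```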

Importance:

These units are the building blocks of neural networks. A single unit is very
simple, but when you connect millions of them, they can learn to:

 Detect objects in images

 Translate languages

 Predict stock prices

 Recognize speech

McCulloch–Pitts Unit – In Depth & Simple


The McCulloch–Pitts unit is the first model of an artificial neuron: a mathematical function inspired by biological neurons. It laid the foundation for neural networks and, later, deep learning.
It receives inputs, adds them up, checks whether the total reaches a threshold, and if so outputs 1; otherwise it outputs 0.

Formula:

For inputs x1, x2, ..., xn and a threshold θ:

y = 1 if ∑xi ≥ θ
y = 0 otherwise

 Inputs: Only 0 or 1 (binary)

 Weights: Fixed at 1 (no learning)

 Activation: Step function (0 or 1)

Example:

A student is allowed into the exam hall only when both conditions are true:

1. has a hall ticket

2. has an ID card

Step-by-step

Inputs:

x1 = 1 if the hall ticket is present, 0 otherwise.

x2 = 1 if the ID card is present, 0 otherwise.

Weights:

w1 = 1, w2 = 1

Total input:

z = x1·w1 + x2·w2 = x1 + x2

Input A  Input B  Sum  A AND B

0        0        0    0

0        1        1    0

1        0        1    0

1        1        2    1

McCulloch–Pitts Neuron Setup

 Inputs:x1,x2

 Weights: w1=1,w2=1 (default in M–P model)

 Threshold: θ=2

 Activation Function: Step function

y = 1 if (x1 + x2) ≥ 2
y = 0 otherwise
Solving Each Row:

1 Inputs: 0, 0
x1+x2= 0+0= 0<2⇒ y=0

2 Inputs: 0, 1
x1+x2= 0+1 =1<2⇒ y=0

3 Inputs: 1, 0
x1+x2=1+0=1<2⇒y=0

4 Inputs: 1, 1
x1+x2=1+1=2≥2⇒y=1

y = 1 if ∑xi ≥ θ (the neuron fires)

y = 0 otherwise (the neuron does not fire)

We have successfully implemented the AND logic gate using a McCulloch–Pitts neuron with:
 Equal weights = 1

 Threshold = 2

 Binary inputs

 Step activation function

Can simulate logic gates:

Gate  Inputs   Threshold     Output Logic

AND   x1, x2   2             1 if both inputs are 1

OR    x1, x2   1             1 if any input is 1

NOT   x1       special unit  inverted logic
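As a rough illustration, here is a small Python sketch of an M–P unit used as the AND and OR gates from the table above (weights fixed at 1, only the threshold changes; NOT is left out because, as the table notes, it is a special unit):

```python
def mp_neuron(inputs, threshold):
    # McCulloch–Pitts unit: binary inputs, fixed weights of 1, step activation
    return 1 if sum(inputs) >= threshold else 0

# AND gate: threshold 2 (fires only when both inputs are 1)
# OR gate:  threshold 1 (fires when at least one input is 1)
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2,
              "AND:", mp_neuron([x1, x2], threshold=2),
              "OR:",  mp_neuron([x1, x2], threshold=1))
```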

Thresholding Logic – In Depth

Meaning:

Thresholding logic is decision-making using a limit (threshold):

"Activate (output = 1) only if enough input signals are strong enough."

This introduces binary decision-making:

 If signal is strong enough → output is ON (1)

 If not → output is OFF (0)

Importance

Thresholding is a non-linear decision function (a simple step function), which allows basic logic rules (like AND/OR) to be combined to build more complex computations.
Visual Diagram of M–P Neuron:

 The McCulloch–Pitts unit is a very simple artificial neuron using binary inputs and threshold logic.

 It fires (outputs 1) only when the total input meets or exceeds a threshold.

 It cannot learn, but it inspired the development of the perceptron, then multi-layer perceptrons, and eventually deep neural networks.

 Thresholding logic in the M–P unit is replaced in deep learning by activation functions and learnable weights.

Linear perceptron
A Linear Perceptron is one of the most basic types of artificial neural
networks. It was introduced by Frank Rosenblatt in 1958.

A Perceptron is a computational unit (or a "neuron") that:

1. Takes inputs (features of data),

2. Multiplies each input with a weight,

3. Adds a bias (optional)

4. Applies an activation function (in this case, a simple threshold function),


5. Produces an output (0 or 1).

Linear Perceptron Formula:

output = 1 if w1x1 + w2x2 + ... + wnxn + b >= 0
output = 0 otherwise

Where:

x1, x2, ..., xn = inputs (features),

w1, w2, ..., wn = weights,

b = bias.

Goal of Linear Perceptron

To learn the right weights and bias so it can correctly classify inputs into
one of two classes (e.g., YES or NO, 1 or 0).

Learning in Perceptron

It adjusts weights based on the error between actual output and expected
output.
This is called the Perceptron Learning Algorithm.

When It Works Well

It works only when the data is linearly separable, i.e., when a straight line (in 2D) or a hyperplane (in higher dimensions) can separate the two classes.

Limitation

It cannot solve non-linear problems like XOR (exclusive OR).

Simple Example:

Suppose you want to predict if someone will pass (1) or fail (0) an exam
based on:

Study hours,

Sleep hours.

A linear perceptron might learn:

output = {1 if (2 *study hours) + (1 *sleep hours) - 5 >=0

0 otherwise}

Step 1: Inputs

Assume two students:

Student Study Hours Sleep Hours Expected Output

A 4 3 1 (Pass)

B 1 1 0 (Fail)
So:

x1 = study hours

x2 = sleep hours

Step 2: Initialize Weights and Bias

Let’s assume:

Weight for study: w1=2

Weight for sleep: w2=1

Bias: b=-5

Step 3: Apply Perceptron Formula

The perceptron computes a value using this formula:

Sum= w1 *x1 + w2*x2 + b

Then, it applies the activation rule:

If Sum>=0=> Output = 1

If Sum<0 =>Output = 0

Example: Student A

Study = 4, Sleep = 3

Sum = (2 × 4) + (1 × 3) − 5 = 8 + 3 − 5 = 6

Since 6 ≥ 0 → Output = 1 (Pass)

✔ Correct prediction!

Example: Student B

Study = 1, Sleep = 1

Sum = (2 × 1) + (1 × 1) − 5 = 2 + 1 − 5 = −2

Since −2 < 0 → Output = 0 (Fail)

✔ Also correct prediction!

The perceptron is drawing a decision boundary (a line) between students who pass and those who fail, based on how much they study and sleep.

If their total (after multiplying by weights and adding bias) is high enough, we
say they pass.

If not, we say they fail.

Higher weight for study hours (2) means studying is more important for passing.

Sleep has less impact (weight = 1).
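A minimal Python sketch of this pass/fail rule, using the assumed weights and bias from the example (they are hand-picked here, not learned):

```python
def predict_pass(study_hours, sleep_hours, w1=2.0, w2=1.0, b=-5.0):
    # Weighted sum of inputs plus bias, then a threshold at 0
    total = w1 * study_hours + w2 * sleep_hours + b
    return 1 if total >= 0 else 0   # 1 = pass, 0 = fail

print(predict_pass(4, 3))  # Student A: 2*4 + 1*3 - 5 =  6 -> 1 (Pass)
print(predict_pass(1, 1))  # Student B: 2*1 + 1*1 - 5 = -2 -> 0 (Fail)
```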

Perceptron Learning Algorithm

The Perceptron Learning Algorithm is a method used to train a single-layer perceptron. It helps the model learn the correct weights and bias to make correct predictions.

The goal is to adjust the weights and bias so that the perceptron gives the correct output (0 or 1) for each input.

Steps of the Perceptron Learning Algorithm:

Let’s go step-by-step:

🟢 Step 1: Initialize

Start with random or zero values for all weights and bias.

🟢 Step 2: For Each Training Example

Do the following:

a) Take Inputs:

Let inputs be x1, x2, ..., xn

b) Calculate Output:

ŷ = 1 if (w1x1 + w2x2 + ... + wnxn + b) >= 0
ŷ = 0 otherwise

c) Compare With Actual Output

If prediction ŷ is correct, do nothing

If wrong, update weights and bias:

wi = wi + η(y − ŷ)xi   (for each weight)

b = b + η(y − ŷ)       (for the bias)

Where:

η is the learning rate (small value like 0.1)

Step 3: Repeat

Do this for all training examples, and repeat for multiple rounds (epochs) until
the perceptron makes no more mistakes or reaches a maximum number of steps.

Example:

Let’s say:

Inputs: x1=1,x2=0

Weights: w1=0.5,w2=-0.5,

Bias b=0

Actual output y=1 , Predicted output ŷ=0

Learning rate η=1


Then update:

w1 = 0.5 + 1·(1 − 0)·(1) = 1.5

w2 = −0.5 + 1·(1 − 0)·(0) = −0.5

b = 0 + 1·(1 − 0) = 1
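The same update can be written as a small Python sketch (values copied from the example; the function name perceptron_update is just illustrative):

```python
def perceptron_update(w, b, x, y, y_hat, lr=1.0):
    # Perceptron learning rule: w_i += lr*(y - y_hat)*x_i and b += lr*(y - y_hat)
    error = y - y_hat
    w = [wi + lr * error * xi for wi, xi in zip(w, x)]
    b = b + lr * error
    return w, b

w, b = perceptron_update(w=[0.5, -0.5], b=0.0, x=[1, 0], y=1, y_hat=0)
print(w, b)   # [1.5, -0.5] 1.0
```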

Summary:

Step Description

1⃣ Initialize weights and bias

2⃣ For each example: calculate output

3⃣ If wrong, update weights and bias

4⃣ Repeat until accurate

Importance:

Works only for linearly separable data.

Doesn’t work for complex patterns (like XOR).

It’s the foundation of modern neural networks!

Linear separability
Linear separability means you can draw a straight line (in 2D), a plane (in 3D), or
a hyperplane (in higher dimensions) that separates the data points of different
classes perfectly.

In classification problems, we often want to separate classes like "cat" vs "dog", or "yes" vs "no".

If the data is linearly separable, a simple model like a perceptron or a linear classifier can solve it.

If it is not linearly separable, we need more complex models (like neural networks with hidden layers) to solve it.

If such a boundary can separate all data points of different classes without
error, the data is called linearly separable.

In early models like the Perceptron, linear separability was crucial because:

 A single-layer perceptron can only solve linearly separable problems.

 If the data is not linearly separable, the perceptron fails to converge.

Deep learning solves this by adding hidden layers and non-linear activation
functions like ReLU or sigmoid, which allow the model to learn non-linear
decision boundaries.

✅ If you can draw a single straight line between red and blue dots, they are
linearly separable.

If no straight line can separate them, they are not linearly separable.

AND gate: linearly separable (you can separate output 1 and 0 with a line)

XOR gate: not linearly separable (you need a non-linear model to solve)

For example:

 If apples are on the left and oranges on the right, you can place a ruler
(line) in between them to separate.

 But if they are mixed in a checkerboard pattern (like XOR), no straight line can separate them; you would need a curved or more complex separator.
✅ Linearly Separable Example – AND Gate

A B Output

0 0 0

0 1 0

1 0 0

1 1 1

 You can draw a straight line that separates the output = 1 from output =
0.

 A single-layer perceptron can solve this.

❌ Not Linearly Separable – XOR Gate

A B Output

0 0 0

0 1 1

1 0 1

1 1 0

 No straight line can separate output 1s from output 0s.

 You need at least a 2-layer network with non-linearity to solve it.

 Shallow models like logistic regression or the perceptron:

o Only work when the data is linearly separable (or close to it).

 Deep neural networks:

o Use multiple layers and non-linear activations to transform the data into a linearly separable form in some high-dimensional space.

o For example, in CNNs and RNNs, the deeper layers help to learn complex patterns even when the input data is not linearly separable.
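As a rough sketch of that idea, the example below adds one hand-crafted non-linear feature (the product x1·x2) to the XOR inputs; in the new feature space a single linear threshold rule classifies XOR correctly, which is the kind of transformation hidden layers learn automatically (the weights here are chosen by hand purely for illustration):

```python
# XOR truth table: (x1, x2) -> target
xor_data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def linear_rule(features, weights, bias):
    # A plain linear threshold unit applied to the (possibly transformed) features
    s = sum(w * f for w, f in zip(weights, features)) + bias
    return 1 if s >= 0 else 0

# No (w1, w2, b) works on the raw 2-D inputs, but after adding the non-linear
# feature x1*x2 the rule  x1 + x2 - 2*x1*x2 - 0.5 >= 0  reproduces XOR exactly.
for (x1, x2), target in xor_data:
    phi = (x1, x2, x1 * x2)                       # transformed feature vector
    pred = linear_rule(phi, weights=(1, 1, -2), bias=-0.5)
    print((x1, x2), "target:", target, "prediction:", pred)
```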

Convergence Theorem of the Perceptron Learning Algorithm


The Perceptron Convergence Theorem is a fundamental result in the theory of
neural networks. This theorem provides a guarantee about the performance of
the perceptron algorithm under certain conditions.

Statement: The Perceptron Convergence Theorem states that if there exists a linear separation (a hyperplane) that can perfectly classify a given set of training examples, then the perceptron algorithm will converge, in a finite number of updates, to a solution that correctly classifies all the training examples.

An important feature of the Perceptron Convergence Theorem is that it assures us that if the data can be clearly divided into two groups (like distinguishing cats from dogs based on their features), the perceptron will eventually learn a way to separate them accurately.

Workings of the Perceptron Learning Algorithm:

The step-by-step working of this algorithm is as follows:

1. Initialization: Start by setting the weights and bias to small random values.
These weights determine the importance of each input feature.

2. Input: Feed an input data point (a vector of features) into the perceptron.

3. Weighted Sum: Calculate the weighted sum of the input features plus the bias. This is done by multiplying each input feature by its corresponding weight and adding them all up, along with the bias:

weighted sum = (w1 ∗ x1) + (w2 ∗ x2) + … + (wn ∗ xn) + b

4. Activation Function: Apply the activation function to the weighted sum to get
the output. In a simple perceptron, this is usually a step function that outputs 1
if the weighted sum is positive, and -1 (or 0) if it's negative.

5. Prediction: Compare the perceptron's output (prediction) to the actual label


of the input data point (either 1 or -1).
6. Update Weights: If the prediction is incorrect, adjust the weights and bias.
The weights are updated to reduce the error, moving the decision boundary
closer to correctly classifying the input. The update rule is:

 wi = wi + Δwi

 b = b + Δb

where

 Δwi = η ∗ (y − y′) ∗ xi

 Δb = η ∗ (y − y′)

 η is the learning rate parameter

 y is the actual label

 y′ is the predicted label

7. Repeat: Repeat steps 2-6 for all data points in the training set. This process is
called an epoch. Multiple epochs are performed until the perceptron correctly
classifies all training data points or a maximum number of epochs is reached.

8. Convergence: If the data is linearly separable, the algorithm will eventually find the correct weights and bias to perfectly classify the training data. This means the perceptron has "converged."

Example:

Dataset (linearly separable)

Input x1 Input x2 Target y

0 0 -1

0 1 -1

1 0 -1

1 1 +1

We use y∈{−1,+1} format for perceptron logic.


Initialize the Perceptron:

 Initial weights: w=[0,0]

 Bias: b=0

 Learning rate: η=1

Training Steps (Epoch 1):

1. Sample 1: x=[0,0],y=−1

 Prediction:

w⋅x + b = 0⋅0 + 0⋅0 + 0 = 0 ⇒ sign(0) = 0 ≠ −1

→ Incorrect, so update:

w = w + η·y·x = [0,0] + 1⋅(−1)⋅[0,0] = [0,0]

b = b + η·y = 0 + (−1) = −1

2. Sample 2: x=[0,1],y=−1

 Prediction:

w⋅x + b = 0 + 0 − 1 = −1 ⇒ sign(−1) = −1 ⇒ Correct

3. Sample 3: x=[1,0],y=−1

 Prediction:

w⋅x + b = 0 + 0 − 1 = −1 ⇒ Correct

4. Sample 4: x=[1,1],y=+1

 Prediction:

0+0−1=−1≠+1⇒Incorrect

Update:

w=[0,0]+1⋅(+1)⋅[1,1]=[1,1]

b=−1+1=0

Epoch 2:
Check all samples again with w=[1,1],b=0

 x=[0,0]⇒0⇒sign(0)=0≠−1 → Wrong

 Update:

w=[1,1]+(−1)⋅[0,0]=[1,1]

b=0−1=−1

 x=[0,1] ⇒ 1⋅0 + 1⋅1 − 1 = 0 ⇒ sign(0) = 0 ≠ −1 → Wrong

 Update:

w=[1,1]+(−1)⋅[0,1]=[1,0]

b=−1−1=−2

 Continue this way...

Eventually...

After a few more updates, the perceptron settles on a correct weight vector and bias. For example, w = [1, 1] with b = −1.5 defines a separating boundary that gives correct predictions (the algorithm itself, starting from zeros with η = 1 on this integer data, reaches an equivalent integer-valued solution):

Input   Prediction

[0, 0]  0 + 0 − 1.5 = −1.5 ⇒ −1 ✅

[0, 1]  0 + 1 − 1.5 = −0.5 ⇒ −1 ✅

[1, 0]  1 + 0 − 1.5 = −0.5 ⇒ −1 ✅

[1, 1]  1 + 1 − 1.5 = 0.5 ⇒ +1 ✅

Final Output:

The perceptron converges in a finite number of steps.

This confirms that the Perceptron Convergence Theorem holds (since AND is linearly separable).
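A compact Python sketch of the whole procedure on the AND dataset above (labels in {−1, +1}, learning rate 1; a weighted sum of exactly 0 is treated as a +1 prediction, matching the worked example where sign(0) counted as wrong for a −1 target):

```python
def predict(w, b, x):
    # Step output in {-1, +1}: +1 when the weighted sum is >= 0, else -1
    s = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if s >= 0 else -1

# AND dataset with targets in {-1, +1}
data = [([0, 0], -1), ([0, 1], -1), ([1, 0], -1), ([1, 1], 1)]

w, b, lr = [0.0, 0.0], 0.0, 1.0
for epoch in range(20):                        # upper bound on the number of epochs
    mistakes = 0
    for x, y in data:
        if predict(w, b, x) != y:              # misclassified -> apply the update rule
            w = [wi + lr * y * xi for wi, xi in zip(w, x)]
            b = b + lr * y
            mistakes += 1
    if mistakes == 0:                          # an error-free epoch means convergence
        break

print("weights:", w, "bias:", b)
print([predict(w, b, x) for x, _ in data])     # expected: [-1, -1, -1, 1]
```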
