CSEN3082 DEEP LEARNING
O. V. Ramana Murthy
Course Outcome 1
Understand the role of neural networks and
their various applications.
Contents
Artificial Neuron
Feed Forward Networks
Gradient descent
Back propagation
Regularization techniques
Norm penalties as constrained optimization
Reference
Charu C. Aggarwal, "Neural Networks and Deep Learning", Springer International Publishing AG, Chapters 1 and 2.
Video playlist: Deep Learning – Charu Aggarwal, https://www.youtube.com/playlist?list=PLLo1RD8Vbbb_6gCyqxG_qzCLOj9EKubw7
Ian Goodfellow, Yoshua Bengio, Aaron Courville, "Deep Learning", MIT Press, 2016.
Neuron
Perceptron
The simplest neural network is referred to as the perceptron. This neural
network contains a single input layer and an output node.
Artificial Neuron
[Figure: inputs x1, x2, …, xn with weights w1, w2, …, wn and a bias input 1 with weight b feed a summing node; the pre-activation y passes through f_act to produce the output ŷ.]
Numerical Example 1
Calculate the output assuming the binary step activation function.
From the diagram: inputs x1 = 0.3, x2 = 0.7; weights w1 = 0.2, w2 = 0.6; bias weight 0.45.
y = 0.45 + 0.3×0.2 + 0.7×0.6 = 0.93
ŷ = f_act(0.93) = 1 (binary step)
ŷ = f_activation(y) = 1 if y ≥ 0, 0 if y < 0 (binary step)
Artificial Neuron model
Sigmoidal function: f(y) = 1/(1 + e^(−λy))
λ = 1 gives the binary sigmoidal function.
[Figure: inputs x1, x2 with weights w1, w2 feeding the neuron output y.]
ACTIVATION FUNCTIONS
(A) Identity/linear
(B) Binary and bipolar step
(C) Binary and bipolar sigmoidal
Source [2]
Numerical Example 2
Calculate the output assuming the binary sigmoidal activation function.
From the diagram: inputs x1 = 0.3, x2 = 0.7; weights w1 = 0.2, w2 = 0.6; bias weight 0.45.
y = 0.45 + 0.3×0.2 + 0.7×0.6 = 0.93
ŷ = f_sig(0.93) = 1/(1 + e^(−0.93)) = 0.72
(λ = 1 is the binary sigmoidal function)
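Both examples can be cross-checked with a few lines of Python. This is a minimal sketch assuming the reading of the diagram used above (inputs x1 = 0.3, x2 = 0.7, weights w1 = 0.2, w2 = 0.6, bias weight 0.45); the variable names are illustrative only.

import numpy as np

x = np.array([0.3, 0.7])            # inputs from the diagram
w = np.array([0.2, 0.6])            # weights from the diagram
b = 0.45                            # bias weight

y = b + np.dot(w, x)                # pre-activation: 0.93
y_step = 1 if y >= 0 else 0         # binary step (Example 1): 1
y_sig = 1.0 / (1.0 + np.exp(-y))    # binary sigmoid (Example 2): ~0.72
print(y, y_step, round(y_sig, 2))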
Pre- and Post-Activation Values
[Figure: within a neuron, the pre-activation value is the weighted sum of the inputs; the post-activation value is obtained after applying the activation function.]
Gradient Descent
Gradient represents how STEEP a slope is.
Gradient Descent
Given a differentiable function f(x),
gradient descent finds a minimum
by updating the variable x in steps
proportional to the negative of the gradient
(derivative) f′(x) at the current point.
Gradient Descent
Gradient represents how STEEP a slope is.
Uphill is positive; downhill is negative.
In one dimension it is the derivative; in multiple dimensions, the gradient.
Drawback: it may settle into a local minimum.
Gradient Search
For unconstrained optimization, i.e., no constraints, and when the gradient of the function exists:
1. Initialization: choose an initial x0 and let k = 0.
2. Calculate the derivative f′(xk), which indicates the slope of f at xk.
3. Update: x(k+1) = xk − α·f′(xk), where α is the learning rate.
4. Repeat steps 2-3 till convergence, i.e., |f′(xk)| < ε, or the maximum number of iterations is reached.
A code sketch of this procedure follows.
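Written as code, the four steps look as follows. This is an illustrative sketch: the function f(x) = (x² − 1)² is an assumed stand-in, chosen only because, like the example on the next slides, it has local minima at x = −1 and x = +1; it is not the function used in the tables.

def f(x):
    return (x**2 - 1)**2            # assumed stand-in function, minima at x = -1 and x = +1

def f_prime(x):
    return 4.0 * x * (x**2 - 1)     # its derivative

def gradient_search(x0, alpha=0.05, eps=1e-6, max_iters=10000):
    x = x0                          # step 1: initialization
    for _ in range(max_iters):
        g = f_prime(x)              # step 2: slope at the current point
        if abs(g) < eps:            # step 4: convergence check
            break
        x = x - alpha * g           # step 3: update with learning rate alpha
    return x

print(gradient_search(-2.0))        # settles near the local minimum x = -1
print(gradient_search(2.0))         # settles near the local minimum x = +1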
Gradient function
[Figure: plot of the example function and its derivative, with local minima at x = −1 and x = +1.]
Case 1: Initial value x0 = −2 (α = 0.1)

Iteration (n) | x_n | f′(x_n) | x_(n+1) = x_n − α·f′(x_n) | f(x_n)
0 | −2 | −16 | −2 + 0.1×16 = −0.4 | 12
1 | −0.4 | 1.28 | −0.4 − 0.1×1.28 = −0.528 | 3.23
2 | −0.528 | 1.39 | −0.528 − 0.1×1.39 = −0.667 | 2.9
3 | −0.667 | 1.48 | −0.667 − 0.1×1.48 = −0.815 | 2.65
4 | −0.815 | 1.6 | −0.815 − 0.1×1.6 = −0.975 | 2.56

The trajectory is converging toward the local minimum at x = −1.
Case 2: Initial value x0 = 2 (α = 0.1)

Iteration (n) | x_n | f′(x_n) | x_(n+1) = x_n − α·f′(x_n) | f(x_n)
0 | 2 | 16 | 2 − 0.1×16 = 0.4 | 12
1 | 0.4 | −1.28 | 0.4 − 0.1×(−1.28) = 0.528 | 3.23
2 | 0.528 | −1.39 | 0.528 − 0.1×(−1.39) = 0.667 | 2.9
3 | 0.667 | −1.48 | 0.667 − 0.1×(−1.48) = 0.815 | 2.65
4 | 0.815 | −1.6 | 0.815 − 0.1×(−1.6) = 0.975 | 2.5

The trajectory is converging toward the local minimum at x = 1.
Effect of learning rate
Chain Rule
The chain rule is applied to calculate the gradients of the loss function with respect to the weights and biases (in the different layers) of the network. These gradients are then used to update the parameters during optimization (e.g., using gradient descent).
The Chain Rule: Formula
If you have a composite function y = f(g(x)), the derivative with respect to x is given by:
dy/dx = f′(g(x))·g′(x)
This principle is used extensively in updating the weights of the different layers using the loss function at the output layer.
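For a concrete picture, here is the rule pushed through the kind of network used later (output Y = f(y), hidden output Z1 = f(z1), hidden input z1 = v11·x1 + v21·x2 + v01); the squared-error loss L = ½(Y − t)² is assumed for illustration:

∂L/∂v11 = (∂L/∂Y)·(∂Y/∂y)·(∂y/∂Z1)·(∂Z1/∂z1)·(∂z1/∂v11)
        = (Y − t)·f′(y)·w1·f′(z1)·x1

Each factor is a local derivative of one stage, so the error signal is simply multiplied stage by stage as it travels backwards.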
Linear Separability
Linearly Separable – AND gate

x1 (input) | x2 (input) | Y (output)
0 | 0 | 0
0 | 1 | 0
1 | 0 | 0
1 | 1 | 1
Linearly Separable – AND gate
Two input sources => two input neurons.
One output => one output neuron.
Activation function is binary sigmoidal: f(y) = 1/(1 + e^(−y))
Derivative: f′(y) = f(y)·(1 − f(y))
Linearly Separable – AND gate
[Figure: inputs x1, x2 with weights w1, w2 and a bias input 1 with weight w0 feed the summing node y, which passes through f(·) to give the output Y.]
Back-propagation training/algorithm
Given: input vector (x1, x2) at the i-th instant, target t.
Initialize weights w0, w1, w2 and learning rate α with some random values in the range [0, 1].
1. Output: y = w0 + w1·x1 + w2·x2
2. Activation function (sigmoidal): ŷ = f(y) = 1/(1 + e^(−y))
3. Compute error: e = t − ŷ
4. Backpropagate the error across the activation function: δ = e·f′(y), where f′ is the derivative of the selected activation function; f′(y) = f(y)·(1 − f(y)) for the sigmoidal activation function.
Back-propagation training/algorithm
5. Compute change in weights and bias: Δw1 = α·δ·x1, Δw2 = α·δ·x2, Δw0 = α·δ
6. Update the weights and bias: wi(new) = wi(old) + Δwi
7. Keep repeating steps 1-6 for all input combinations (4 in total). This is one epoch.
8. Run multiple epochs till the error decreases and stabilizes. A code sketch of steps 1-8 follows.
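A minimal sketch of steps 1-8 in Python, assuming the AND-gate data above, random initial weights in [0, 1], and a learning rate of 0.5 (the slides leave these values open):

import numpy as np

def f(y):
    return 1.0 / (1.0 + np.exp(-y))              # binary sigmoid

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])   # AND-gate inputs
T = np.array([0, 0, 0, 1])                       # AND-gate targets

rng = np.random.default_rng(0)
w = rng.random(2)          # w1, w2 drawn from [0, 1)
w0 = rng.random()          # bias weight
alpha = 0.5                # assumed learning rate

for epoch in range(5000):                  # step 8: run multiple epochs
    for xi, t in zip(X, T):                # step 7: all 4 input combinations
        y = w0 + np.dot(w, xi)             # step 1: output
        y_hat = f(y)                       # step 2: activation
        e = t - y_hat                      # step 3: error
        delta = e * y_hat * (1 - y_hat)    # step 4: cross the activation
        w = w + alpha * delta * xi         # steps 5-6: update weights
        w0 = w0 + alpha * delta            # and bias

print(np.round(f(w0 + X @ w)))             # expected: [0. 0. 0. 1.]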
(4 Rules) Backpropagating Error
1. Output neuron: the error observed at the output Y is brought back across the activation f(·): e at stage y = (t − Y)·f′(y).
2. Across a link with weight wi: e at stage xi = (e at stage y)·wi.
3. Weights update: Δwi = α·(e at stage y)·xi.
[Figure: a neuron with input xi, weight wi, activation f(·), pre-activation y and output Y.]
(4 Rules) Backpropagating Error
4. Across links when a node feeds more than one downstream neuron (>1 hidden layer): the errors arriving from all downstream neurons y1, y2, …, yn are summed over the connecting weights:
e at stage xi = Σ_k (e at stage yk)·wk
[Figure: input xi fanning out through weights w1, w2, …, wn to neurons y1, y2, …, yn.]
The power of nonlinear activation
functions in transforming a data set to
linear separability
Linearly not Separable – XOR gate

x1 (input) | x2 (input) | Y (output)
0 | 0 | 0
0 | 1 | 1
1 | 0 | 1
1 | 1 | 0
Linearly not Separable – XOR gate
Two input sources => two input neurons.
One output => one output neuron.
One hidden layer => 2 neurons.
Activation function is binary sigmoidal: f(y) = 1/(1 + e^(−y))
Derivative: f′(y) = f(y)·(1 − f(y))
Linearly not Separable – XOR gate
[Figure: the four XOR points plotted in the (x1, x2) plane; no single straight line separates the two classes.]
Input layer, Hidden layer, Output layer
[Figure: inputs x1, x2 and a bias input 1 feed hidden neurons Z1 and Z2 through weights v11, v21, v12, v22 and biases v01, v02; Z1, Z2 and a bias input 1 feed the output neuron Y through weights w1, w2 and bias w0.]
Back-propagation Training
Given: inputs (x1, x2), target t.
Initialize weights and learning rate α with some random values.
Feed-forward Phase
1. Hidden unit input: zj = v0j + Σi xi·vij, for j = 1 to p hidden neurons
2. Hidden unit output: Zj = f_sig(zj), sigmoidal activation function
3. Output unit input: y = w0 + Σj Zj·wj
4. Output: Y = f_sig(y), sigmoidal activation function
Back-propagation Training
Back-propagation of error Phase
5. Compute error correction term: δ = (t − Y)·f′(y), where f′(y) = f(y)·(1 − f(y)) is the derivative of the sigmoid
6. Compute change in weights and bias: Δwj = α·δ·Zj, Δw0 = α·δ; send δ to the previous layer
7. Hidden unit: δ_in,j = δ·wj
8. Calculate error term: δj = δ_in,j·f′(zj)
9. Compute change in weights and bias: Δvij = α·δj·xi, Δv0j = α·δj
Back-propagation Training
Weights and Bias update phase
10. Each output unit (k = 1 to m): update weights and bias, wjk(new) = wjk(old) + Δwjk
11. Each hidden unit (j = 1 to p): update weights and bias, vij(new) = vij(old) + Δvij
12. Check for the stopping criterion, e.g., a certain number of epochs, or when the targets are equal/close to the network outputs. A vectorized code sketch of the full procedure follows.
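A compact, vectorized sketch of the twelve steps is given below. It follows the slide notation (v for input-to-hidden weights, w for hidden-to-output weights); the initialization range, the learning rate of 0.5, the epoch count, and the random seed are illustrative assumptions rather than values fixed by the slides.

import numpy as np

def f(x):
    return 1.0 / (1.0 + np.exp(-x))              # binary sigmoid

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])   # XOR inputs
T = np.array([[0], [1], [1], [0]])               # XOR targets

rng = np.random.default_rng(1)
V = rng.uniform(-1, 1, (2, 2))    # input-to-hidden weights v_ij
v0 = rng.uniform(-1, 1, (1, 2))   # hidden biases v_0j
W = rng.uniform(-1, 1, (2, 1))    # hidden-to-output weights w_j
w0 = rng.uniform(-1, 1, (1, 1))   # output bias w_0
alpha = 0.5                       # assumed learning rate

for epoch in range(10000):
    # Feed-forward phase (steps 1-4)
    Z = f(X @ V + v0)                              # hidden outputs
    Y = f(Z @ W + w0)                              # network output
    # Back-propagation of error phase (steps 5-9)
    delta = (T - Y) * Y * (1 - Y)                  # step 5: output error term
    delta_h = (delta @ W.T) * Z * (1 - Z)          # steps 7-8: hidden error terms
    # Weights and bias update phase (steps 10-11)
    W += alpha * Z.T @ delta
    w0 += alpha * delta.sum(axis=0, keepdims=True)
    V += alpha * X.T @ delta_h
    v0 += alpha * delta_h.sum(axis=0, keepdims=True)

# Typically converges to [0, 1, 1, 0]; XOR training can stall in a
# local minimum, in which case re-run with a different seed.
print(np.round(f(f(X @ V + v0) @ W + w0)).ravel())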
Hidden neuron input computation
z1 = v11·x1 + v21·x2 + v01
z2 = v12·x1 + v22·x2 + v02
[Figure: the network with the hidden pre-activations z1, z2 marked.]
Hidden neuron output computation
Z1 = f_sig(z1)
Z2 = f_sig(z2)
[Figure: the network with the hidden outputs Z1, Z2 marked.]
Output neuron input computation
y = w1·Z1 + w2·Z2 + w0
[Figure: the network with the output pre-activation y marked.]
Output neuron output computation
Y = f_sig(y)
[Figure: the network with the final output Y marked.]
Output error correction computation
δ = (t − Y)·f′(y)
[Figure: the network with δ marked at the output neuron.]
Output neuron weight-change computation
Δw0 = α·δ·1
Δw1 = α·δ·Z1
Δw2 = α·δ·Z2
[Figure: the network with the output-layer weight changes marked.]
Hidden neuron error propagation computation
δ1 = δ·w1
δ2 = δ·w2
[Figure: δ flowing back from the output across w1 and w2 to the hidden neurons.]
Hidden neuron error correction computation
δ11 = δ1·f′(z1)
δ22 = δ2·f′(z2)
[Figure: the hidden-layer error terms δ11, δ22 marked.]
Hidden neuron weight-change computation
Δv01 = α·δ11, Δv11 = α·δ11·x1, Δv21 = α·δ11·x2
Δv02 = α·δ22, Δv12 = α·δ22·x1, Δv22 = α·δ22·x2
[Figure: the network with the hidden-layer weight changes marked.]
NN with Two Hidden Layers (HW)
[Figure: a network with two hidden layers; w(1)_{i,j} denotes a weight in the first layer.]
Regularization (to avoid Overfitting)
One of the primary causes of corruption of the generalization process is overfitting.
The objective is to determine a curve that defines the border between the two groups using the training data.
Overfitting
Some outliers penetrate the area of the other group and disturb the boundary. As machine learning considers all the data, even the noise, it ends up producing an improper model (a curve, in this case). This would be penny-wise and pound-foolish.
Remedy: Regularization
Regularization is a numerical method that attempts to construct a model structure that is as simple as possible. The simplified model can avoid the effects of overfitting at a small cost in performance.
Cost function: J = Σ (t − y)², the sum of squared errors.
Remedy: Regularization
For this reason, overfitting of the neural network can be reduced by adding the sum of the weights to the cost function: (new) cost function J = Σ (t − y)² + λ·Σ|w|.
In order to drop the value of the cost function, both the error and the weights should be kept as small as possible. However, if a weight becomes small enough, the associated nodes become practically disconnected. As a result, unnecessary connections are eliminated, and the neural network becomes simpler.
Add L1 Regularization to XOR Network
New loss function: L = MSE + λ·Σ|w|
The gradient of the regularized loss w.r.t. a weight w is:
∂L/∂w = ∂MSE/∂w + λ·sign(w)
Update rule for weight w:
w ← w − α·(∂MSE/∂w + λ·sign(w))
Add L2 Regularization to XOR Network
New loss function: L = MSE + (λ/2)·Σw²
The gradient of the regularized loss w.r.t. a weight w is:
∂L/∂w = ∂MSE/∂w + λ·w
Update rule for weight w:
w ← w − α·(∂MSE/∂w + λ·w)
XOR implementation with L1
# Apply L1 regularization to weights; reg_lambda is the L1 strength
# (the lambda in the update rule on the previous slide)
hidden_layer_weights += learning_rate * (np.dot(hidden_layer_output.T, output_layer_delta) - reg_lambda * np.sign(hidden_layer_weights))
input_layer_weights += learning_rate * (np.dot(inputs.T, hidden_layer_delta) - reg_lambda * np.sign(input_layer_weights))
# Update biases (no regularization applied to biases)
hidden_layer_bias += np.sum(output_layer_delta, axis=0, keepdims=True) * learning_rate
input_layer_bias += np.sum(hidden_layer_delta, axis=0, keepdims=True) * learning_rate
XOR implementation with L2
# Apply L2 regularization to weights; reg_lambda is the L2 strength
# (the lambda in the update rule on the previous slide)
hidden_layer_weights += learning_rate * (np.dot(hidden_layer_output.T, output_layer_delta) - reg_lambda * hidden_layer_weights)
input_layer_weights += learning_rate * (np.dot(inputs.T, hidden_layer_delta) - reg_lambda * input_layer_weights)
# Update biases (no regularization applied to biases)
hidden_layer_bias += np.sum(output_layer_delta, axis=0, keepdims=True) * learning_rate
input_layer_bias += np.sum(hidden_layer_delta, axis=0, keepdims=True) * learning_rate
Norm Penalties as Constrained Optimization
Denote the regularized objective function:
J̃(θ; X, y) = J(θ; X, y) + α·Ω(θ)
J is a loss function, e.g., MSE or cross-entropy.
Ω is a penalty function.
α is a hyperparameter. Setting α to 0 results in no regularization. Larger values of α correspond to more regularization.
L2 norm parameter regularization (ridge regression or Tikhonov regularization):
Ω(θ) = (1/2)·‖w‖₂²
L1 norm regularization:
Ω(θ) = ‖w‖₁ = Σi |wi|
Norm Penalties as Constrained Optimization
Consider the objective function with L2 norm regularization. Transform the L2 regularization term into a constraint:
minimize J(θ) subject to ‖w‖₂² ≤ τ
where τ > 0 is a constraint hyperparameter controlling the regularization strength.
• Larger τ: weaker regularization (equivalent to smaller α).
• Smaller τ: stronger regularization (equivalent to larger α).
Norm Penalties as Constrained Optimization
The constrained optimization can be reformulated using the Lagrange multiplier λ:
L(θ, λ) = J(θ) + λ·(‖w‖₂² − τ)
Here, λ acts as a penalty parameter for violating the constraint.
This formulation helps in using techniques like dual optimization or projected gradient descent to enforce constraints during optimization.
Norm Penalties as Constrained Optimization – Backpropagation
Consider J = MSE + (λ/2)·‖W‖₂², containing the MSE and the constraint term.
1. Compute gradients for the MSE term. The gradients of the MSE term with respect to the weights and biases, ∂MSE/∂W and ∂MSE/∂b, propagate through the layers as in standard backpropagation.
2. Compute gradients for the L2 constraint. The L2 regularization gradient is λ·W. This term is added to the gradient from the MSE during weight updates.
Norm Penalties as Constrained Optimization – Backpropagation
Enforcing the constraint (‖W‖₂ ≤ τ): to ensure the constraint is satisfied, after each weight update:
1. Check the L2 norm: ‖W‖₂ = √(Σ w²)
2. Project back if necessary: if ‖W‖₂ > τ, rescale W as follows:
W ← W·(τ / ‖W‖₂)
A code sketch of this projection follows.
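A minimal sketch of this projection step in NumPy; the weight matrix W and the values below are illustrative assumptions, not from the slides:

import numpy as np

def project_l2(W, tau):
    # Step 1: check the L2 norm of the weights
    norm = np.linalg.norm(W)
    # Step 2: project back only if the constraint is violated
    if norm > tau:
        W = W * (tau / norm)
    return W

# Usage after each gradient update (illustrative values):
W = np.array([[0.8, -0.6], [0.5, 0.9]])
W = project_l2(W, tau=1.0)
print(np.linalg.norm(W))                   # now <= 1.0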
Numerical Example
Consider two inputs, one hidden layer with two neurons and one output. No bias for any neuron.
1. Let the initial weights be given.
2. Using back-propagation, say we obtained the gradients of the loss with respect to each weight.
3. For learning rate α, update each weight: w ← w − α·(∂J/∂w).
Numerical Example
4. This gives the updated, new weights W.
5. Compute the current L2 norm: ‖W‖₂ = √(Σ w²).
6. If τ = 1.5, the constraint is already satisfied, and no adjustment is required.
7. If τ = 1.0, the constraint is violated. Then make the following adjustment:
W ← W·(τ / ‖W‖₂)
Appendix: Example Implementation
Using a back-propagation network, find the new weights for the network shown aside. Input = [0 1] and the target output is 1. Use learning rate 0.25 and the binary sigmoidal activation function.
[Figure: a 2-2-1 network with the initial weights consolidated on the next slide.]
1. Consolidate the information
Given: inputs [0 1], target t = 1.
[v11 v21 v01] = [0.6 −0.1 0.3]
[v12 v22 v02] = [−0.3 0.4 0.5]
[w1 w2 w0] = [0.4 0.1 −0.2]
Learning rate α = 0.25.
Activation function is binary sigmoidal: f(x) = 1/(1 + e^(−x))
Derivative: f′(x) = f(x)·(1 − f(x))
2. Feed-forward Phase
1. Hidden unit inputs (j = 1, 2):
z1 = v01 + v11·x1 + v21·x2 = 0.3 + 0.6×0 + (−0.1)×1 = 0.2
z2 = v02 + v12·x1 + v22·x2 = 0.5 + (−0.3)×0 + 0.4×1 = 0.9
2. Hidden unit outputs, sigmoidal activation function:
Z1 = f(0.2) = 0.549, Z2 = f(0.9) = 0.711
3. Output unit input:
y = w0 + w1·Z1 + w2·Z2 = −0.2 + 0.4×0.549 + 0.1×0.711 = 0.091
4. Output, sigmoidal activation function:
Y = f(0.091) = 0.523
3. Back-propagation of error Phase
5. Compute error correction term:
δ = (t − Y)·f′(y) = (1 − 0.523) × 0.523 × (1 − 0.523) = 0.119
6. Compute change in weights and bias:
Δw1 = α·δ·Z1 = 0.25 × 0.119 × 0.549 = 0.0163
Δw2 = α·δ·Z2 = 0.25 × 0.119 × 0.711 = 0.0212
Δw0 = α·δ = 0.25 × 0.119 = 0.0298
7. Hidden unit: send the error back to the previous layer:
δ_in1 = δ·w1 = 0.119 × 0.4 = 0.0476
δ_in2 = δ·w2 = 0.119 × 0.1 = 0.0119
3. Back-propagation of error Phase
8. Calculate error terms:
δ1 = δ_in1·f′(z1) = 0.0476 × 0.549 × (1 − 0.549) = 0.0118
δ2 = δ_in2·f′(z2) = 0.0119 × 0.711 × (1 − 0.711) = 0.00245
9. Compute change in weights and bias:
Δv11 = α·δ1·x1 = 0.25 × 0.0118 × 0 = 0.0
Δv21 = α·δ1·x2 = 0.25 × 0.0118 × 1 = 0.003
Δv01 = α·δ1 = 0.003
Δv12 = α·δ2·x1 = 0.0
Δv22 = α·δ2·x2 = 0.25 × 0.00245 = 0.0006
Δv02 = α·δ2 = 0.0006
4. Weights and Bias update phase
10. Each output unit (k = 1 to m): update weights and bias
w1(new) = w1 + Δw1 = 0.4 + 0.0163 = 0.416
w2(new) = w2 + Δw2 = 0.1 + 0.0212 = 0.121
w0(new) = w0 + Δw0 = −0.2 + 0.0298 = −0.170
11. Each hidden unit (j = 1 to p): update weights and bias
v11(new) = 0.6 + 0.0 = 0.6, v21(new) = −0.1 + 0.003 = −0.097, v01(new) = 0.3 + 0.003 = 0.303
v12(new) = −0.3 + 0.0 = −0.3, v22(new) = 0.4 + 0.0006 = 0.401, v02(new) = 0.5 + 0.0006 = 0.501
Epoch | v11 | v21 | v01 | v12 | v22 | v02
0 | 0.6 | −0.1 | 0.3 | −0.3 | 0.4 | 0.5
1 | 0.6 | −0.097 | 0.303 | −0.3 | 0.401 | 0.501

Epoch | z1 | z2 | w1 | w2 | w0 | y
0 | 0.549 | 0.711 | 0.4 | 0.1 | −0.2 | 0.523
1 | 0.551 | 0.711 | 0.416 | 0.121 | −0.170 | 0.536
Write a program for this case and cross-verify your answers. After how many epochs will the output converge?
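As a starting point for that exercise, here is a minimal script that repeats the single-pattern update of the appendix (input [0 1], target 1, α = 0.25) and counts epochs. The convergence tolerance of 0.05 on the output error is an assumption, since the slides do not define "converge" precisely.

import numpy as np

def f(x):
    return 1.0 / (1.0 + np.exp(-x))     # binary sigmoid

x = np.array([0.0, 1.0]); t = 1.0; alpha = 0.25
V = np.array([[0.6, -0.3],              # rows = inputs i, columns = hidden units j:
              [-0.1, 0.4]])             # v11=0.6, v12=-0.3, v21=-0.1, v22=0.4
v0 = np.array([0.3, 0.5])               # hidden biases v01, v02
W = np.array([0.4, 0.1]); w0 = -0.2     # output weights w1, w2 and bias w0

for epoch in range(1, 100001):
    Z = f(x @ V + v0)                   # epoch 1: [0.549, 0.711]
    Y = f(Z @ W + w0)                   # epoch 1: 0.523
    delta = (t - Y) * Y * (1 - Y)       # output error correction term
    dh = delta * W * Z * (1 - Z)        # hidden error terms (delta11, delta22)
    W += alpha * delta * Z; w0 += alpha * delta
    V += alpha * np.outer(x, dh); v0 += alpha * dh
    if abs(t - Y) < 0.05:               # assumed convergence tolerance
        print("converged at epoch", epoch, "with output", round(float(Y), 3))
        break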