L4 - Deep Learning

Deep Learning

Lecturers: Prof. Ho Tu Bao / Prof. Dr. Tran Thi Oanh

Teaching Assistants: Vo Thao, Tran Lan
2
Revision

➢Describe
o Input layer, Hidden layer, Output layer, Weights, Activation Function
➢Fill in the blank

➢Visual recall
“Can you sketch a simple neural network with 2 input neurons, one hidden layer,
and two output neurons?”

3
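A minimal sketch of the network asked for in the visual-recall exercise above, assuming NumPy and an arbitrary choice of 3 hidden units; the weights are random placeholders rather than trained values:

```python
import numpy as np

rng = np.random.default_rng(0)

x = np.array([0.5, -1.2])              # 2 input neurons
W1 = rng.normal(size=(3, 2))           # hidden-layer weights (3 hidden units, arbitrary)
b1 = np.zeros(3)
W2 = rng.normal(size=(2, 3))           # output-layer weights (2 output neurons)
b2 = np.zeros(2)

h = np.tanh(W1 @ x + b1)               # hidden activations (tanh activation function)
y = W2 @ h + b2                        # 2 output values
print(h, y)
```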
Different types of NN

5
Types of connectivity
➢Feedforward networks
o These compute a series of transformations.
o Typically, the first layer is the input and the last layer is the output.
➢Recurrent networks
o These have directed cycles in their connection graph. They can have complicated dynamics.
o More biologically realistic.
(Figure: a feedforward network with input units, hidden units, and output units; a recurrent network with cyclic connections.)
6
Distinguish: Shallow Learning vs. Deep Learning

➢Depth: Shallow: 1 (or no) hidden layer | Deep: 2 or more hidden layers

➢Learning capacity: Shallow: limited | Deep: high (can model complex data)

➢Training time: Shallow: fast | Deep: slower, requires tuning

➢Use cases: Shallow: simple tasks | Deep: complex tasks (esp. vision, NLP)

7
Several types of modern practical Deep NN
1. Feed-forward neural network (FFNN)
2. Convolutional neural network (CNN)
3. Recurrent neural network (RNN)
4. Long short-term memory (LSTM)
5. Transformer (self-study)
6. GPT (self-study)

8
DeepSeek
2025

BERT
2021
MLP, LSTM
RNN 1997 LLaMA,
1986 Gemini
2023
9
Source: medium.com
Feed-forward neural network

10
11
Characteristics
➢Multi-layer feed-forward networks
o One or more hidden layers. Input projects only from previous layers onto a layer.
(Figure: network with input layer, hidden layers, and output layer; connectivity examples labelled "Good" and "Bad".)
12
➢Multi-layer feed-forward networks
(Figure: input layer, hidden layers, output layer.)
13
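For reference, a multi-layer feed-forward network of this shape can be sketched in Keras as below; the 20-feature input, layer sizes, and the two-class ("Good"/"Bad") output are illustrative assumptions, not taken from the slides:

```python
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    layers.Input(shape=(20,)),             # input layer: 20 features (placeholder)
    layers.Dense(64, activation="relu"),   # hidden layer 1
    layers.Dense(32, activation="relu"),   # hidden layer 2
    layers.Dense(2, activation="softmax"), # output layer, e.g. Good / Bad
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()
```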
Convolutional Neural Network
Cited from Pei-Wei Tsai

15
CNN Overview
➢ Developed by Yann LeCun in 1989
➢ A specialized kind of neural network for processing data that has a known grid-like topology
o E.g., time-series data, image data
➢CNNs are simply neural networks that use convolution in place of general matrix multiplication in at least one of their layers.
o Convolution is a specialized kind of linear operation.
16
Convolution Operation
➢ 1-D data convolution

𝐴: 1x4 vector    1  3  0  2
𝐵: 1x3 vector   -1  1 -1

Sliding 𝐵 across 𝐴 (including the partial overlaps at the edges) gives, step by step:
o 1 × (−1) = −1
o 1 × 1 + 3 × (−1) = −2
o 1 × (−1) + 3 × 1 + 0 × (−1) = 2
o 3 × (−1) + 0 × 1 + 2 × (−1) = −5
o 0 × (−1) + 2 × 1 = 2
o 2 × (−1) = −2

∴ 𝐴 ∗ 𝐵:  -1  -2  2  -5  2  -2
17
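The 1-D example above can be checked with NumPy. np.convolve in "full" mode slides the (flipped) filter across A, including the partial edge overlaps; because B reads the same forwards and backwards, the flip makes no difference here:

```python
import numpy as np

A = np.array([1, 3, 0, 2])
B = np.array([-1, 1, -1])

print(np.convolve(A, B, mode="full"))   # [-1 -2  2 -5  2 -2]
```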
Convolution Operation (cont.)
➢ 2-D matrix convolution

𝐴: 4x4 matrix        𝐵: 3x3 matrix
1 0 0 0              -1 -1  1
1 0 0 1              -1  1 -1
0 1 1 0               1 -1 -1
1 0 0 1

Stride = 1. Sliding 𝐵 over 𝐴 (again including partial overlaps) gives 𝐴 ∗ 𝐵, a 6x6 matrix:
-1 -1  1  0  0  0
-2  0  0 -1 -1  1
 0 -1 -4 -1  2 -1
 0 -3  0  0 -3  0
-1  2 -1 -3  0 -1
 1 -1 -1  1 -1 -1
18
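The 2-D example can be checked the same way with SciPy. This kernel is unchanged by a 180-degree flip, so convolution and the cross-correlation actually used in CNN layers coincide here:

```python
import numpy as np
from scipy.signal import convolve2d

A = np.array([[1, 0, 0, 0],
              [1, 0, 0, 1],
              [0, 1, 1, 0],
              [1, 0, 0, 1]])

B = np.array([[-1, -1,  1],
              [-1,  1, -1],
              [ 1, -1, -1]])

print(convolve2d(A, B, mode="full"))    # the 6x6 matrix shown above
```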
Why is CNN Developed?
➢Other NNs can also process image data. However, using a typical NN on image data would cause a dramatic increase in the number of parameters.
o For example, a 100x100 colour image requires 100x100x3 = 30,000 values to represent the data. If the first layer contains 1,000 neurons, we will need to deal with 30,000,000 parameters in the first layer alone. This number does not include the later layers, but it is already infeasible.

➢CNN uses prior knowledge to reduce the dimension of parameters before creating the network.
19
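A quick back-of-the-envelope check of this argument (a sketch only; bias terms are ignored and the 3x3x3 convolutional filter size is an illustrative assumption):

```python
inputs = 100 * 100 * 3                 # 100x100 colour image -> 30,000 input values
dense_params = inputs * 1000           # fully connected first layer of 1,000 neurons
print(dense_params)                    # 30,000,000 weights

# A convolutional first layer with, say, 1,000 filters of size 3x3x3 needs only:
conv_params = 1000 * 3 * 3 * 3
print(conv_params)                     # 27,000 weights
```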
CNN structure overview

20
CNN Structure
Property 1: Some patterns are much smaller than the whole image. (Localisation) → handled by Convolution
Property 2: The same patterns appear in different regions. → handled by Convolution
Property 3: Subsampling the pixels will not change the object. → handled by Max Pooling

Pipeline: [Convolution → Max Pooling] (can be repeated) → Flatten → Fully Connected Feedforward Network → Output: Cat/Dog

21
https://image.slidesharecdn.com/deep-learning-lispnyc-june-2017-170625181139/95/deep-learning-7-638.jpg?cb=1498414658
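A minimal Keras sketch of this pipeline (assumed API; the 64x64 input size, filter counts, and kernel sizes are illustrative choices, not from the slide):

```python
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    layers.Input(shape=(64, 64, 3)),
    layers.Conv2D(16, (3, 3), activation="relu"),   # convolution
    layers.MaxPooling2D((2, 2)),                    # max pooling
    layers.Conv2D(32, (3, 3), activation="relu"),   # convolution (repeated)
    layers.MaxPooling2D((2, 2)),                    # max pooling (repeated)
    layers.Flatten(),                               # flatten
    layers.Dense(64, activation="relu"),            # fully connected feedforward
    layers.Dense(1, activation="sigmoid"),          # output: cat vs. dog
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```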
Convolution

CNN - Convolution
(Figure: a 6x6 binary image is convolved with two 3x3 filters at stride 1, giving a 4x4 feature map for each filter.)

6x6 image:            3x3 Filter 1:    3x3 Filter 2:
0 1 0 0 1 0           -1 -1  1         -1  1 -1
0 1 0 0 0 1           -1  1 -1         -1  1 -1
0 1 0 0 1 0            1 -1 -1         -1  1 -1
0 0 1 1 0 0
0 1 0 0 1 0
1 0 0 0 0 1

Each filter detects a small pattern (3x3). (Property 1: small patterns)
22
Convolution

CNN - Convolution
(Figure: the same 6x6 image and 3x3 filters as above; each filter is slid across every position of the image.)
The same filter detects its pattern at different locations. (Property 2: different locations)
Each filter detects a small pattern (3x3). (Property 1: small patterns)
23
Convolution

CNN - Convolution: Feature Map
(Figure: the convolution results of the two filters, stacked together.)
• The convolution process keeps going until all filters are used.
• The collection of the convolution results is called the Feature Map.
24
Convolution

CNN - Convolution
(Figure: the 6x6 image is flattened into inputs 1-36; each feature-map value is computed from only the 9 inputs covered by the 3x3 filter.)
• In fact, the feature map can be obtained by a not-fully-connected NN structure.
• Each output unit connects to only 9 inputs, not fully connected.
25
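A small sketch of this point: each feature-map value depends on only 9 of the 36 pixels, and the same 9 weights are reused at every position. The image and Filter 1 below are transcribed from the figure, and the explicit loop is checked against SciPy's cross-correlation:

```python
import numpy as np
from scipy.signal import correlate2d

image = np.array([[0, 1, 0, 0, 1, 0],
                  [0, 1, 0, 0, 0, 1],
                  [0, 1, 0, 0, 1, 0],
                  [0, 0, 1, 1, 0, 0],
                  [0, 1, 0, 0, 1, 0],
                  [1, 0, 0, 0, 0, 1]])

filter1 = np.array([[-1, -1,  1],
                    [-1,  1, -1],
                    [ 1, -1, -1]])

feature_map = np.zeros((4, 4), dtype=int)
for i in range(4):
    for j in range(4):
        # each output unit sees only a 3x3 patch (9 inputs), with shared weights
        feature_map[i, j] = np.sum(image[i:i+3, j:j+3] * filter1)

print(feature_map)
print(np.array_equal(feature_map, correlate2d(image, filter1, mode="valid")))  # True
```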
Max Pooling
CNN – Max Pooling
(Figure: the two 4x4 feature maps, from Filter 1 and Filter 2, are down-sampled to 2x2 by pooling over each 2x2 block.)
• Keep either the maximum value or the mean value of each block to achieve down-sampling.
26
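A minimal NumPy sketch of 2x2 max pooling with stride 2 on a 4x4 feature map (the values are those transcribed from the Filter 1 feature map in the figure):

```python
import numpy as np

fm = np.array([[-1, -1, -2, -3],
               [-1, -4, -1,  3],
               [-3,  0,  0, -3],
               [ 3, -1, -3, -1]])

# split into 2x2 blocks and take the maximum of each block
pooled = fm.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)    # [[-1  3]
                 #  [ 3  0]]
```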
Max Pooling
CNN – Max Pooling
(Figure: after Convolution plus Max Pooling, the 6x6 image becomes a 2x2 image with deeper channels.)
• Every time we go through Convolution plus Max Pooling, the dimension of the image is reduced and multiple feature maps are generated.
• The stack of feature maps can be viewed as a voxel (a 3-D cube).
27
Flatten
CNN – Flatten and Fully Connected Network

➢After straightening the feature map into a 1-D vector, it is pushed into the fully connected network for the output.
(Figure: the 2x2 feature maps with deeper channels are flattened into a single vector that feeds the fully connected network, which produces the results.)
28
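A tiny sketch of the flatten step (the channel values are illustrative): a 2x2 feature map with 2 channels becomes a 1-D vector of 8 values that feeds the fully connected network:

```python
import numpy as np

feature_maps = np.array([[[-1, 3], [3, 0]],    # channel from filter 1
                         [[ 3, 1], [1, 1]]])   # channel from filter 2 (illustrative)
flat = feature_maps.reshape(-1)
print(flat)    # [-1  3  3  0  3  1  1  1]
```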
CNNs learn in a hierarchical way
➢Early layers focus on simple patterns
o Filters in the first layer detect very basic features: vertical edges, horizontal lines, gradients, dots.
o These are the simplest and most common patterns in images, useful for all kinds of objects.
➢Middle layers capture combinations of those patterns
o The next layers combine edges and lines to form simple shapes such as corners, curves, texture patterns (e.g., stripes, waves), and contours.
o These give hints about parts of objects (like eyes, noses, wheels, etc.).
➢Deeper layers recognize complex objects or high-level semantics
o At this level, the network has seen many combinations of low- and mid-level features.
o These layers encode semantic meaning: what the image is about.
29
Video recap

30
CNN - Advantages
➢ Local Feature Detection
o CNNs excel at detecting local patterns such as edges, textures, and shapes through convolutional filters.
➢ Parameter Sharing
o The same set of filters is applied across the entire input. → Significantly reduces the number of parameters.
➢ Insensitive to shifting, scaling, and rotation
o CNNs can recognize patterns regardless of their position in the input field. → Makes them reliable for real-world applications where objects may appear at different locations.
➢ Hierarchical Feature Learning
o Layers build upon each other, with lower layers capturing simple features and higher layers capturing complex patterns. → CNNs can learn a rich representation of the input data.
➢ Effective for Large-scale Data
o CNNs can handle large and high-dimensional data effectively. → Suitable for real-time image and video processing, medical imaging, and other applications with large datasets.
➢ Reduced Need for Manual Feature Extraction
o CNNs automatically learn to identify features during training. → Reduces the need for domain-specific knowledge and manual feature engineering, making the development process faster and more accessible.
31
CNN - Disadvantages
➢High Computational Cost
o May not be suitable for applications with limited resources; often requires special hardware like GPUs.
➢Data Hungry
o CNNs generally require large amounts of labeled data to achieve good performance.
➢Sensitivity to Hyperparameters
o Performance can be highly sensitive to the choice of hyperparameters like learning rate, filter size, and network architecture.
→ Requires fine-tuning, which can be resource-intensive.
➢Lack of Interpretability
o It is difficult to interpret and understand the learned features.
➢Spatial Information Loss
o Pooling layers can cause loss of spatial information.
➢Vulnerability to Adversarial Attacks
o CNNs can be easily fooled by adversarial examples: small, intentionally crafted perturbations that cause misclassification.
→ Raises concerns about their robustness and security, especially in critical applications.
32
Activities - Design Your Own 3×3 Kernel
➢Students will create custom 3×3 kernels to perform common
image-processing tasks:
o Highlight horizontal lines
o Sharpen an image
o Blur an image

33
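Possible starting points for the activity: three standard 3x3 kernels applied with scipy.ndimage.convolve. The random array is a stand-in for a real grayscale image:

```python
import numpy as np
from scipy import ndimage

img = np.random.rand(64, 64)                 # replace with a real grayscale image

horizontal_lines = np.array([[-1, -1, -1],
                             [ 2,  2,  2],
                             [-1, -1, -1]])  # strong response on horizontal lines

sharpen = np.array([[ 0, -1,  0],
                    [-1,  5, -1],
                    [ 0, -1,  0]])           # boosts the centre pixel against its neighbours

blur = np.ones((3, 3)) / 9.0                 # box blur: average over the 3x3 patch

for name, kernel in [("highlight horizontal", horizontal_lines),
                     ("sharpen", sharpen),
                     ("blur", blur)]:
    out = ndimage.convolve(img, kernel, mode="reflect")
    print(name, out.min(), out.max())
```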
Recurrent Neural Network (RNN)

37
Traditional ANN
➢ Inputs and outputs are independent of each other
→ Bad idea for many tasks
If you want to predict the next word in a sentence, you had better know which words came before it.
38
Analogy
➢Imagine you're reading a sentence word by word:
“The dog chased the cat through the ...”

➢To guess the next word ("park", "house", etc.), you need
to remember what you've read before. That’s what an RNN
does.

39
RNN overview
➢ Introduced by Rumelhart et al. (1986a) for processing sequential data
o Time series, texts, speech
➢The idea behind RNNs is to make use of sequential information
Source: https://www.geeksforgeeks.org/introduction-to-recurrent-neural-network/

40
RNN
➢Recurrent neural networks
o RNNs process input one step at a time, and they
remember the past using a hidden state.
o At each word, it updates its "memory" based on the
new word and what it already remembers.
➢Have a “memory” which captures information
about what has been calculated so far

41
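A minimal NumPy sketch of this idea: at every step the hidden state (the "memory") is updated from the new input and the previous hidden state. Sizes and random weights are placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)
input_dim, hidden_dim = 4, 3

W_x = rng.normal(scale=0.5, size=(hidden_dim, input_dim))   # input-to-hidden weights
W_h = rng.normal(scale=0.5, size=(hidden_dim, hidden_dim))  # hidden-to-hidden weights (the recurrence)
b   = np.zeros(hidden_dim)

h = np.zeros(hidden_dim)                      # initial memory
sequence = rng.normal(size=(5, input_dim))    # 5 time steps of dummy input
for x_t in sequence:
    h = np.tanh(W_x @ x_t + W_h @ h + b)      # h_t = tanh(W_x x_t + W_h h_{t-1} + b)
print(h)                                      # final hidden state summarises the sequence
```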
An RNN and the unfolding in time of the computation involved in its forward computation

Source: https://dennybritz.com/posts/wildml/recurrent-neural-networks-tutorial-part-1/

42
Text Generation

43
(Figure: text generation with an RNN. Each input word ("… man is walking down …") is vectorized by an embedding layer and fed to the RNN; a softmax layer over the vocabulary then samples the next word.)
44
RNN – Recurrent Neural Network

45
Advantages and disadvantages of RNN

Advantages:
✓ Handle sequential data effectively, including text, speech, and time series.
✓ Process inputs of any length, unlike feedforward neural networks.
✓ Share weights across time steps, enhancing training efficiency.

Disadvantages:
× Prone to vanishing and exploding gradient problems, hindering learning.
× Training can be challenging, especially for long sequences.
× Computationally slower than other neural network architectures.

46
RNN Limitation
1. Vanishing Gradient Problem
o In RNNs, during backpropagation through many time steps (e.g., long
sequences), the gradients often shrink to near zero.
o This means early inputs in the sequence are “forgotten” — the network can't
learn long-term dependencies.
o “I grew up in France… I speak fluent ___.”
A standard RNN might forget "France" by the time it reaches the blank, and
fail to predict “French”.

47
RNN Limitation (2)
2. Exploding Gradients
o The opposite of vanishing: sometimes gradients blow up, making training
unstable.
o This happens especially when the network is deep in time (long sequences).
3. Short-Term Memory
o Standard RNNs are only good at remembering recent inputs.
o They struggle with long-term context, even if it's crucial.
4. Training Instability
o RNNs are hard to train effectively on long sequences.
o They often get stuck or converge poorly.

48
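A toy numeric illustration of limitations 1 and 2 above (not from the slides): backpropagation through time multiplies the gradient by a similar per-step factor, so over a long sequence it either shrinks towards zero or blows up:

```python
grad_small, grad_large = 1.0, 1.0
for t in range(1, 51):                 # 50 time steps back in time
    grad_small *= 0.9                  # per-step factor slightly below 1
    grad_large *= 1.1                  # per-step factor slightly above 1
    if t % 25 == 0:
        print(t, grad_small, grad_large)
# after 50 steps: 0.9**50 ~ 0.005 (vanished), 1.1**50 ~ 117 (exploded)
```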
Long Short-Term Memory (LSTM)

49
How LSTM Solves These Issues
➢LSTM introduces a cell state (a kind of long-term memory) and gates
to control information flow:
o Forget gate: Decides what to remove from memory.
o Input gate: Chooses what new information to add.
o Output gate: Decides what to pass forward.
➢This gating mechanism protects important information over long time
steps, and prevents gradients from vanishing or exploding.

RNNs forget too easily. LSTMs remember what matters and for how long.

50
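A minimal NumPy sketch of one LSTM step with the three gates described above, following the standard formulation; the weights are random placeholders, and the input is concatenated with the previous hidden state:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
input_dim, hidden_dim = 4, 3
concat_dim = hidden_dim + input_dim

W_f, W_i, W_o, W_c = (rng.normal(scale=0.5, size=(hidden_dim, concat_dim)) for _ in range(4))
b_f = b_i = b_o = b_c = np.zeros(hidden_dim)

h_prev, c_prev = np.zeros(hidden_dim), np.zeros(hidden_dim)
x_t = rng.normal(size=input_dim)
z = np.concatenate([h_prev, x_t])

f = sigmoid(W_f @ z + b_f)            # forget gate: what to remove from memory
i = sigmoid(W_i @ z + b_i)            # input gate: what new information to add
o = sigmoid(W_o @ z + b_o)            # output gate: what to pass forward
c_tilde = np.tanh(W_c @ z + b_c)      # candidate cell content
c = f * c_prev + i * c_tilde          # cell state (long-term memory)
h = o * np.tanh(c)                    # hidden state passed to the next step
print(h, c)
```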
Source: Colah ‘s blog https://colah.github.io/posts/2015-08-Understanding-LSTMs/

51
Transformer and attention

52
GPT-3
➢Introduced by Brown et al. in “Language Models are Few-Shot Learners”

53
Practice using Python
➢Stock Price prediction with RNN/LSTM

➢CNN for image processing on Imdb dataset

54
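A minimal Keras sketch of the first practice task (assumed API; the random walk stands in for real closing prices, and the 30-day window is an arbitrary choice):

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

prices = np.cumsum(np.random.randn(500)) + 100       # placeholder price series
window = 30
X = np.array([prices[i:i + window] for i in range(len(prices) - window)])
y = prices[window:]                                   # next-day price to predict
X = X[..., None]                                      # shape: (samples, time steps, 1 feature)

model = tf.keras.Sequential([
    layers.Input(shape=(window, 1)),
    layers.LSTM(32),
    layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=2, batch_size=32, verbose=0)
print(model.predict(X[-1:], verbose=0))               # predicted next price
```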
Thank you!
Q&A
