L4 - Deep Learning

Deep Learning

Lecturers: Prof. Ho Tu Bao / Prof. Dr. Tran Thi Oanh

Teaching Assistants: Vo Thao, Tran Lan
2
Revision

➢Describe
o Input layer, Hidden layer, Output layer, Weights, Activation Function
➢Fill in the blank

➢Visual recall
“Can you sketch a simple neural network with 2 input neurons, one hidden layer,
and two output neurons?”

3
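A minimal sketch of the network asked for in the visual-recall exercise above, assuming NumPy and an arbitrary choice of 3 hidden units; the weights are random placeholders rather than trained values:

```python
import numpy as np

rng = np.random.default_rng(0)

x = np.array([0.5, -1.2])              # 2 input neurons
W1 = rng.normal(size=(3, 2))           # hidden-layer weights (3 hidden units, arbitrary)
b1 = np.zeros(3)
W2 = rng.normal(size=(2, 3))           # output-layer weights (2 output neurons)
b2 = np.zeros(2)

h = np.tanh(W1 @ x + b1)               # hidden activations (tanh activation function)
y = W2 @ h + b2                        # 2 output values
print(h, y)
```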
Different types of NN

5
Types of connectivity
➢Feedforward networks
o These compute a series of transformations.
o Typically, the first layer is the input and the last layer is the output.
➢Recurrent networks
o These have directed cycles in their connection graph. They can have complicated dynamics.
o More biologically realistic.
(Figure: a feedforward network with input units, hidden units, and output units; a recurrent network with cyclic connections.)
6
Distinguish: Shallow Learning vs. Deep Learning

➢Depth: Shallow: 1 (or no) hidden layer | Deep: 2 or more hidden layers

➢Learning capacity: Shallow: limited | Deep: high (can model complex data)

➢Training time: Shallow: fast | Deep: slower, requires tuning

➢Use cases: Shallow: simple tasks | Deep: complex tasks (esp. vision, NLP)

7
Several types of modern practical Deep NN
1. Feed-forward neural network (FFNN)
2. Convolutional neural network (CNN)
3. Recurrent neural network (RNN)
4. Long short-term memory (LSTM)
5. Transformer (self-study)
6. GPT (self-study)

8
DeepSeek
2025

BERT
2021
MLP, LSTM
RNN 1997 LLaMA,
1986 Gemini
2023
9
Source: medium.com
Feed-forward neural network

10
11
Characteristics
➢Multi-layer feed-forward networks
o One or more hidden layers. Input projects only from previous layers onto a layer.
(Figure: network with input layer, hidden layers, and output layer; connectivity examples labelled "Good" and "Bad".)
12
➢Multi-layer feed-forward networks
(Figure: input layer, hidden layers, output layer.)
13
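For reference, a multi-layer feed-forward network of this shape can be sketched in Keras as below; the 20-feature input, layer sizes, and the two-class ("Good"/"Bad") output are illustrative assumptions, not taken from the slides:

```python
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    layers.Input(shape=(20,)),             # input layer: 20 features (placeholder)
    layers.Dense(64, activation="relu"),   # hidden layer 1
    layers.Dense(32, activation="relu"),   # hidden layer 2
    layers.Dense(2, activation="softmax"), # output layer, e.g. Good / Bad
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()
```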
Convolutional Neural Network
Cited from Pei-Wei Tsai

15
CNN Overview
➢ Developed by Yann LeCun in 1989
➢ A specialized kind of neural network for processing data that has a known grid-like topology
o E.g., time-series data, image data
➢CNNs are simply neural networks that use convolution in place of general matrix multiplication in at least one of their layers.
o Convolution is a specialized kind of linear operation.
16
Convolution Operation
➢ 1-D data convolution

𝐴: 1x4 vector    1  3  0  2
𝐵: 1x3 vector   -1  1 -1

Sliding 𝐵 across 𝐴 (including the partial overlaps at the edges) gives, step by step:
o 1 × (−1) = −1
o 1 × 1 + 3 × (−1) = −2
o 1 × (−1) + 3 × 1 + 0 × (−1) = 2
o 3 × (−1) + 0 × 1 + 2 × (−1) = −5
o 0 × (−1) + 2 × 1 = 2
o 2 × (−1) = −2

∴ 𝐴 ∗ 𝐵:  -1  -2  2  -5  2  -2
17
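The 1-D example above can be checked with NumPy. np.convolve in "full" mode slides the (flipped) filter across A, including the partial edge overlaps; because B reads the same forwards and backwards, the flip makes no difference here:

```python
import numpy as np

A = np.array([1, 3, 0, 2])
B = np.array([-1, 1, -1])

print(np.convolve(A, B, mode="full"))   # [-1 -2  2 -5  2 -2]
```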
Convolution Operation (cont.)
➢ 2-D matrix convolution

𝐴: 4x4 matrix        𝐵: 3x3 matrix
1 0 0 0              -1 -1  1
1 0 0 1              -1  1 -1
0 1 1 0               1 -1 -1
1 0 0 1

Stride = 1. Sliding 𝐵 over 𝐴 (again including partial overlaps) gives 𝐴 ∗ 𝐵, a 6x6 matrix:
-1 -1  1  0  0  0
-2  0  0 -1 -1  1
 0 -1 -4 -1  2 -1
 0 -3  0  0 -3  0
-1  2 -1 -3  0 -1
 1 -1 -1  1 -1 -1
18
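The 2-D example can be checked the same way with SciPy. This kernel is unchanged by a 180-degree flip, so convolution and the cross-correlation actually used in CNN layers coincide here:

```python
import numpy as np
from scipy.signal import convolve2d

A = np.array([[1, 0, 0, 0],
              [1, 0, 0, 1],
              [0, 1, 1, 0],
              [1, 0, 0, 1]])

B = np.array([[-1, -1,  1],
              [-1,  1, -1],
              [ 1, -1, -1]])

print(convolve2d(A, B, mode="full"))    # the 6x6 matrix shown above
```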
Why is CNN Developed?
➢Other NNs can also process image data. However, using a typical NN on image data would cause a dramatic increase in the number of parameters.
o For example, a 100x100 colour image requires 100x100x3 = 30,000 values to represent the data. If the first layer contains 1,000 neurons, we will need to deal with 30,000,000 parameters in the first layer alone. This number does not include the later layers, but it is already infeasible.

➢CNN uses prior knowledge to reduce the dimension of parameters before creating the network.
19
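A quick back-of-the-envelope check of this argument (a sketch only; bias terms are ignored and the 3x3x3 convolutional filter size is an illustrative assumption):

```python
inputs = 100 * 100 * 3                 # 100x100 colour image -> 30,000 input values
dense_params = inputs * 1000           # fully connected first layer of 1,000 neurons
print(dense_params)                    # 30,000,000 weights

# A convolutional first layer with, say, 1,000 filters of size 3x3x3 needs only:
conv_params = 1000 * 3 * 3 * 3
print(conv_params)                     # 27,000 weights
```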
CNN structure overview

20
CNN Structure
Property 1: Some patterns are much smaller than the whole image. (Localisation) → handled by Convolution
Property 2: The same patterns appear in different regions. → handled by Convolution
Property 3: Subsampling the pixels will not change the object. → handled by Max Pooling

Pipeline: [Convolution → Max Pooling] (can be repeated) → Flatten → Fully Connected Feedforward Network → Output: Cat/Dog

21
https://image.slidesharecdn.com/deep-learning-lispnyc-june-2017-170625181139/95/deep-learning-7-638.jpg?cb=1498414658
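A minimal Keras sketch of this pipeline (assumed API; the 64x64 input size, filter counts, and kernel sizes are illustrative choices, not from the slide):

```python
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    layers.Input(shape=(64, 64, 3)),
    layers.Conv2D(16, (3, 3), activation="relu"),   # convolution
    layers.MaxPooling2D((2, 2)),                    # max pooling
    layers.Conv2D(32, (3, 3), activation="relu"),   # convolution (repeated)
    layers.MaxPooling2D((2, 2)),                    # max pooling (repeated)
    layers.Flatten(),                               # flatten
    layers.Dense(64, activation="relu"),            # fully connected feedforward
    layers.Dense(1, activation="sigmoid"),          # output: cat vs. dog
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```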
Convolution

CNN - Convolution
(Figure: a 6x6 binary image is convolved with two 3x3 filters at stride 1, giving a 4x4 feature map for each filter.)

6x6 image:            3x3 Filter 1:    3x3 Filter 2:
0 1 0 0 1 0           -1 -1  1         -1  1 -1
0 1 0 0 0 1           -1  1 -1         -1  1 -1
0 1 0 0 1 0            1 -1 -1         -1  1 -1
0 0 1 1 0 0
0 1 0 0 1 0
1 0 0 0 0 1

Each filter detects a small pattern (3x3). (Property 1: small patterns)
22
Convolution

CNN - Convolution
(Figure: the same 6x6 image and 3x3 filters as above; each filter is slid across every position of the image.)
The same filter detects its pattern at different locations. (Property 2: different locations)
Each filter detects a small pattern (3x3). (Property 1: small patterns)
23
Convolution

CNN - Convolution: Feature Map
(Figure: the convolution results of the two filters, stacked together.)
• The convolution process keeps going until all filters are used.
• The collection of the convolution results is called the Feature Map.
24
Convolution

CNN - Convolution
(Figure: the 6x6 image is flattened into inputs 1-36; each feature-map value is computed from only the 9 inputs covered by the 3x3 filter.)
• In fact, the feature map can be obtained by a not-fully-connected NN structure.
• Each output unit connects to only 9 inputs, not fully connected.
25
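A small sketch of this point: each feature-map value depends on only 9 of the 36 pixels, and the same 9 weights are reused at every position. The image and Filter 1 below are transcribed from the figure, and the explicit loop is checked against SciPy's cross-correlation:

```python
import numpy as np
from scipy.signal import correlate2d

image = np.array([[0, 1, 0, 0, 1, 0],
                  [0, 1, 0, 0, 0, 1],
                  [0, 1, 0, 0, 1, 0],
                  [0, 0, 1, 1, 0, 0],
                  [0, 1, 0, 0, 1, 0],
                  [1, 0, 0, 0, 0, 1]])

filter1 = np.array([[-1, -1,  1],
                    [-1,  1, -1],
                    [ 1, -1, -1]])

feature_map = np.zeros((4, 4), dtype=int)
for i in range(4):
    for j in range(4):
        # each output unit sees only a 3x3 patch (9 inputs), with shared weights
        feature_map[i, j] = np.sum(image[i:i+3, j:j+3] * filter1)

print(feature_map)
print(np.array_equal(feature_map, correlate2d(image, filter1, mode="valid")))  # True
```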
Max Pooling
CNN – Max Pooling
(Figure: the two 4x4 feature maps, from Filter 1 and Filter 2, are down-sampled to 2x2 by pooling over each 2x2 block.)
• Keep either the maximum value or the mean value of each block to achieve down-sampling.
26
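A minimal NumPy sketch of 2x2 max pooling with stride 2 on a 4x4 feature map (the values are those transcribed from the Filter 1 feature map in the figure):

```python
import numpy as np

fm = np.array([[-1, -1, -2, -3],
               [-1, -4, -1,  3],
               [-3,  0,  0, -3],
               [ 3, -1, -3, -1]])

# split into 2x2 blocks and take the maximum of each block
pooled = fm.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)    # [[-1  3]
                 #  [ 3  0]]
```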
Max Pooling
CNN – Max Pooling
(Figure: after Convolution plus Max Pooling, the 6x6 image becomes a 2x2 image with deeper channels.)
• Every time we go through Convolution plus Max Pooling, the dimension of the image is reduced and multiple feature maps are generated.
• The stack of feature maps can be viewed as a voxel (a 3-D cube).
27
Flatten
CNN – Flatten and Fully Connected Network

➢After straightening the feature map into a 1-D vector, it is pushed into the fully connected network for the output.
(Figure: the 2x2 feature maps with deeper channels are flattened into a single vector that feeds the fully connected network, which produces the results.)
28
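A tiny sketch of the flatten step (the channel values are illustrative): a 2x2 feature map with 2 channels becomes a 1-D vector of 8 values that feeds the fully connected network:

```python
import numpy as np

feature_maps = np.array([[[-1, 3], [3, 0]],    # channel from filter 1
                         [[ 3, 1], [1, 1]]])   # channel from filter 2 (illustrative)
flat = feature_maps.reshape(-1)
print(flat)    # [-1  3  3  0  3  1  1  1]
```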
CNNs learn in a hierarchical way
➢Early layers focus on simple patterns
o Filters in the first layer detect very basic features: vertical edges, horizontal lines, gradients, dots.
o These are the simplest and most common patterns in images, useful for all kinds of objects.
➢Middle layers capture combinations of those patterns
o The next layers combine edges and lines to form simple shapes such as corners, curves, texture patterns (e.g., stripes, waves), and contours.
o These give hints about parts of objects (like eyes, noses, wheels, etc.).
➢Deeper layers recognize complex objects or high-level semantics
o At this level, the network has seen many combinations of low- and mid-level features.
o These layers encode semantic meaning: what the image is about.
29
Video recap

30
CNN - Advantages
➢ Local Feature Detection
o CNNs excel at detecting local patterns such as edges, textures, and shapes through convolutional filters.
➢ Parameter Sharing
o The same set of filters is applied across the entire input. → Significantly reduces the number of parameters.
➢ Insensitive to shifting, scaling, and rotation
o CNNs can recognize patterns regardless of their position in the input field. → Makes them reliable for real-world applications where objects may appear at different locations.
➢ Hierarchical Feature Learning
o Layers build upon each other, with lower layers capturing simple features and higher layers capturing complex patterns. → CNNs can learn a rich representation of the input data.
➢ Effective for Large-scale Data
o CNNs can handle large and high-dimensional data effectively. → Suitable for real-time image and video processing, medical imaging, and other applications with large datasets.
➢ Reduced Need for Manual Feature Extraction
o CNNs automatically learn to identify features during training. → Reduces the need for domain-specific knowledge and manual feature engineering, making the development process faster and more accessible.
31
CNN - Disadvantages
➢High Computational Cost
o May not be suitable for applications with limited resources; often requires special hardware like GPUs.
➢Data Hungry
o CNNs generally require large amounts of labeled data to achieve good performance.
➢Sensitivity to Hyperparameters
o Performance can be highly sensitive to the choice of hyperparameters like learning rate, filter size, and network architecture.
→ Requires fine-tuning, which can be resource-intensive.
➢Lack of Interpretability
o It is difficult to interpret and understand the learned features.
➢Spatial Information Loss
o Pooling layers can cause loss of spatial information.
➢Vulnerability to Adversarial Attacks
o CNNs can be easily fooled by adversarial examples: small, intentionally crafted perturbations that cause misclassification.
→ Raises concerns about their robustness and security, especially in critical applications.
32
Activities - Design Your Own 3×3 Kernel
➢Students will create custom 3×3 kernels to perform common
image-processing tasks:
o Highlight horizontal lines
o Sharpen an image
o Blur an image

33
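Possible starting points for the activity: three standard 3x3 kernels applied with scipy.ndimage.convolve. The random array is a stand-in for a real grayscale image:

```python
import numpy as np
from scipy import ndimage

img = np.random.rand(64, 64)                 # replace with a real grayscale image

horizontal_lines = np.array([[-1, -1, -1],
                             [ 2,  2,  2],
                             [-1, -1, -1]])  # strong response on horizontal lines

sharpen = np.array([[ 0, -1,  0],
                    [-1,  5, -1],
                    [ 0, -1,  0]])           # boosts the centre pixel against its neighbours

blur = np.ones((3, 3)) / 9.0                 # box blur: average over the 3x3 patch

for name, kernel in [("highlight horizontal", horizontal_lines),
                     ("sharpen", sharpen),
                     ("blur", blur)]:
    out = ndimage.convolve(img, kernel, mode="reflect")
    print(name, out.min(), out.max())
```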
Recurrent Neural Network (RNN)

37
Traditional ANN
➢ Inputs and outputs are independent of each other
→ Bad idea for many tasks
If you want to predict the next word in a sentence, you had better know which words came before it.
38
Analogy
➢Imagine you're reading a sentence word by word:
“The dog chased the cat through the ...”

➢To guess the next word ("park", "house", etc.), you need
to remember what you've read before. That’s what an RNN
does.

39
RNN overview
➢ Introduced by Rumelhart et al. (1986a) for processing sequential data
o Time series, texts, speech
➢The idea behind RNNs is to make use of sequential information
Source: https://www.geeksforgeeks.org/introduction-to-recurrent-neural-network/

40
RNN
➢Recurrent neural networks
o RNNs process input one step at a time, and they
remember the past using a hidden state.
o At each word, it updates its "memory" based on the
new word and what it already remembers.
➢Have a “memory” which captures information
about what has been calculated so far

41
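A minimal NumPy sketch of this idea: at every step the hidden state (the "memory") is updated from the new input and the previous hidden state. Sizes and random weights are placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)
input_dim, hidden_dim = 4, 3

W_x = rng.normal(scale=0.5, size=(hidden_dim, input_dim))   # input-to-hidden weights
W_h = rng.normal(scale=0.5, size=(hidden_dim, hidden_dim))  # hidden-to-hidden weights (the recurrence)
b   = np.zeros(hidden_dim)

h = np.zeros(hidden_dim)                      # initial memory
sequence = rng.normal(size=(5, input_dim))    # 5 time steps of dummy input
for x_t in sequence:
    h = np.tanh(W_x @ x_t + W_h @ h + b)      # h_t = tanh(W_x x_t + W_h h_{t-1} + b)
print(h)                                      # final hidden state summarises the sequence
```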
An RNN and the unfolding in time of the computation involved in its forward computation

Source: https://dennybritz.com/posts/wildml/recurrent-neural-networks-tutorial-part-1/

42
Text Generation

43
(Figure: text generation with an RNN. Each input word ("… man is walking down …") is vectorized by an embedding layer and fed to the RNN; a softmax layer over the vocabulary then samples the next word.)
44
RNN – Recurrent Neural Network

45
Advantages and disadvantages of RNN

Advantages:
✓ Handle sequential data effectively, including text, speech, and time series.
✓ Process inputs of any length, unlike feedforward neural networks.
✓ Share weights across time steps, enhancing training efficiency.

Disadvantages:
× Prone to vanishing and exploding gradient problems, hindering learning.
× Training can be challenging, especially for long sequences.
× Computationally slower than other neural network architectures.

46
RNN Limitation
1. Vanishing Gradient Problem
o In RNNs, during backpropagation through many time steps (e.g., long
sequences), the gradients often shrink to near zero.
o This means early inputs in the sequence are “forgotten” — the network can't
learn long-term dependencies.
o “I grew up in France… I speak fluent ___.”
A standard RNN might forget "France" by the time it reaches the blank, and
fail to predict “French”.

47
RNN Limitation (2)
2. Exploding Gradients
o The opposite of vanishing: sometimes gradients blow up, making training
unstable.
o This happens especially when the network is deep in time (long sequences).
3. Short-Term Memory
o Standard RNNs are only good at remembering recent inputs.
o They struggle with long-term context, even if it's crucial.
4. Training Instability
o RNNs are hard to train effectively on long sequences.
o They often get stuck or converge poorly.

48
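A toy numeric illustration of limitations 1 and 2 above (not from the slides): backpropagation through time multiplies the gradient by a similar per-step factor, so over a long sequence it either shrinks towards zero or blows up:

```python
grad_small, grad_large = 1.0, 1.0
for t in range(1, 51):                 # 50 time steps back in time
    grad_small *= 0.9                  # per-step factor slightly below 1
    grad_large *= 1.1                  # per-step factor slightly above 1
    if t % 25 == 0:
        print(t, grad_small, grad_large)
# after 50 steps: 0.9**50 ~ 0.005 (vanished), 1.1**50 ~ 117 (exploded)
```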
Long Short-Term Memory (LSTM)

49
How LSTM Solves These Issues
➢LSTM introduces a cell state (a kind of long-term memory) and gates
to control information flow:
o Forget gate: Decides what to remove from memory.
o Input gate: Chooses what new information to add.
o Output gate: Decides what to pass forward.
➢This gating mechanism protects important information over long time
steps, and prevents gradients from vanishing or exploding.

RNNs forget too easily. LSTMs remember what matters and for how long.

50
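A minimal NumPy sketch of one LSTM step with the three gates described above, following the standard formulation; the weights are random placeholders, and the input is concatenated with the previous hidden state:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
input_dim, hidden_dim = 4, 3
concat_dim = hidden_dim + input_dim

W_f, W_i, W_o, W_c = (rng.normal(scale=0.5, size=(hidden_dim, concat_dim)) for _ in range(4))
b_f = b_i = b_o = b_c = np.zeros(hidden_dim)

h_prev, c_prev = np.zeros(hidden_dim), np.zeros(hidden_dim)
x_t = rng.normal(size=input_dim)
z = np.concatenate([h_prev, x_t])

f = sigmoid(W_f @ z + b_f)            # forget gate: what to remove from memory
i = sigmoid(W_i @ z + b_i)            # input gate: what new information to add
o = sigmoid(W_o @ z + b_o)            # output gate: what to pass forward
c_tilde = np.tanh(W_c @ z + b_c)      # candidate cell content
c = f * c_prev + i * c_tilde          # cell state (long-term memory)
h = o * np.tanh(c)                    # hidden state passed to the next step
print(h, c)
```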
Source: Colah ‘s blog https://colah.github.io/posts/2015-08-Understanding-LSTMs/

51
Transformer and attention

52
GPT-3
➢Introduced by Brown et al. in “Language Models are Few-Shot Learners”

53
Practice using Python
➢Stock Price prediction with RNN/LSTM

➢CNN for image processing on Imdb dataset

54
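A minimal Keras sketch of the first practice task (assumed API; the random walk stands in for real closing prices, and the 30-day window is an arbitrary choice):

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

prices = np.cumsum(np.random.randn(500)) + 100       # placeholder price series
window = 30
X = np.array([prices[i:i + window] for i in range(len(prices) - window)])
y = prices[window:]                                   # next-day price to predict
X = X[..., None]                                      # shape: (samples, time steps, 1 feature)

model = tf.keras.Sequential([
    layers.Input(shape=(window, 1)),
    layers.LSTM(32),
    layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=2, batch_size=32, verbose=0)
print(model.predict(X[-1:], verbose=0))               # predicted next price
```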
Thank you!
Q&A
