ENG6600: Advanced Machine Learning
“From Machine Learning
to Deep Learning: Part 2”
S. Areibi
School of Engineering
University of Guelph
Deep Networks: Models
Deep Learning Taxonomy
Supervised:
• Convolutional Neural Nets (CNNs), “LeCun”
• Convolutional Encoder Decoder (CED), “Badrinarayanan”
• Recurrent Neural Nets (RNNs), “Schmidhuber”
• Long Short Term Memory (LSTMs), “Juergen Schmidhuber”
Unsupervised:
• Deep Belief Nets / Stacked RBMs, “Hinton”
• Stacked Denoising Autoencoders, “Bengio”
• Sparse AutoEncoders, “LeCun, A. Ng”
• Generative Adversarial Networks (GANs), “I. Goodfellow”
Deep Neural Networks (DNNs)
o Deep Neural Networks come in different flavors and are used for different
applications. Some are supervised while others are unsupervised.
o Examples of DNNs:
1. Convolutional Neural Networks (CNNs)
2. Fully Connected Neural Networks (FCNs)
3. Recurrent Neural Networks (RNNs)
4. Long Short Term Memory (LSTM)
5. Auto Encoders (AE)
6. Convolution Encoder/Decoder (CED)
7. Generative Adversarial Networks (GANs)
8. Graph Neural Network (GNN)
• Each is used for different applications: computer vision (CV), natural language processing (NLP), speech recognition, …
• Each has a different architecture: number of layers, types of layers, activation functions
Deep Learning: Most Popular
• Convolutional Neural Networks (CNNs). Usage: image classification.
  A type of ANN specifically designed to process pixel data, used in image recognition and processing.
• Long Short-Term Memory (LSTM). Usage: speech recognition.
  A variant of recurrent neural networks capable of learning sequence prediction problems.
• Convolutional Encoder Decoder (CED). Usage: segmentation.
  A type of CNN used for tasks requiring dense pixel-wise predictions, i.e., semantic segmentation.
• Generative Adversarial Networks (GANs). Usage: generating human faces, …
  A machine learning (ML) model in which two neural networks compete with each other to become
  more accurate in their predictions. Used to generate 2D and 3D images.
Convolutional ANNs
CNNs: History
Yann LeCun, Professor of Computer Science
The Courant Institute of Mathematical Sciences
New York University
Room 1220, 715 Broadway, New York, NY 10003, USA.
(212)998-3283 yann@cs.nyu.edu
• Between 1989 and 1998, Yann LeCun and colleagues, including Yoshua Bengio,
developed the concept of Convolutional Neural Networks (CNNs).
• Convolutional Neural Networks (CNNs) have been the go-to architecture
for image and video-based tasks like classification, localization,
segmentation, etc.
• They have shown human-level or better performance on some benchmark tasks
that were previously considered very difficult to achieve using basic image
processing techniques.
• They have made classification tasks relatively easy to perform, without the
need to feed hand-crafted features to the model, as was done before
the CNN revolution.
LeNet5
LeNet-5 was developed to recognize handwritten and machine-printed characters.
LeNet-5 is one of the earliest convolutional neural networks, proposed by Yann LeCun
and others in 1998 in the research paper “Gradient-Based Learning
Applied to Document Recognition.”
CNNs
A CNN is a feed-forward network that can extract topological
properties from an image.
It consists of convolutional filters and down-sampling units
(pooling) that produce feature maps, which assist in extracting
features and classifying images.
Like almost every other neural network, CNNs are trained with a
version of the back-propagation algorithm.
Convolutional Neural Networks are designed to recognize visual
patterns directly from pixel images with minimal preprocessing.
They can recognize patterns with extreme variability (such as
handwritten characters).
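The two building blocks named above, convolutional filters and pooling, can be sketched in plain Python. This is a minimal illustration only: the 5x5 image and the hand-picked edge filter are hypothetical, and in a real CNN the filter weights are learned by back-propagation rather than written by hand.

```python
def conv2d(image, kernel):
    """Valid (no padding), stride-1 2D cross-correlation, as used in CNN layers."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling (down-sampling) with a size x size window."""
    out_h = len(feature_map) // size
    out_w = len(feature_map[0]) // size
    return [
        [
            max(feature_map[r * size + i][c * size + j]
                for i in range(size) for j in range(size))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# Hypothetical 5x5 image: dark columns (0) on the left, bright columns (1) on the right.
image = [
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
]
# Hand-picked vertical-edge filter (an assumption; real filters are learned).
kernel = [
    [-1, 1],
    [-1, 1],
]

feature_map = conv2d(image, kernel)   # 4x4 feature map, non-zero only at the edge
pooled = max_pool2d(feature_map, 2)   # 2x2 map after down-sampling
print(feature_map)
print(pooled)
```

The feature map responds only where the dark-to-bright edge sits, and pooling keeps that response while halving each spatial dimension.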
Convolutional Neural Network
[Figure: CNN feature-extraction architecture. Convolutional layers (filters) produce feature maps with 64, 128, 256, and 512 channels, interleaved with max-pool layers and followed by fully connected layers and an output vector over the classes Living Room, Bed Room, Kitchen, Bathroom, and Outdoor.]
A CNN's performance depends on its architecture:
• The number of convolutional layers
• The number of pooling layers
• The size of the fully connected layer
Convolutional Neural Network
AlexNet had 8 layers with 61 million learnable parameters.
VGGNet had up to 19 layers.
GoogLeNet had 22 layers.
ResNet has up to 152 layers (the roughly 23 million parameters often quoted apply to the 50-layer variant; ResNet-152 has about 60 million).
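To make parameter counts like these concrete, the sketch below counts the learnable parameters of a single convolutional layer and a single fully connected layer. The layer shapes are assumptions chosen for illustration (they follow the commonly cited AlexNet first and last layers); the helper names are hypothetical.

```python
def conv_params(kernel_h, kernel_w, in_channels, out_channels):
    """Learnable parameters of a conv layer: filter weights plus one bias per filter."""
    return kernel_h * kernel_w * in_channels * out_channels + out_channels


def dense_params(in_features, out_features):
    """Learnable parameters of a fully connected layer: weight matrix plus biases."""
    return in_features * out_features + out_features


# Assumed shapes for illustration only.
first_conv = conv_params(11, 11, 3, 96)   # 96 filters of size 11x11 over 3 (RGB) channels
final_fc = dense_params(4096, 1000)       # 4096 features mapped to 1000 classes

print(first_conv)  # 34944
print(final_fc)    # 4097000
```

Under these assumed shapes, one fully connected layer holds over a hundred times more parameters than the first convolutional layer, which is why the size of the fully connected layers dominates many parameter budgets.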
Recurrent ANNs
Recurrent Neural Networks (RNN)
• Recurrent NNs are used for modeling sequential data and data with
varying length of inputs and outputs
o Video frames, text auto completion, speech recognition, DNA sequences, ….
• RNNs introduce recurrent connections between the neurons
o This allows processing sequential data one element at a time by selectively
passing information across a sequence
o Memory of the previous inputs is stored in the model’s internal state and
affects the model predictions
o Can capture correlations in sequential data
• RNNs use backpropagation-through-time for training
• RNNs are more sensitive to the vanishing gradient problem than CNNs
Connections between nodes
can create a cycle, allowing
output from some nodes to
affect subsequent input to
the same nodes.
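A minimal sketch of this recurrence, assuming a single recurrent unit with hand-picked weights (w_x, w_h, and b are hypothetical values, not trained ones): the hidden state h is fed back at every step, so an early input still influences later states.

```python
import math


def rnn_step(x_t, h_prev, w_x, w_h, b):
    """One step of a single-unit vanilla RNN: h_t = tanh(w_x*x_t + w_h*h_prev + b)."""
    return math.tanh(w_x * x_t + w_h * h_prev + b)


def run_rnn(sequence, w_x=0.5, w_h=0.8, b=0.0):
    """Process a sequence of any length one element at a time; return all hidden states."""
    h = 0.0
    states = []
    for x_t in sequence:
        h = rnn_step(x_t, h, w_x, w_h, b)  # the previous state is part of the input
        states.append(h)
    return states


states_a = run_rnn([1.0, 0.0, 0.0])  # a single non-zero input at the first step
states_b = run_rnn([0.0, 0.0, 0.0])  # no input at all
print(states_a)
print(states_b)
```

The last state of `states_a` is still positive even though the last two inputs are zero: the memory of the first input is carried by the internal state, and it shrinks step by step, which hints at why long sequences are hard for plain RNNs.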
Recurrent Neural Networks (RNN)
• RNNs can have one or many inputs and one or many outputs
RNN Application | Input | Output
Image Captioning | (image) | A person riding a motorbike on dirt road
Sentiment Analysis | Awesome movie. Highly recommended. | Positive
Machine Translation | Happy Diwali | शुभ दीपावली
LSTM
• Long Short-Term Memory (LSTM) networks are a variant of RNNs
• LSTM mitigates the vanishing/exploding gradient problem
Solution: a Memory Cell, updated at each step in the sequence
• Three gates control the flow of information to/from the Memory Cell
Input Gate: protects the current step from irrelevant inputs
Output Gate: prevents current step from passing irrelevant information to
later steps
Forget Gate: limits information passed from one cell to the next
• Most modern RNN models use either LSTM units or other more
advanced types of recurrent units (e.g., GRU units)
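The gating mechanism can be sketched for a single LSTM unit using the standard formulation, with scalar weights. The parameter sets `keep` and `erase` are hypothetical hand-set values (not trained) chosen to show the forget gate either preserving or wiping the memory cell.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x_t, h_prev, c_prev, p):
    """One step of a single-unit LSTM. `p` maps a gate name to (w_x, w_h, bias)."""
    def pre(name):
        w_x, w_h, b = p[name]
        return w_x * x_t + w_h * h_prev + b

    f = sigmoid(pre("forget"))       # how much of the old cell state to keep
    i = sigmoid(pre("input"))        # how much of the new candidate to write
    o = sigmoid(pre("output"))       # how much of the cell state to expose
    g = math.tanh(pre("candidate"))  # candidate value computed from the current input
    c_t = f * c_prev + i * g         # memory cell update
    h_t = o * math.tanh(c_t)         # output passed to later steps
    return h_t, c_t


def run(params, inputs, c0=1.0):
    h, c = 0.0, c0
    for x_t in inputs:
        h, c = lstm_step(x_t, h, c, params)
    return h, c


# Hypothetical parameters: forget gate wide open, input gate almost closed.
keep = {"forget": (0.0, 0.0, 10.0), "input": (0.0, 0.0, -10.0),
        "output": (0.0, 0.0, 0.0), "candidate": (1.0, 0.0, 0.0)}
# Same, but with the forget gate almost closed.
erase = dict(keep, forget=(0.0, 0.0, -10.0))

_, c_keep = run(keep, [5.0] * 50)
_, c_erase = run(erase, [5.0] * 50)
print(c_keep, c_erase)
```

With the forget gate open, the cell state started at 1.0 is still close to 1.0 after 50 steps; with it closed, the state is wiped almost immediately. This additive, gated path is what lets information (and gradients) survive over long sequences.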
Auto Encoders
Auto-encoder
Autoencoders are unsupervised learning methods which use a
neural network to learn a compact representation of the input by reconstructing it.
An Autoencoder model consists of three parts, namely:
1. Encoder: compresses data into lower-dimensional representation
(latent space or code layer)
2. Code layer: is the result of using the encoder on the input.
It contains the most important features and attributes of the input.
3. Decoder: decompresses representation back to original domain.
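$h = g(W x_i + b)$ (encoder)
$\hat{x}_i = f(W' h + b')$ (decoder, with its own weights $W'$ and bias $b'$)
$x_i$ → $h$ (captures all necessary information about the input) → $\hat{x}_i$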
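The encoder and decoder equations can be traced with a tiny forward pass. This is a sketch under stated assumptions: the weights are hand-set rather than learned, both activations are the identity, and the names W_enc, b_enc, W_dec, b_dec are introduced here for illustration (the decoder is given its own weights).

```python
def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def identity(z):
    return z


def encode(x, W, b, g=identity):
    """h = g(W x + b): compress the input into the code layer."""
    return [g(z + bi) for z, bi in zip(matvec(W, x), b)]


def decode(h, W, b, f=identity):
    """x_hat = f(W h + b): map the code back to the input domain."""
    return [f(z + bi) for z, bi in zip(matvec(W, h), b)]


def mse(x, x_hat):
    return sum((a - c) ** 2 for a, c in zip(x, x_hat)) / len(x)


# Hand-set weights (an assumption): a 4-dimensional input is compressed to a 2-dimensional code.
W_enc = [[0.5, 0.5, 0.0, 0.0],
         [0.0, 0.0, 0.5, 0.5]]
b_enc = [0.0, 0.0]
W_dec = [[1.0, 0.0],
         [1.0, 0.0],
         [0.0, 1.0],
         [0.0, 1.0]]
b_dec = [0.0, 0.0, 0.0, 0.0]

x = [3.0, 3.0, -1.0, -1.0]       # redundant input: fits the 2-dimensional code exactly
h = encode(x, W_enc, b_enc)
x_hat = decode(h, W_dec, b_dec)

x2 = [1.0, 3.0, 2.0, 2.0]        # input that does not fit the code
x2_hat = decode(encode(x2, W_enc, b_enc), W_dec, b_dec)

print(h, x_hat, mse(x, x_hat))   # zero reconstruction error
print(x2_hat, mse(x2, x2_hat))   # non-zero reconstruction error
```

Training an autoencoder amounts to adjusting the encoder and decoder weights so that this reconstruction error is small across the training inputs.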
Auto-encoder
Applications: Dimensionality Reduction, Image Denoising
• An Autoencoder is an unsupervised artificial neural network that
learns how to efficiently compress and encode data then learns
how to reconstruct the data back from the reduced encoded
representation to a representation that is as close to the
original input as possible.
• We call this unsupervised since we are providing x and no label y.
• In this example the input image to the autoencoder will be
reconstructed to produce the output image.
Encoders vs. PCA
Both perform dimensionality reduction.
PCA learns linear relationships
Encoders can learn non-linear relationships.
With linear activation functions and a squared-error loss, an encoder learns the same subspace as PCA.
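The relationship between a linear encoder and PCA can be checked numerically on a toy case. The sketch assumes a one-unit autoencoder with linear activations, tied encoder/decoder weights, squared-error loss, and zero-mean 2-D data lying along a single direction; the function names, learning rate, and iteration counts are illustrative choices, not a reference implementation.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def first_principal_direction(data, iters=100):
    """PCA via power iteration on the covariance matrix (data assumed zero-mean)."""
    n = len(data)
    cov = [[sum(x[i] * x[j] for x in data) / n for j in range(2)] for i in range(2)]
    v = [1.0, 0.0]
    for _ in range(iters):
        v = [dot(row, v) for row in cov]
        norm = math.sqrt(dot(v, v))
        v = [a / norm for a in v]
    return v


def train_linear_autoencoder(data, lr=0.01, iters=500):
    """One-unit linear autoencoder with tied weights: x_hat = w * (w . x)."""
    w = [1.0, 0.0]
    n = len(data)
    for _ in range(iters):
        grad = [0.0, 0.0]
        for x in data:
            s = dot(w, x)                               # code h (a single number)
            r = [xi - wi * s for xi, wi in zip(x, w)]   # reconstruction residual
            wr = dot(w, r)
            for k in range(2):
                # Gradient of ||x - w (w . x)||^2 with respect to w[k], averaged over the data.
                grad[k] += -2.0 * (s * r[k] + wr * x[k]) / n
        w = [wi - lr * g for wi, g in zip(w, grad)]
    return w


# Zero-mean toy data lying along the direction (1, 2).
data = [(t, 2.0 * t) for t in (-2.0, -1.0, 1.0, 2.0)]

pc = first_principal_direction(data)
w = train_linear_autoencoder(data)
alignment = abs(dot(w, pc)) / math.sqrt(dot(w, w))  # |cosine| of the angle between w and pc
print(pc, w, alignment)
```

On this data the learned weight vector lines up with the first principal direction. Replacing the linear activations with non-linear ones is what lets an autoencoder go beyond what PCA can represent.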
Convolutional Enc. Dec
• A Convolutional (CNN)-based Encoder-Decoder Neural Network is an
encoder-decoder neural network that consists of an encoder neural
network and a decoder neural network in which one or both are
convolutional neural networks.
• A convolutional encoder–decoder network is used for tasks requiring
dense pixel-wise predictions like semantic segmentation
(Badrinarayanan et al., 2017), computing optical flow and disparity
maps (Mayer et al., 2016), and contour detection (Yang et al., 2016).
Semantic segmentation describes the process of associating each pixel of an
image with a class label, (such as flower, person, road, sky, ocean, or car).
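Per-pixel labeling can be sketched as an argmax over class scores at every pixel. The class list and the 2x2 score map below are hypothetical stand-ins for the output a convolutional encoder-decoder would produce.

```python
CLASSES = ["sky", "road", "car"]  # hypothetical label set


def segment(scores):
    """scores[r][c] is a list of per-class scores for pixel (r, c).

    Returns a label map of the same height and width holding the index
    of the highest-scoring class at each pixel.
    """
    return [
        [max(range(len(px)), key=px.__getitem__) for px in row]
        for row in scores
    ]


# Hypothetical network output for a 2x2 image.
scores = [
    [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1]],
    [[0.1, 0.8, 0.1], [0.1, 0.3, 0.6]],
]

label_map = segment(scores)
named = [[CLASSES[k] for k in row] for row in label_map]
print(label_map)
print(named)
```

The output has one label per input pixel, which is what distinguishes dense prediction from whole-image classification.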
Deep Neural Networks (DNNs)
Deep learning algorithms: description, strengths, weaknesses
• Denoising autoencoders
  Description: Designed to correct corrupted input data values.
  Strengths: Better for feature extraction and compression.
  Weaknesses: More computational time; addition of random noise; less scalability to high-dimensional data.
• CNN
  Description: Deep neural network structure whose interconnections reflect the biological visual cortex.
  Strengths: Most widely used in deep learning applications, with different variations of training strategies; provides good performance for multi-dimensional data; representational abstract features can be extracted from raw data.
  Weaknesses: A large volume of data is required, with more hyperparameter tuning, to extract optimal features.
• Recurrent neural network
  Description: Neural network structure to model sequential time series data; a temporal layer is added to learn about complex variations in data.
  Strengths: Most widely used to model time series data.
  Weaknesses: Training is difficult and sometimes affected by vanishing gradients; more parameters have to be updated, which in turn makes real-time prediction more difficult.
Transfer Learning
• Transfer learning is the application of knowledge gained from
completing one task to help solve a different, but related, problem.
• In transfer learning, a machine exploits knowledge gained from a
previous task to improve generalization on another.
• Transfer Learning is an important learning technique that can be used
to leverage pre-trained models on new similar problems.
[Figure: knowledge transfer between related skills, e.g., between skiing and skating, and between ping pong and tennis]
Transfer Learning
• Transfer learning is a research problem in deep learning that
focuses on storing knowledge gained while solving one problem
and applying it to a different but related problem.
o For example, knowledge gained while learning to recognize cars
could apply when trying to recognize trucks.
o For example, in training a classifier to predict whether an
image contains food, you could use the knowledge it gained
during training to recognize drinks.
Transfer Learning
1. Train on ImageNet.
2. Small dataset: feature extractor. Freeze the pre-trained layers and train only the final layer(s).
3. Medium dataset: finetuning. Freeze the earlier layers and train the later ones;
   more data = retrain more of the network (or all of it).
Instead of creating and training a model from scratch, we can use state-of-the-art
models which have been trained on large datasets like ImageNet.
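The "freeze the feature extractor, train only the head" idea can be sketched without any framework. Everything here is a toy assumption: `frozen_w` stands in for weights learned on an earlier task, the four-example dataset plays the role of the small new dataset, and the targets were generated from a hypothetical head of [2.0, -1.0] so that the result can be checked.

```python
# Frozen "pre-trained" feature extractor: these weights are never updated.
frozen_w = [[1.0, -1.0],
            [0.5, 0.5]]


def features(x):
    """ReLU(frozen_w @ x): the reused, frozen part of the network."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in frozen_w]


def predict(head, x):
    return sum(h * p for h, p in zip(head, features(x)))


def mean_squared_error(head, data):
    return sum((predict(head, x) - y) ** 2 for x, y in data) / len(data)


def finetune_head(data, head, lr=0.05, epochs=500):
    """Stochastic gradient descent on the head only; the extractor stays frozen."""
    head = list(head)
    for _ in range(epochs):
        for x, y in data:
            phi = features(x)
            err = sum(h * p for h, p in zip(head, phi)) - y
            head = [h - lr * err * p for h, p in zip(head, phi)]  # gradient of 0.5 * err**2
    return head


# Small dataset for the new task (toy values).
data = [([2.0, 0.0], 3.0), ([0.0, 2.0], -1.0), ([1.0, 1.0], -1.0), ([3.0, 1.0], 2.0)]

head_before = [0.0, 0.0]
head_after = finetune_head(data, head_before)
loss_before = mean_squared_error(head_before, data)
loss_after = mean_squared_error(head_after, data)
print(head_after, loss_before, loss_after)
```

Only two numbers are trained here, which is why a small dataset is enough. With more data, more of the network would be unfrozen and updated as well.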
Data Augmentation
• Deep Learning requires a lot of training data because of the huge
number of model parameters that need to be tuned by a learning
algorithm.
Our optimization goal is to find the sweet spot where the model’s loss
is low, which happens when its parameters are tuned in the right way.
Data Augmentation
• Data augmentation is a process of artificially increasing the
amount of data by generating new data points from existing data.
• This includes adding minor alterations to data or using machine
learning models to generate new data points in the latent space
of original data to amplify the dataset.
• Data augmentation is useful to improve performance and
outcomes of machine learning models by forming new and
different examples to train on.
• If the training dataset is rich and sufficient,
the model tends to perform better and more accurately.
Data Augmentation
How do I get more data, if I don’t have “more data”?
1. Flip the image horizontally or vertically.
2. Rotate the image, including by fine angles.
3. Scale the image outward or inward.
4. Crop the image.
5. Translate the image, i.e., move it along the X or Y direction.
6. Use conditional GANs to transform an image from one domain into an
image in another domain.
• GANs: Generative Adversarial Networks
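The geometric transformations in the list can be sketched on an image stored as a list of rows. This is a minimal illustration: the 3x3 image is a toy example, rotation is limited to 90 degrees (rotating by finer angles needs interpolation), and the `augment` helper is a hypothetical name.

```python
def flip_horizontal(img):
    return [row[::-1] for row in img]


def flip_vertical(img):
    return [list(row) for row in img[::-1]]


def rotate90(img):
    """Rotate clockwise by 90 degrees."""
    return [list(row) for row in zip(*img[::-1])]


def crop(img, top, left, height, width):
    return [row[left:left + width] for row in img[top:top + height]]


def translate(img, dx, dy, fill=0):
    """Shift the image dx pixels right and dy pixels down, padding with `fill`."""
    h, w = len(img), len(img[0])
    out = [[fill] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            nr, nc = r + dy, c + dx
            if 0 <= nr < h and 0 <= nc < w:
                out[nr][nc] = img[r][c]
    return out


def augment(img):
    """Generate several new training examples from one image."""
    return [
        flip_horizontal(img),
        flip_vertical(img),
        rotate90(img),
        crop(img, 0, 1, 2, 2),
        translate(img, 1, 0),
    ]


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]

for variant in augment(image):
    print(variant)
```

Each variant keeps the label of the original image, so one labeled example becomes several.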
Batch Normalization
• Training deep neural networks with tens of layers is
challenging as they can be sensitive to the initial random
weights and configuration of the learning algorithm.
• One possible reason for this difficulty is the distribution of
the inputs to layers deep in the network may change after
each mini-batch when the weights are updated.
• Batch normalization is a technique for training very deep
neural networks that standardizes the inputs to a layer for
each mini-batch (re-centering and re-scaling)
• This has the effect of stabilizing the learning process and
dramatically reducing the number of training epochs required
to train deep networks (making training faster and more stable).
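The re-centering and re-scaling step can be sketched for one feature over one mini-batch. This shows the training-time computation only; `gamma` and `beta` are the learnable scale and shift (fixed here to 1 and 0), the small `eps` guards against division by zero, and the batch values are made up.

```python
import math


def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Standardize one feature over a mini-batch, then scale and shift it."""
    n = len(batch)
    mean = sum(batch) / n
    var = sum((x - mean) ** 2 for x in batch) / n
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in batch]


# Hypothetical activations of one unit for a mini-batch of four examples.
batch = [10.0, 20.0, 30.0, 40.0]
normalized = batch_norm(batch)

out_mean = sum(normalized) / len(normalized)
out_var = sum((x - out_mean) ** 2 for x in normalized) / len(normalized)
print(normalized)
print(out_mean, out_var)  # approximately 0 and 1
```

Whatever the scale of the incoming activations, the next layer sees inputs with mean close to 0 and variance close to 1, which is what keeps the input distribution of deep layers stable from one mini-batch to the next.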
Batch Normalization
• Batch Norm is a normalization technique done between the layers
of a Neural Network instead of in the raw data.
• It is done along mini-batches instead of the full data set.
• It serves to speed up training and allows higher learning rates,
making learning easier.
Batch normalization also has a regularizing effect, which can
reduce overfitting and in some cases lessen the need for dropout.
Deep Learning: Applications
Voice Recognition
More human interaction between user and machine
It is Everywhere
Voice Recognition
Google Voice Recognition Now Supports 119 Languages
Various methods have been applied, such as convolutional neural networks
(CNNs) and recurrent neural networks (RNNs), while more recently Transformer
networks have achieved great performance.
Face Recognition
It is Everywhere
Face Recognition and tagging
Facial recognition can benefit society:
• Increasing safety and security, (Protects business against theft)
• Preventing crimes and reducing human interaction.
• Helps find missing people.
• Assists Alzheimer's patients in recognizing their beloved relatives.
Automatic Colorization of Black and White Images
Deep Learning and Convolutional Neural Networks are
used to color pictures from grayscale automatically.
Optimizing Images
Using the following DNNs:
1. GANs
2. CNNs
3. CEDs
[Figure panels: post-processing feature optimization of color curves and details, illumination, and color tone (warmness)]
https://wordlift.io/blog/en/image-seo-using-ai/
From Deep Blue to AlphaGo
AlphaGo is a form of deep reinforcement learning (deep learning combined with reinforcement learning)
• One neural network, the “policy network”, selects the next move to play.
• The other neural network, the “value network”, predicts the winner of the game.
1997: IBM Deep Blue defeats world chess champion Garry Kasparov.
2016: AlphaGo, developed by DeepMind Technologies (acquired by Google in 2014),
defeats Go champion Lee Sedol.
Classification, Recognition and Detection
• Image Classification: assign an image to one of many different categories
• Object Localization: determine the location of a single object within an image
• Object Detection: localize and classify multiple objects within an image
• Instance Segmentation: goes one step beyond by labeling every pixel of an image
Object Detection
Tasks in order of increasing difficulty (different CNN architectures are used for each application):
● Classification: predict the class of the main object of an image
● Localization: predict the class of the main object and also a tight bounding box around it
● Detection: classify and localize any number of objects
● Segmentation: label every pixel of an image
Why is Object Detection Important?
● Perception is one of the biggest bottlenecks of
○ Robotics
○ Self-driving cars
○ Surveillance
Object Detection: YOLO
Image Segmentation
Image segmentation is a key building block of Computer
Vision technologies and algorithms. Applications include:
• Medical image analysis,
• CV for autonomous vehicles,
• Face recognition and detection,
• Video surveillance, and
• Satellite image analysis.
[Figure: qualitative results on the Cityscapes dataset]
Recent history of Object Detection
● Large improvements using Deep Learning [Girshick’13/14]
● R-CNN: Region Based Convolutional Neural Networks
● Fast R-CNN … Faster R-CNN, YOLO, …
[Figure: detection accuracy on the PASCAL VOC challenge, VOC’07 through VOC’12. Pre-ConvNet methods (DPM, HOG+BOW, MKL, Selective Search, DPM++, Regionlets, SegDPM) score between 17% and 41%; R-CNN (ConvNets, 2014) scores 58.5%, 53.7%, and 53.3%.]
Rich feature hierarchies for accurate object detection and semantic segmentation. Girshick, Ross, Jeff Donahue, Trevor Darrell, and Jitendra Malik.
arXiv preprint arXiv:1311.2524 (2013).
The PASCAL Visual Object Classes Challenge - a Retrospective, Everingham, M., Eslami, S. M. A., Van Gool, L., Williams, C. K. I., Winn, J. and
Zisserman, A. Accepted for International Journal of Computer Vision, 2014
ITS
Driving Assistance - Information Support
www.seeingmachines.com
Object Detection is a crucial method that can be used for both
self-driving cars and advanced driver assistance systems.
Pedestrian detection with ConvNets (video)
Autonomous Driving
• Visual Driving Assist System (Autonomy)
– An approaching vehicle or a passing pedestrian needs to be detected
with minimal latency, minimum false positives, and maximum accuracy.
[Figure labels: CNNs, Deep RL]
Deep Learning can help the car prepare for all the possible moves, which
involve braking, halting, slowing down, changing lanes, and so on.
Deep Learning Frameworks
• Several Deep Learning frameworks are available from Google, Microsoft, and Facebook
• DL frameworks are tools which allow us to build DL models more easily
and quickly, without getting into the details of the underlying algorithms.
• One of the main factors for advancing the progress in Deep Learning is
the availability of these Frameworks
DL Development Tools
Several tools are available from academia and industry to
develop Deep Learning based applications, including:
1. Keras (Google) https://keras.io/
2. PyTorch (Facebook) https://pytorch.org/
3. TensorFlow (Google) https://www.tensorflow.org/
4. Caffe (University of California, Berkeley) https://caffe.berkeleyvision.org/
5. Theano (University of Montreal) https://github.com/Theano/Theano
TensorFlow (2015)
TensorFlow is a free and open-source software library (developed
by Google) for machine learning and artificial intelligence.
It can be used across a range of tasks but has a particular focus
on training and inference of deep neural networks.
TensorFlow was developed by the Google Brain team for internal
Google use in research and production. The initial version was
released under the Apache License 2.0 in 2015. Google released
the updated version of TensorFlow, named TensorFlow 2.0, in
September 2019.
TensorFlow can be used in a wide variety of programming
languages, most notably Python, as well as Javascript, C++, and
Java. This flexibility lends itself to a range of applications in many
different sectors.
Its flexible architecture allows for the easy deployment of
computation across a variety of platforms (CPUs, GPUs, TPUs), and
from desktops to clusters of servers to mobile and edge devices.
Keras (2015)
Keras (first released in 2015) is an open-source software library that
provides a Python interface for artificial neural networks (ANNs) and
acts as an interface for the TensorFlow library.
Keras was adopted and integrated into TensorFlow in mid-2017.
Users can access it via the tf.keras module. However, the Keras
library can still operate separately and independently.
Keras is designed to enable fast experimentation with deep neural
networks and focuses on being user-friendly, modular, and
extensible.
Keras contains numerous implementations of commonly used
neural-network building blocks such as layers, objectives,
activation functions, and optimizers, along with a host of tools
that make working with image and text data easier and simplify
the code needed to write DNNs.
In addition to standard neural networks, Keras has support for
convolutional and recurrent neural networks.
It also supports other common utility layers like dropout, batch
normalization, and pooling.
Keras
• High-level framework for deep learning
• TensorFlow backend
• Layer types
dense
convolutional
pooling
embedding
recurrent
activation
• https://keras.io/
Keras is a high-level neural networks API, written in Python and
capable of running on top of TensorFlow, Theano or CNTK.
It is very popular in the research and development community because
it supports rapid experimentation and prototyping and offers a user-friendly API.
PyTorch (2016)
PyTorch is an open source machine learning library based on the
Torch library, used for applications such as computer vision and
natural language processing, primarily developed by Facebook's
AI Research lab (FAIR).
It is free and open-source software released under the
Modified BSD license. Although the Python interface is more
polished and the primary focus of development, PyTorch also has
a C++ interface.
A number of pieces of deep learning software are built on top of
PyTorch, including Tesla Autopilot, Uber's Pyro, Hugging Face's
Transformers, PyTorch Lightning, and Catalyst.
PyTorch provides two high-level features:
Tensor computing (like NumPy) with strong acceleration via
graphics processing units (GPU)
Deep neural networks built on a tape-based automatic
differentiation system
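What "tape-based" automatic differentiation means can be shown with a toy reverse-mode implementation: every operation is recorded on a tape during the forward pass, and gradients are obtained by replaying the tape backwards. This is an illustrative sketch only, not PyTorch's actual engine; the `Var` class and `backward` function are hypothetical names.

```python
class Var:
    """A scalar value that records the operations applied to it on a shared tape."""

    def __init__(self, value, tape):
        self.value = value
        self.grad = 0.0
        self.tape = tape

    def __add__(self, other):
        out = Var(self.value + other.value, self.tape)
        # Record (output, [(input, local derivative), ...]) for the backward pass.
        self.tape.append((out, [(self, 1.0), (other, 1.0)]))
        return out

    def __mul__(self, other):
        out = Var(self.value * other.value, self.tape)
        self.tape.append((out, [(self, other.value), (other, self.value)]))
        return out


def backward(output):
    """Replay the tape in reverse, applying the chain rule at each recorded step."""
    output.grad = 1.0
    for out, parents in reversed(output.tape):
        for parent, local_grad in parents:
            parent.grad += local_grad * out.grad


tape = []
x = Var(3.0, tape)
w = Var(2.0, tape)
b = Var(1.0, tape)

y = w * x + b      # forward pass: y = 7
loss = y * y       # loss = 49
backward(loss)

print(loss.value)              # 49.0
print(w.grad, x.grad, b.grad)  # 42.0 28.0 14.0
```

The gradients match the hand derivation: d(loss)/dy = 2y = 14, so d(loss)/dw = 14 * x = 42, d(loss)/dx = 14 * w = 28, and d(loss)/db = 14.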
Theano (2007)
Theano is a Python library for fast numerical computation that
can be run on the CPU or GPU. It is a key foundational library for
Deep Learning in Python that you can use directly to create Deep
Learning models or wrapper libraries that greatly simplify the
process.
Theano was developed at the Université de Montréal in 2007.
It is considered the grandfather of deep learning frameworks
and has fallen out of favor with most researchers outside academia.
Theano used to be one of the more popular deep learning
libraries, an open-source project that lets programmers define,
evaluate, and optimize mathematical expressions, including multi-
dimensional arrays and matrix-valued expressions.
Caffe (2014)
Caffe (Convolutional Architecture for Fast Feature Embedding)
is a deep learning framework, originally developed at the University of
California, Berkeley; its accompanying paper was published in 2014.
It is open source, under a BSD license.
It is written in C++, with a Python interface.
Yangqing Jia created the Caffe project during his PhD at UC
Berkeley and it is currently hosted on GitHub.
Caffe supports many different types of deep learning
architectures geared towards image classification and image
segmentation.
It supports CNN, RCNN, LSTM and fully connected neural
network designs.
Caffe supports GPU- and CPU-based acceleration computational
kernel libraries such as NVIDIA cuDNN and Intel MKL.
Caffe is being used in academic research projects, startup
prototypes, and even large-scale industrial applications in vision,
speech, and multimedia.
Comparison
Feature | Keras | PyTorch | TensorFlow
API Level | High | Low | High and low
Architecture | Simple, concise, readable | Complex, less readable | Not easy to use
Datasets | Smaller datasets | Large datasets, high performance | Large datasets, high performance
Debugging | Simple network, so debugging is not often needed | Good debugging capabilities | Difficult to conduct debugging
Does It Have Trained Models? | Yes | Yes | Yes
Popularity | Most popular | Third most popular | Second most popular
Speed | Slow, low performance | Fast, high-performance | Fast, high-performance
Written In | Python | Python, C++, CUDA | C++, CUDA, Python
https://www.simplilearn.com/keras-vs-tensorflow-vs-pytorch-article
Summary
The “deep” in deep learning refers to the many hidden layers stacked in the neural
network; deeper networks can often learn more complex functions, although added
depth alone does not guarantee better performance. Each layer of the network
processes its input data in a specific way, which then informs the next layer.
So the output from one layer becomes the input for the next.
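The layer-to-layer flow described here can be sketched as a loop in which each layer's output is handed to the next layer as input. The two-layer network and its weights are hypothetical values chosen for illustration, and the helper names are not taken from any framework.

```python
def relu(v):
    return [max(0.0, z) for z in v]


def identity(v):
    return v


def dense(W, b, activation):
    """Build a fully connected layer: activation(W x + b)."""
    def layer(x):
        z = [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]
        return activation(z)
    return layer


def forward(x, layers):
    """Feed x through the stack; keep every intermediate output."""
    activations = [x]
    for layer in layers:
        x = layer(x)          # the output of one layer becomes the input of the next
        activations.append(x)
    return activations


# Hypothetical 2-layer network with hand-set weights.
layers = [
    dense([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0], relu),   # hidden layer
    dense([[1.0, 1.0]], [-1.0], identity),                # output layer
]

activations = forward([2.0, 1.0], layers)
print(activations)  # [[2.0, 1.0], [1.0, 1.5], [1.5]]
```

A deeper network simply has more entries in `layers`; the loop itself does not change.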
Deep learning is focused on improving that process of having machines learn
new things. With rule-based AI and ML, a data scientist determines the rules
and data set features to include in models, which drives how those models
operate.
With deep learning, the data scientist feeds raw data into an algorithm. The
system then analyzes that data, without specific rules or features
preprogrammed into it. Once the system makes its predictions, they are
checked against a separate set of data for accuracy. The level of accuracy of
these predictions—or lack thereof—then informs the next set of predictions
the system makes.
Deep Learning models scale better with larger amounts of data.
Deep Learning models tend to increase their accuracy with the increasing
amount of training data, while traditional Machine Learning models stop
improving after a saturation point.
Deep Learning is capable of solving more complex problems than traditional
machine learning approaches covered in this course.
Resources
Deep Learning: Resources
YouTube:
• Intro to DL
• https://www.youtube.com/watch?v=6M5VXKLf4D4
• https://www.youtube.com/watch?v=-SgkLEuhfbg
• https://www.youtube.com/watch?v=3cSjsTKtN9M
• https://www.youtube.com/watch?v=VyWAvY2CF9c
• https://www.youtube.com/watch?v=O5xeyoRL95U
• Convolutional Neural Networks
• https://www.youtube.com/watch?v=YRhxdVk_sIs
• https://www.youtube.com/watch?v=QzY57FaENXg
• https://www.youtube.com/watch?v=iaSUYvmCekI&t=266s
• Recurrent Neural Networks
• https://www.youtube.com/watch?v=c-k79rJagjQ
• https://www.youtube.com/watch?v=AsNTP8Kwu80&t=32s
• https://www.youtube.com/watch?v=YCzL96nL7j0&t=227s
• https://www.youtube.com/watch?v=Mdp5pAKNNW4
• LSTM:
• https://www.youtube.com/watch?v=YCzL96nL7j0&t=243s
• Transformers
• https://www.youtube.com/watch?v=S27pHKBEp30
• Generative Adversarial Networks
• https://www.youtube.com/watch?v=TpMIssRdhco
• https://www.youtube.com/watch?v=OXWvrRLzEaU&list=PLhhyoLH6IjfwIp8bZnzX8QR30TRcHO8Va
• Autoencoders
• https://www.youtube.com/watch?v=qiUEgSCyY5o
• https://www.youtube.com/watch?v=3jmcHZq3A5s
• https://www.youtube.com/watch?v=9zKuYvjFFS8
Deep Learning: Resources
Documents:
• https://machinelearningmastery.com/what-is-deep-learning/
• https://www.ibm.com/cloud/learn/deep-learning
• https://www.mathworks.com/discovery/deep-learning.html
• https://www.techtarget.com/searchenterpriseai/definition/deep-learning-deep-neural-network
• https://www.cnbc.com/2018/04/06/elon-musk-warns-ai-could-create-immortal-dictator-in-documentary.html
• https://medium.com/nanonets/how-to-easily-detect-objects-with-deep-learning-on-raspberrypi-225f29635c74
• Invariance & Equivariance
• https://towardsdatascience.com/translational-invariance-vs-translational-equivariance-f9fbc8fca63a#:~:text=Translational%20Equivariance%20or%20just%20equivariance,changes%2C%20the%20output%20also%20changes.
• Batch Normalization
• https://machinelearningmastery.com/batch-normalization-for-training-of-deep-neural-networks/
Deep Learning: Resources
Tutorials:
• https://machinelearningmastery.com/tutorial-first-neural-network-python-keras/
• https://elitedatascience.com/keras-tutorial-deep-learning-in-python
Courses (MIT 6.S191) and others:
https://www.youtube.com/playlist?list=PLtBw6njQRU-rwp5__7C0oIVt26ZgjG9NI
http://introtodeeplearning.com/#schedule
https://github.com/aamini/introtodeeplearning/
Deep Learning: Resources
YouTube:
• Convolutional Encoder Decoder (CED) & Semantic Segmentation
• https://www.youtube.com/watch?v=vD8MbfqxZR4