AUTOENCODERS
Dr. Asifullah Khan, DCS
Autoencoders
An unsupervised learning technique
Leverage neural networks for the task of representation learning.
A neural network architecture where a bottleneck is imposed in the network, which forces
a compressed knowledge representation of the original input.
If the input features are independent of one another then this compression and subsequent reconstruction is a
very difficult task.
If some sort of structure exists in the data (i.e., correlations between input features), this structure can be
learned and consequently leveraged when forcing the input through the network's bottleneck.
Autoencoders
“An autoencoder is a neural network that is trained to attempt to copy its input to its output.”
— Page 502, Deep Learning, 2016.
A type of neural network where the output layer has the same dimensionality as the input layer.
The aim of an autoencoder is to learn a lower-dimensional representation (encoding) for higher-dimensional data, typically for dimensionality reduction, by training the network to capture the most important parts of the input image.
Properties of Autoencoders
Autoencoders are mainly a dimensionality reduction (or compression) algorithm with a couple of
important properties:
Data-specific:
◦ only able to meaningfully compress data similar to what they have been trained on.
Lossy:
◦ The output will not be exactly the same as the input; it will be a close but degraded representation.
Unsupervised:
◦ don’t need explicit labels to train on.
◦ more precisely they are self-supervised because they generate their own labels from the training data.
Architecture of Autoencoders
Autoencoders consist of 3 parts:
1. Encoder: compresses the input into an encoded representation that is typically several orders of magnitude smaller than the input data.
2. Bottleneck: the most important and the smallest part.
Restricts the flow of information from the encoder to the decoder
Helps to form a knowledge representation of the input
Prevents the neural network from memorizing the input and overfitting on the data
Rule of thumb: the smaller the bottleneck, the lower the risk of overfitting
3. Decoder: "decompresses" the knowledge representation and reconstructs the data from its encoded form.
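To make the three parts concrete, below is a minimal sketch in PyTorch (not taken from these slides); the input size of 784 and code size of 32 are illustrative assumptions.

```python
# Minimal sketch of encoder -> bottleneck -> decoder (sizes are assumptions).
import torch
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, input_dim=784, code_dim=32):
        super().__init__()
        # Encoder: compresses the input down to the bottleneck.
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 128), nn.ReLU(),
            nn.Linear(128, code_dim),                 # bottleneck: smallest layer
        )
        # Decoder: reconstructs the input from the encoded representation.
        self.decoder = nn.Sequential(
            nn.Linear(code_dim, 128), nn.ReLU(),
            nn.Linear(128, input_dim), nn.Sigmoid(),  # outputs in [0, 1]
        )

    def forward(self, x):
        code = self.encoder(x)        # encoded representation
        return self.decoder(code)     # reconstruction x_hat
```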
Architecture of Autoencoders
Dataset: unlabeled data, framed as a supervised learning problem.
Task: output x̂, a reconstruction of the original input x.
Training of network: minimizing the reconstruction error, L(x, x̂), which measures the difference between the original input and the consequent reconstruction.
Key attribute: The bottleneck
Without the presence of an information bottleneck, the network could easily learn to simply memorize the input values by passing them along through the network.
Example
How to train autoencoders?
Set 4 hyperparameters before training an autoencoder:
1. Code size (size of the bottleneck): a smaller code size results in more compression. This can also act as a regularization term.
2. Number of layers: a higher depth increases model complexity; a lower depth is faster to process.
3. Number of nodes per layer: the architecture discussed is a stacked autoencoder, as the layers are stacked one after another. Stacked autoencoders look like a "sandwich": the number of nodes per layer decreases with each subsequent layer of the encoder and increases back in the decoder.
4. Reconstruction loss: either mean squared error (MSE) or binary cross-entropy. If the input values are in the range [0, 1], cross-entropy is typically used; otherwise, mean squared error. With image data, the most popular reconstruction losses are MSE loss and L1 loss.
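As an illustration only (not the lecture's code), the sketch below makes these four hyperparameters explicit for a small stacked autoencoder trained with an MSE reconstruction loss; all sizes, the learning rate, and the dummy data are assumptions.

```python
# Hyperparameters of a stacked ("sandwich") autoencoder, made explicit.
import torch
import torch.nn as nn

code_size = 32              # 1. size of the bottleneck
hidden_sizes = [256, 64]    # 2./3. number of layers and nodes per layer (encoder side)
input_dim = 784             # assumed input dimensionality

encoder = nn.Sequential(
    nn.Linear(input_dim, hidden_sizes[0]), nn.ReLU(),
    nn.Linear(hidden_sizes[0], hidden_sizes[1]), nn.ReLU(),
    nn.Linear(hidden_sizes[1], code_size),
)
decoder = nn.Sequential(                       # mirrors the encoder
    nn.Linear(code_size, hidden_sizes[1]), nn.ReLU(),
    nn.Linear(hidden_sizes[1], hidden_sizes[0]), nn.ReLU(),
    nn.Linear(hidden_sizes[0], input_dim), nn.Sigmoid(),
)
model = nn.Sequential(encoder, decoder)

criterion = nn.MSELoss()   # 4. reconstruction loss (nn.BCELoss() for [0, 1] inputs)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

x = torch.rand(16, input_dim)        # dummy batch standing in for real data
for epoch in range(5):
    x_hat = model(x)
    loss = criterion(x_hat, x)       # compare the reconstruction to the input itself
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```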
How to train autoencoders?
Increasing these hyperparameters lets the autoencoder learn more complex codings.
If the autoencoder becomes too powerful, it will simply learn to copy its inputs to the output without learning any meaningful representation, i.e. it will
"mimic the identity function"
Such an autoencoder will reconstruct the training data perfectly, but it will be overfitting without being able to generalize to new instances.
Ideal Autoencoder
The ideal autoencoder model balances the following:
Sensitive enough to the inputs to accurately build a reconstruction.
Insensitive enough to the inputs that the model doesn't simply memorize or overfit
the training data.
This trade-off forces the model to maintain only the variations in the data required to reconstruct
the input without holding on to redundancies within the input.
Trade-off: Constructing a loss function
A loss function of the form:
L(x, x̂) + regularizer
where:
◦ the reconstruction loss term L(x, x̂) encourages the model to be sensitive to the inputs, and
◦ the regularizer, an added term, discourages memorization/overfitting.
Types of Autoencoders
Undercomplete Autoencoders
Sparse Autoencoders
Denoising Autoencoders
Contractive Autoencoders
Undercomplete Autoencoders
• A "sandwich" architecture.
• Deliberately keep the code size small, i.e. constrain the number of nodes present in the hidden layer(s) of the network, limiting the amount of information that can flow through the network.
Undercomplete Autoencoders
It won’t be able to directly copy its inputs to the output
Forced to learn intelligent features.
How?
◦ By penalizing the network as per the reconstruction error.
If the input data has a pattern, for example the digit “1” usually contains a somewhat straight line
and the digit “0” is circular, it will learn this fact and encode it in a more compact form.
If the input data was completely random without any internal correlation or dependency, then an
undercomplete autoencoder won’t be able to recover it perfectly.
This encoding will learn and describe latent attributes of the input data.
Undercomplete Autoencoders vs PCA
For dimensionality reduction, PCA (Principal Component Analysis) finds a lower-dimensional hyperplane that represents the higher-dimensional data as faithfully as possible. PCA can only model linear relationships.
Autoencoders are capable of learning nonlinear manifolds (a manifold
is defined in simple terms as a continuous, non-intersecting surface).
If we remove all non-linear activations from an undercomplete autoencoder and use only linear layers, we reduce the undercomplete autoencoder to something that works on an equal footing with PCA.
Torus (a nonlinear manifold)
Undercomplete Autoencoders vs PCA
In vanilla autoencoders, i.e. autoencoders with a single hidden layer, it's common to use linear
activations for both the hidden and output layers.
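For illustration, a purely linear undercomplete autoencoder might look like the sketch below (sizes are assumptions); trained with MSE, it can only recover the same k-dimensional linear subspace that PCA finds with k components.

```python
# Sketch: an undercomplete autoencoder built only from linear layers.
import torch
import torch.nn as nn

input_dim, k = 50, 5                          # illustrative dimensions
linear_ae = nn.Sequential(
    nn.Linear(input_dim, k, bias=False),      # linear "encoder" (no activation)
    nn.Linear(k, input_dim, bias=False),      # linear "decoder"
)
x_hat = linear_ae(torch.rand(8, input_dim))   # reconstruction of a dummy batch
# Inserting a nonlinearity (e.g. nn.ReLU) between the layers is what lets an
# autoencoder model curved manifolds that PCA's hyperplane cannot capture.
```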
Sparse autoencoders
Use regularization to force autoencoders to learn useful features
How?
Construct the loss function such that we penalize activations within a hidden layer.
This penalty, called the sparsity penalty, prevents the neural network from activating too many neurons and serves as a regularizer.
A different approach towards regularization, as we normally regularize the weights of a network,
not the activations.
Sparse autoencoders
Allow the network to sensitize individual hidden layer nodes toward specific attributes of the input data, i.e. have nodes in hidden layers dedicated to finding specific features.
Limits the network's capacity to memorize the input data without limiting the network's capability to extract features from the data.
This allows us to consider the latent state representation and regularization of the
network separately.
This method works even if the code size is large, since only a small subset of the nodes will be
active at any time.
Sparse autoencoders
The individual nodes of a trained model which activate are data-dependent: different inputs will result in activations of different nodes through the network.
Sparse autoencoders: Sparsity constraint
Two main ways to impose a sparsity constraint, both of which involve measuring the hidden layer activations for each training batch and adding a term to the loss function to penalize excessive activations:
L1 regularization
KL-Divergence
L1 Regularization: add a term to the loss function that penalizes the absolute value of the vector of activations a in layer h for observation i, scaled by a tuning parameter λ:
L(x, x̂) + λ Σ_i |a_i^(h)|
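A possible PyTorch sketch of this L1 penalty is shown below (the layer sizes, λ, and the dummy data are assumptions); note that the penalty is applied to the hidden activations, not to the weights.

```python
# Sketch of an L1 sparsity penalty on the hidden activations.
import torch
import torch.nn as nn
import torch.nn.functional as F

lam = 1e-4                                   # tuning parameter lambda (assumed)
encoder = nn.Sequential(nn.Linear(784, 128), nn.ReLU())
decoder = nn.Sequential(nn.Linear(128, 784), nn.Sigmoid())

x = torch.rand(16, 784)                      # dummy batch
a_h = encoder(x)                             # hidden layer activations a^(h)
x_hat = decoder(a_h)

recon_loss = F.mse_loss(x_hat, x)
l1_penalty = lam * a_h.abs().sum()           # lambda * sum_i |a_i^(h)|
loss = recon_loss + l1_penalty               # penalizes activations, not weights
```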
Sparse autoencoders: Sparsity constraint
KL-Divergence: In essence, KL-divergence is a measure of the difference between two probability distributions. We define a sparsity parameter ρ, the desired average activation of a neuron, and measure ρ̂_j, the actual average activation of neuron j over a collection of samples. This expectation can be calculated as
ρ̂_j = (1/m) Σ_i a_j^(h)(x_i)
where the subscript j denotes the specific neuron in layer h, summing the activations over m training observations denoted individually as x_i.
Considering the ideal distribution of activations as a Bernoulli distribution with mean ρ, we include a KL divergence term in the loss to reduce the difference between the current distribution of the activations and the ideal (Bernoulli) distribution:
L(x, x̂) + Σ_j KL(ρ ‖ ρ̂_j),  where  KL(ρ ‖ ρ̂_j) = ρ log(ρ / ρ̂_j) + (1 − ρ) log((1 − ρ) / (1 − ρ̂_j))
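The sketch below illustrates one way this penalty could be computed in PyTorch (ρ, the layer sizes, and the dummy data are assumptions; in practice the KL term is often weighted by a coefficient).

```python
# Sketch of a KL-divergence sparsity penalty.
import torch
import torch.nn as nn
import torch.nn.functional as F

rho = 0.05                                   # target sparsity (Bernoulli mean), assumed
encoder = nn.Sequential(nn.Linear(784, 128), nn.Sigmoid())  # activations in (0, 1)
decoder = nn.Sequential(nn.Linear(128, 784), nn.Sigmoid())

x = torch.rand(16, 784)                      # dummy batch of m = 16 observations
a_h = encoder(x)
x_hat = decoder(a_h)

rho_hat = a_h.mean(dim=0)                    # rho_hat_j = (1/m) sum_i a_j^(h)(x_i)
kl = (rho * torch.log(rho / rho_hat)
      + (1 - rho) * torch.log((1 - rho) / (1 - rho_hat))).sum()
loss = F.mse_loss(x_hat, x) + kl             # reconstruction loss + KL penalty
```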
Denoising autoencoders
Remove noise from an image.
Unlike a standard autoencoder, it does not use its (noisy) input image as the ground truth.
Denoising autoencoders
Random noise is added to the inputs and the network is trained to recover the original noise-free data.
The autoencoder can't simply copy the input to its output because the input also contains random noise.
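A minimal sketch of this training step (the noise level, sizes, optimizer settings, and dummy data are assumptions): the network sees a corrupted input but is scored against the clean one.

```python
# Sketch of one denoising-autoencoder training step.
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Sequential(
    nn.Linear(784, 128), nn.ReLU(),
    nn.Linear(128, 784), nn.Sigmoid(),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

x_clean = torch.rand(16, 784)                                       # noise-free batch
x_noisy = (x_clean + 0.3 * torch.randn_like(x_clean)).clamp(0, 1)   # corrupted input

x_hat = model(x_noisy)                       # reconstruct from the noisy version
loss = F.mse_loss(x_hat, x_clean)            # the target is the clean data
optimizer.zero_grad()
loss.backward()
optimizer.step()
```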
Contractive Autoencoders
Robust to small changes in the training dataset.
Works on the basis that similar inputs should have similar encodings and a similar latent space
representation i.e. latent space should not vary by a huge amount for minor variations in the input.
Requires the derivative of the hidden layer activations to be small with respect to the input.
Essentially forcing the model to learn how to contract a neighborhood of inputs into a smaller
neighborhood of outputs.
Contractive Autoencoders
Quite similar to a denoising autoencoder as
"denoising autoencoders make the reconstruction function (i.e. decoder) resist small but finite-
sized perturbations of the input, while contractive autoencoders make the feature extraction
function (ie. encoder) resist infinitesimal perturbations of the input.“
Contractive Autoencoders
A loss term which penalizes large derivatives of the hidden layer activations with respect to the input training examples:
it penalizes instances where a small change in the input leads to a large change in the encoding space.
The regularization loss term is the squared Frobenius norm ‖·‖_F of the Jacobian matrix J of the hidden layer activations with respect to the input observations.
Loss function:
L(x, x̂) + λ ‖∇_x a^(h)(x)‖²_F
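As an illustration, for a single sigmoid encoder layer the squared Frobenius norm of the Jacobian has a simple closed form, used in the sketch below (λ, all sizes, and the dummy data are assumptions).

```python
# Sketch of a contractive penalty for one sigmoid encoder layer h = sigmoid(Wx + b).
# For that layer, ||dh/dx||_F^2 = sum_j (h_j (1 - h_j))^2 * sum_i W_ji^2.
import torch
import torch.nn as nn
import torch.nn.functional as F

lam = 1e-4                                        # penalty weight lambda (assumed)
W = nn.Parameter(torch.randn(128, 784) * 0.01)    # encoder weights
b = nn.Parameter(torch.zeros(128))
decoder = nn.Linear(128, 784)

x = torch.rand(16, 784)                           # dummy batch
h = torch.sigmoid(x @ W.t() + b)                  # hidden activations a^(h)
x_hat = torch.sigmoid(decoder(h))

dh = (h * (1 - h)) ** 2                           # (h_j (1 - h_j))^2, shape (batch, 128)
w_sq = (W ** 2).sum(dim=1)                        # sum_i W_ji^2, shape (128,)
jacobian_frob_sq = (dh * w_sq).sum()              # squared Frobenius norm over the batch

loss = F.mse_loss(x_hat, x) + lam * jacobian_frob_sq
```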
Thank You