
Autoencoders: Unsupervised Learning

Module-3
Introduction
In both Statistics and Machine Learning, the number of attributes, features or
input variables of a dataset is referred to as its dimensionality.

For example, let’s take a very simple dataset containing 2 attributes called
Height and Weight. This is a 2-dimensional dataset and any observation of
this dataset can be plotted in a 2D plot.
Introduction

If we add another dimension called Age to the same dataset, it becomes a 3-dimensional dataset, and any observation lies in 3-dimensional space.
Introduction
Likewise, real-world datasets have many attributes.

The observations of those datasets lie in a high-dimensional space, which is hard to visualize.
Issues with high-dimensional data

▪ Decreased overall performance of learning algorithms
▪ The problem of overfitting
▪ Difficulty in data visualization
▪ The issue of multicollinearity
▪ Noise in the data
▪ Difficulty in transforming non-linear data
Dimensionality reduction
Dimensionality reduction refers to the process of reducing the number of attributes in a dataset while keeping as much of the variation in the original dataset as possible.
Dimensionality reduction methods

Feature Selection
• Subset of the input variables
• Ranking of variables
• Selecting the most relevant features
• Discarding less important features (noise)

Feature Extraction
• New features derived from the existing ones
• Reduced number of features
• Summarizes most of the information
• Linear and non-linear transformations
Dimensionality reduction
Data is not about adding to your plate…
Data is about making sure you have the right things on your plate.
Autoencoders
▪ A specific type of deep learning architecture.
▪ Used for learning representations of data,
▪ typically for the purpose of dimensionality reduction.
▪ It learns a compressed, distributed representation of the data for the purpose of dimensionality reduction.
▪ An Autoencoder (AE) is an unsupervised neural network used for data compression.

An autoencoder is a type of artificial neural network used to learn efficient data codings in an unsupervised manner. The aim of an autoencoder is to learn a representation (encoding) for a set of data, typically for dimensionality reduction, by training the network to ignore signal “noise”.
Deep Autoencoders
Architecture of AE
Autoencoders consist of 3 parts:

1. Encoder: A module that compresses the input data (across the train, validation and test sets) into an encoded representation that is typically several orders of magnitude smaller than the input data.

2. Bottleneck: A module that contains the compressed knowledge representation and is therefore the most important part of the network.

3. Decoder: A module that helps the network “decompress” the knowledge representation and reconstructs the data back from its encoded form. The output is then compared with a ground truth. (A minimal code sketch of this structure follows.)
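As a concrete illustration of these three parts, here is a minimal PyTorch sketch. The layer sizes (a 784-dimensional flattened input and a 32-dimensional code) are illustrative assumptions, not taken from the slides:

```python
import torch.nn as nn

class Autoencoder(nn.Module):
    def __init__(self, in_dim=784, code_dim=32):
        super().__init__()
        # Encoder: compresses the input down to the bottleneck code
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, 128), nn.ReLU(),
            nn.Linear(128, code_dim),          # bottleneck: compressed representation
        )
        # Decoder: reconstructs the input from the bottleneck code
        self.decoder = nn.Sequential(
            nn.Linear(code_dim, 128), nn.ReLU(),
            nn.Linear(128, in_dim), nn.Sigmoid(),
        )

    def forward(self, x):
        code = self.encoder(x)      # encoded (compressed) representation
        return self.decoder(code)   # reconstruction, compared with the input
```

Note that the decoder mirrors the encoder so that the output has the same dimensionality as the input, which is what allows it to be compared with the ground truth.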
Training Objectives
Initial Epoch
Application Example
Training of AE
• Code size: the number of nodes in the bottleneck; a smaller size results in more compression.
• Number of layers: the depth of the encoder and decoder networks (the DNN).
• Number of nodes per layer: the weights used per layer; the number of nodes typically decreases with each subsequent layer in the encoder and increases with each subsequent layer in the decoder.
• Loss function: the reconstruction loss (see the training-loop sketch below).
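A minimal training loop under these choices might look as follows. It assumes the `Autoencoder` class sketched earlier and a `loader` that yields batches of (image, label) pairs; both names, the epoch count and the learning rate are assumptions for illustration:

```python
import torch

model = Autoencoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = torch.nn.MSELoss()   # reconstruction loss

for epoch in range(10):
    for x, _ in loader:              # labels are ignored: training is unsupervised
        x = x.view(x.size(0), -1)    # flatten images to vectors
        x_hat = model(x)
        loss = criterion(x_hat, x)   # compare the reconstruction with the input
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```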
Training of AE
Properties of Autoencoders

Data-specific: Autoencoders are only able to compress data similar to what they have been trained on.

Lossy: The decompressed outputs will be degraded compared to the original inputs.

Learned automatically from examples (unsupervised): It is easy to train specialized instances of the algorithm that will perform well on a specific type of input.
Types of Autoencoder
1. Linear Autoencoder
2. Undercomplete Autoencoder
3. Overcomplete Autoencoder
4. Denoising Autoencoder
5. Sparse Autoencoder
6. Contractive Autoencoder
Linear Autoencoder
▪ The simplest form of autoencoder can be trained with only a single-layer encoder and a single-layer decoder.
▪ It has
▪ one hidden layer,
▪ linear activations, and
▪ a squared-error loss function (written out below).
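Written out, assuming encoder weights $W$ and decoder weights $W'$ (notation not from the slides), the linear autoencoder computes $h = W x$ and $\hat{x} = W' h$, and minimizes the squared-error loss

$L(x, \hat{x}) = \lVert x - W' W x \rVert^2.$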
Undercomplete Autoencoder
❑ One of the simplest types of autoencoders.
❑ An undercomplete autoencoder takes in an image and tries to predict the same image as output,
❑ thus reconstructing the image from the compressed bottleneck region.
❑ The primary use is the generation of the latent space or bottleneck, which forms a compressed substitute of the input data and can easily be decompressed back with the help of the network when needed.
❑ The loss function used to train an undercomplete autoencoder is called the reconstruction loss, as it is a check of how well the image has been reconstructed from the input (see below).
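Concretely, for an input $x$ and its reconstruction $\hat{x}$, the reconstruction loss is typically the squared error

$L(x, \hat{x}) = \lVert x - \hat{x} \rVert^2,$

averaged over the training images (mean squared error).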
Overcomplete Autoencoder
▪ In an overcomplete autoencoder, the hidden layer has more neurons than the input layer.
▪ It has the potential to learn the identity function (output equals the input) and become useless.
▪ The hidden layer being overcomplete means that it is larger than the input layer.
▪ There is no compression in the hidden layer.
▪ Each hidden unit could simply copy a different input component.
▪ There is no guarantee that the hidden units will extract meaningful structure.
Link between PCA and Autoencoders
PCA vs. Autoencoder
Regularization in Autoencoders
Denoising Autoencoder (DAE)
▪ Autoencoders that remove noise from an image.
▪ We feed in a noisy version of the image, where the noise has been added via digital alterations.
▪ The noisy image is fed to the encoder-decoder architecture, and the output is compared with the ground-truth (clean) image.
▪ The denoising autoencoder gets rid of noise by learning a representation of the input from which the noise can be filtered out easily.
▪ While removing noise directly from the image seems difficult, the autoencoder performs this by mapping the input data onto a lower-dimensional manifold (as in undercomplete autoencoders), where filtering out the noise becomes much easier. (A sketch of one training step is given below.)
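A minimal sketch of one denoising training step, assuming an `autoencoder` model like the one sketched earlier and a batch `clean` of clean images in the range [0, 1]; the names and the Gaussian corruption are illustrative assumptions:

```python
import torch

def dae_step(autoencoder, clean, optimizer, noise_std=0.3):
    noisy = clean + noise_std * torch.randn_like(clean)  # corrupt the input digitally
    noisy = noisy.clamp(0.0, 1.0)                        # keep pixel values valid
    recon = autoencoder(noisy)                           # reconstruct from the noisy input
    loss = torch.nn.functional.mse_loss(recon, clean)    # compare with the clean ground truth
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```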
Denoising Autoencoder (DAE)
Sparse Autoencoder – sparsity constraint
▪ While undercomplete autoencoders are regulated and fine-tuned by controlling the size of the bottleneck, the sparse autoencoder is regulated by changing the number of nodes at each hidden layer.
▪ Since it is not possible to design a neural network with a flexible number of nodes at its hidden layers, sparse autoencoders work by penalizing the activation of some neurons in the hidden layers.
▪ This penalty, called the sparsity function, prevents the neural network from activating more neurons and serves as a regularizer.
▪ In effect, it drops (zeroes out) the activations of some hidden-layer neurons.
Sparse Autoencoder
▪ While typical regularizers work by creating a penalty on the size of the weights at the nodes, the sparsity regularizer works by creating a penalty on the number of nodes activated.
▪ This form of regularization allows the network to have nodes in the hidden layers dedicated to finding specific features in images during training, treating the regularization problem as separate from the latent-space problem.
Sparse Autoencoder
1. Sparse autoencoders have more hidden nodes than input nodes.
2. They can still discover important features in the data.
3. In a generic sparse autoencoder visualization, the obscurity of a node corresponds to its level of activation. The sparsity constraint is introduced on the hidden layer.
4. This is to prevent the output layer from simply copying the input data.
5. Sparsity may be obtained by additional terms in the loss function during training, either by comparing the probability distribution of the hidden-unit activations with some low desired value, or by manually zeroing all but the strongest hidden-unit activations. (A sketch of the first approach is given below.)
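A sketch of the first approach, comparing the average hidden-unit activations with a low desired value via a KL-divergence term; the hyperparameters `rho` and `beta` are illustrative assumptions:

```python
import torch

def sparsity_penalty(hidden, rho=0.05, beta=1e-3):
    # Average activation of each hidden unit over the batch (assumes activations in (0, 1))
    rho_hat = hidden.mean(dim=0).clamp(1e-6, 1 - 1e-6)
    # KL(rho || rho_hat): penalizes units whose mean activation strays from the low target rho
    kl = rho * torch.log(rho / rho_hat) + (1 - rho) * torch.log((1 - rho) / (1 - rho_hat))
    return beta * kl.sum()

# Usage: total_loss = reconstruction_loss + sparsity_penalty(hidden_activations)
```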
Sparse Autoencoder
Contractive Autoencoder – penalizing derivatives
❑ It works on the basis that similar inputs should have similar encodings and a similar latent-space representation. This means that the latent space should not vary by a huge amount for minor variations in the input.
❑ To train a model that works along with this constraint, we have to ensure that the derivatives of the hidden-layer activations are small with respect to the input.
❑ Mathematically, the Frobenius norm of the Jacobian of the hidden activations $h(x)$ with respect to the input,

$\left\lVert \frac{\partial h(x)}{\partial x} \right\rVert_F^2,$

should be as small as possible.
Contractive Autoencoder
❑ While the reconstruction loss wants the model to tell differences between two inputs and observe variations in the data, the Frobenius norm of the derivatives says that the model should be able to ignore variations in the input data.
❑ Putting these two contradictory conditions into one loss function enables us to train a network whose hidden layers capture only the most essential information.
❑ This information is necessary to separate images and to ignore information that is non-discriminatory in nature, and therefore not important.
❑ The total loss function can be mathematically expressed as a reconstruction term plus the weighted Jacobian penalty:

$L = \lVert x - \hat{x} \rVert^2 + \lambda \left\lVert \frac{\partial h(x)}{\partial x} \right\rVert_F^2$

❑ The gradient is summed over all training samples, and the Frobenius norm of the same is taken.
Contractive AE – penalizing derivatives
Contractive Autoencoder
1. The objective of a contractive autoencoder is to have a robust learned representation that is less sensitive to small variations in the data.
2. Robustness of the representation is achieved by applying a penalty term to the loss function.
3. The contractive autoencoder is another regularization technique, just like sparse and denoising autoencoders.
4. However, this regularizer corresponds to the Frobenius norm of the Jacobian matrix of the encoder activations with respect to the input.
5. The Frobenius norm of the Jacobian matrix for the hidden layer is calculated with respect to the input; it is simply the sum of the squares of all its elements. (A code sketch of this penalty follows.)
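A sketch of this penalty for a single sigmoid hidden layer, where the Frobenius norm of the Jacobian has a closed form in terms of the hidden activations and the encoder weights; the model structure and the weighting factor `lam` are illustrative assumptions:

```python
import torch
import torch.nn as nn

class ContractiveAE(nn.Module):
    def __init__(self, in_dim=784, hid_dim=64):
        super().__init__()
        self.enc = nn.Linear(in_dim, hid_dim)
        self.dec = nn.Linear(hid_dim, in_dim)

    def forward(self, x):
        h = torch.sigmoid(self.enc(x))        # hidden activations
        x_hat = torch.sigmoid(self.dec(h))    # reconstruction
        return x_hat, h

def contractive_loss(x, x_hat, h, enc_weight, lam=1e-4):
    recon = ((x - x_hat) ** 2).sum(dim=1).mean()   # reconstruction term
    # For a sigmoid layer, J_ji = h_j (1 - h_j) W_ji, so
    # ||J||_F^2 = sum_j (h_j (1 - h_j))^2 * sum_i W_ji^2
    dh = (h * (1 - h)) ** 2                        # (batch, hid)
    w_sq = (enc_weight ** 2).sum(dim=1)            # (hid,)
    jacobian_penalty = (dh * w_sq).sum(dim=1).mean()
    return recon + lam * jacobian_penalty

# Usage: x_hat, h = model(x); loss = contractive_loss(x, x_hat, h, model.enc.weight)
```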
Convolutional Autoencoder
1. Autoencoders in their traditional formulation do not take into account the fact that a signal can be seen as a sum of other signals.
2. Convolutional autoencoders use the convolution operator to exploit this observation.
3. They learn to encode the input as a set of simple signals and then try to reconstruct the input from them, possibly modifying the geometry or the reflectance of the image.
4. They are state-of-the-art tools for unsupervised learning of convolutional filters.
5. Once these filters have been learned, they can be applied to any input in order to extract features.
6. These features can then be used for any task that requires a compact representation of the input, such as classification. (A minimal sketch is given below.)
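A minimal sketch of a convolutional autoencoder for 28×28 grayscale images; the channel counts and layer sizes are illustrative assumptions:

```python
import torch.nn as nn

conv_autoencoder = nn.Sequential(
    # Encoder: strided convolutions downsample the image into feature maps
    nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1),   # 28x28 -> 14x14
    nn.ReLU(),
    nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1),  # 14x14 -> 7x7
    nn.ReLU(),
    # Decoder: transposed convolutions reconstruct the original resolution
    nn.ConvTranspose2d(32, 16, kernel_size=3, stride=2, padding=1, output_padding=1),  # 7x7 -> 14x14
    nn.ReLU(),
    nn.ConvTranspose2d(16, 1, kernel_size=3, stride=2, padding=1, output_padding=1),   # 14x14 -> 28x28
    nn.Sigmoid(),
)
```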
Convolutional Autoencoder
Variational Autoencoder
1. Variational autoencoder models make strong assumptions concerning the distribution of the latent variables.
2. They use a variational approach for latent representation learning, which results in an additional loss component and a specific estimator for the training algorithm called the Stochastic Gradient Variational Bayes (SGVB) estimator.
3. It assumes that the data is generated by a directed graphical model $p_\theta(x \mid z)$ and that the encoder learns an approximation $q_\Phi(z \mid x)$ to the posterior distribution $p_\theta(z \mid x)$, where $\Phi$ and $\theta$ denote the parameters of the encoder (recognition model) and decoder (generative model) respectively.
4. The probability distribution of the latent vector of a variational autoencoder typically matches that of the training data much more closely than that of a standard autoencoder. (A sketch of the reparameterization and loss follows.)
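A minimal sketch of the reparameterization trick and the VAE loss (reconstruction term plus KL divergence to a standard-normal prior); the layer sizes and the binary-cross-entropy reconstruction term (inputs assumed to be in [0, 1]) are illustrative assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VAE(nn.Module):
    def __init__(self, in_dim=784, hid_dim=256, latent_dim=20):
        super().__init__()
        self.enc = nn.Linear(in_dim, hid_dim)
        self.mu = nn.Linear(hid_dim, latent_dim)       # mean of q(z|x)
        self.logvar = nn.Linear(hid_dim, latent_dim)   # log-variance of q(z|x)
        self.dec = nn.Sequential(nn.Linear(latent_dim, hid_dim), nn.ReLU(),
                                 nn.Linear(hid_dim, in_dim), nn.Sigmoid())

    def forward(self, x):
        h = F.relu(self.enc(x))
        mu, logvar = self.mu(h), self.logvar(h)
        # Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, I)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        return self.dec(z), mu, logvar

def vae_loss(x, x_hat, mu, logvar):
    # Reconstruction term plus KL divergence between q(z|x) and the prior N(0, I)
    recon = F.binary_cross_entropy(x_hat, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl
```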
Variational Autoencoder
Applications of AE
1) Dimensionality Reduction
2) Feature Extraction
3) Image Denoising
4) Image Compression
5) Image Search
6) Anomaly Detection
7) Missing Value Imputation
Denoising: input a clean image plus noise and train the network to reproduce the clean image.
Image colorization: input black-and-white images and train the network to produce color images.
Watermark removal.
Image Compression
❑ The raw input image can be passed to the encoder network to obtain a compressed, encoded representation.
❑ The autoencoder's weights are learned by reconstructing the image from this compressed encoding using the decoder network (a sketch follows under the next heading).
Implementation of an Autoencoder for Image Compression
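A minimal sketch of this idea, reusing the trained `Autoencoder` model sketched earlier; `images` is an assumed batch of inputs:

```python
import torch

model.eval()
with torch.no_grad():
    x = images.view(images.size(0), -1)   # flatten the raw input images
    code = model.encoder(x)               # compressed encoding (e.g. 784 -> 32 values per image)
    x_hat = model.decoder(code)           # lossy reconstruction of the original images
```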
