Unsupervised Deep Learning

Uploaded by

neeharika.sssvv

UNSUPERVISED DEEP LEARNING

Autoencoders
Autoencoders are a neural network architecture that forces the learning of a lower
dimensional representation of data, commonly images.

Autoencoders are a type of unsupervised deep learning model that uses hidden layers to
decompose and then recreate its input. They have several applications:

• Dimensionality reduction
• Preprocessing for classification
• Identifying ‘essential’ elements of the input data, and filtering out noise

One of the main motivations is to find whether two pictures are similar.

The goal is to find the representation that captures the image content.

1. Anomaly detection at scale using PCA.


PCA finds the dimensions that capture the most variance in the data. PCA has
limitations: its learned features are linear combinations of the original features.
There may be some complex, non-linear relationship between the original features
(pixels) and the best lower dimensional representation. The best representation can
be defined in many different ways.
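To make the "linear combination" point concrete, here is a minimal sketch of PCA as a linear encoder and decoder, assuming NumPy and a toy data set invented for illustration. The function names (`pca_fit`, `pca_encode`, `pca_decode`) are hypothetical helpers, not part of any library.

```python
import numpy as np

def pca_fit(X, k):
    """Return the data mean and the top-k principal directions (as rows)."""
    mean = X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]

def pca_encode(X, mean, components):
    # Each learned feature is a linear combination of the original features.
    return (X - mean) @ components.T

def pca_decode(Z, mean, components):
    # Map the low dimensional features back to the original space.
    return Z @ components + mean

rng = np.random.default_rng(0)
# Toy "images" (assumption): 200 flattened samples with 16 pixels,
# generated as a linear mix of 2 hidden factors.
factors = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 16))
X = factors @ mixing

mean, components = pca_fit(X, k=2)
Z = pca_encode(X, mean, components)
X_hat = pca_decode(Z, mean, components)
pca_error = float(np.mean((X - X_hat) ** 2))
```

Because the toy data is itself a linear mix of two factors, two components reconstruct it almost exactly. Data with a non-linear structure would not be captured this well by a linear projection.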

Autoencoders working:

1. Feed the image through the encoder network.
2. The encoder produces a lower dimensional embedding of the original data.
3. The embedding is fed through the decoder network.
4. A reconstructed version of the original data is generated.
5. Compare the result to the original (compute the loss and train the network).

Final result: the network learns a lower dimensional space representing the original
data (the embedding). The decoder network maps vectors from this lower dimensional
space back to images, which gives us a means of compressing and decompressing data.
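The encode, decode, compare, and train loop described above can be sketched in a few lines. This is a minimal illustration only, assuming NumPy, a single tanh encoder layer, a linear decoder, mean squared error, and plain gradient descent on toy data. All sizes, names, and the learning rate are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data (assumption): 256 samples with 8 features driven by 2 hidden factors.
X = np.tanh(rng.normal(size=(256, 2)) @ rng.normal(size=(2, 8)))

n_features, n_latent = X.shape[1], 2
W_enc = rng.normal(scale=0.1, size=(n_features, n_latent))
b_enc = np.zeros(n_latent)
W_dec = rng.normal(scale=0.1, size=(n_latent, n_features))
b_dec = np.zeros(n_features)

def encode(x):
    """Encoder: map the input to the lower dimensional embedding."""
    return np.tanh(x @ W_enc + b_enc)

def decode(z):
    """Decoder: map an embedding back to the input space."""
    return z @ W_dec + b_dec

learning_rate = 0.2
losses = []
for _ in range(1000):
    Z = encode(X)        # encoder produces the embedding
    X_hat = decode(Z)    # decoder reconstructs the input
    error = X_hat - X
    losses.append(float(np.mean(error ** 2)))  # reconstruction loss (MSE)

    # Backpropagate the MSE through the decoder, then the encoder.
    grad_X_hat = 2.0 * error / error.size
    grad_W_dec = Z.T @ grad_X_hat
    grad_b_dec = grad_X_hat.sum(axis=0)
    grad_pre = (grad_X_hat @ W_dec.T) * (1.0 - Z ** 2)  # tanh derivative
    grad_W_enc = X.T @ grad_pre
    grad_b_enc = grad_pre.sum(axis=0)

    # Plain gradient descent update of all weights.
    W_dec -= learning_rate * grad_W_dec
    b_dec -= learning_rate * grad_b_dec
    W_enc -= learning_rate * grad_W_enc
    b_enc -= learning_rate * grad_b_enc
```

After training, `encode(X)` gives the compressed representation and `decode(encode(X))` the reconstruction. A practical image autoencoder would normally use a deep learning framework and convolutional layers rather than hand-written gradients.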

The decoder model can be used as a generative model once it is trained (variational
autoencoders). Nevertheless, this is not often done:

• To get reasonable results, a deep convolutional architecture is required.
• It is generally inferior to using GANs for image generation.

Applications:

• Use autoencoders for image similarity.
  o Feed two images through the encoder network, then calculate a similarity
    score using their latent vectors.
• Dimensionality reduction as preprocessing for classification.
• Information retrieval.
• Anomaly detection.
• Machine translation.
• Image-related applications (like generating images, denoising, processing and
compression).
• Drug discovery.
• Popularity prediction of social media posts.
• Sound and music synthesis.
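The image similarity application in the list above can be sketched as follows. The notes do not say which similarity score to use, so cosine similarity is an assumption here. The `encode` function is a hypothetical stand-in (a fixed random projection) for the encoder of a trained autoencoder.

```python
import numpy as np

def cosine_similarity(a, b):
    """Similarity score between two latent vectors, in [-1, 1]."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
# Stand-in for trained encoder weights (assumption, not a trained model).
projection = rng.normal(scale=1.0 / 8.0, size=(64, 8))

def encode(image):
    """Hypothetical encoder: flatten an 8x8 image and map it to 8 latent values."""
    return np.tanh(image.reshape(-1) @ projection)

image_a = rng.random((8, 8))
image_b = image_a + 0.01 * rng.normal(size=(8, 8))  # near-duplicate of image_a
image_c = rng.random((8, 8))                        # unrelated image

sim_ab = cosine_similarity(encode(image_a), encode(image_b))
sim_ac = cosine_similarity(encode(image_a), encode(image_c))
```

With a real trained encoder, the same comparison would be made on its latent vectors. Euclidean distance is another common choice of score.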

Autoencoders can be used in cases that are suited for Principal Component Analysis
(PCA). In PCA images need to be flattened.

Autoencoders also help to deal with some of these PCA limitations: PCA has learned
features that are linear combinations of original features.

Autoencoders can detect complex, nonlinear relationships between the original features
and the best lower dimensional representation.

While many autoencoders are deep, autoencoders are also often trained with a single
layer each for the encoding and decoding steps. An example is sparse autoencoders,
which have been used successfully in recommendation systems.
Variational Autoencoders

Variational autoencoders also generate a latent representation and then use this
representation to generate new samples (i.e. images).

These are some important features of variational autoencoders:

• Data are assumed to be represented by a set of normally distributed latent
factors.
• The encoder generates the parameters of these distributions, namely µ and σ,
i.e., the mean and standard deviation.
• Images can be generated by sampling from these distributions.
VAE goals

The main goal of VAEs: generate images using the decoder.

• Latent vector: each element is drawn from a normal distribution.
• The parameters of these distributions are learned by the encoder. Vectors
sampled from them are then fed through the learned decoder to produce the
images.

The secondary goal is to have similar images be close together in latent space.

Variational autoencoders working:

VAEs assume that the latent factors are normally distributed, then learn to generate
images from this distribution.

1. The data passes through a network with a bottleneck, reducing the number of
nodes as we did with the regular autoencoders.
2. The image is fed through the encoder network, which outputs the parameters µ and σ.
3. These are combined into one vector: random white noise epsilon with mean = 0
and standard deviation = 1 [N(0,1)] is scaled by σ and added to µ.
4. The resulting vector (now normally distributed, since epsilon ~ N(0,1)) is fed
through the decoder.
5. The reconstructed image is generated using the decoder network.
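The step that combines µ, σ and the noise epsilon is commonly written as z = µ + σ · ε. Below is a minimal sketch of that sampling step, assuming NumPy. The values of `mu` and `sigma` are made up and stand in for encoder outputs.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_latent(mu, sigma, rng):
    """Draw z = mu + sigma * eps, where eps is white noise from N(0, 1)."""
    eps = rng.standard_normal(size=mu.shape)
    return mu + sigma * eps

# Hypothetical encoder outputs for one image with a 2-element latent vector.
mu = np.array([0.5, -1.0])
sigma = np.array([0.1, 2.0])

# Many draws, only to check that the samples follow N(mu, sigma^2).
samples = np.stack([sample_latent(mu, sigma, rng) for _ in range(20000)])
sample_mean = samples.mean(axis=0)
sample_std = samples.std(axis=0)
```

Writing the sample this way keeps the randomness in ε, so gradients can still flow to µ and σ during backpropagation.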
Loss Function of Variational Autoencoders

The VAE reconstructs the original images from vectors drawn from a standard normal
distribution.

The two components of the loss function are:

• A penalty for not reconstructing the image correctly.
• A penalty for generating vectors of parameters µ and σ that are different from
0 and 1, respectively: the parameters of the standard normal distribution.

Variational autoencoders have a loss function with two components:

1. The pixel-wise difference between the reconstructed and original image.
   a. Many functions (like MSE) can be used.
2. The difference between the vectors produced by the encoder and the parameters of the
standard normal distribution.
   a. The specific loss used for this part of a VAE is the ‘KL divergence’ between the
   distribution described by the encoder's µ and σ and the standard normal distribution.

Note on the KL divergence loss function:

• It is not technically necessary to include it in the VAE loss function.
• Empirically, it helps to generate a latent space where visually similar images
are close together.
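The two loss components can be sketched as below, assuming NumPy, MSE for the reconstruction term, and the closed-form KL divergence between N(µ, σ²) and N(0, 1). The `kl_weight` parameter is an assumption added for illustration and is not mentioned in the notes.

```python
import numpy as np

def reconstruction_loss(x, x_hat):
    """Pixel-wise mean squared error between original and reconstruction."""
    return float(np.mean((x - x_hat) ** 2))

def kl_to_standard_normal(mu, sigma):
    """KL( N(mu, sigma^2) || N(0, 1) ), summed over the latent dimensions."""
    return float(0.5 * np.sum(mu ** 2 + sigma ** 2 - 1.0 - 2.0 * np.log(sigma)))

def vae_loss(x, x_hat, mu, sigma, kl_weight=1.0):
    # kl_weight is a hypothetical knob for balancing the two penalties.
    return reconstruction_loss(x, x_hat) + kl_weight * kl_to_standard_normal(mu, sigma)

# The KL penalty vanishes when the encoder outputs mu = 0 and sigma = 1.
kl_at_target = kl_to_standard_normal(np.zeros(2), np.ones(2))
# Shifting one mean by 1 costs 0.5 * 1^2 = 0.5.
kl_shifted = kl_to_standard_normal(np.array([1.0, 0.0]), np.ones(2))
```

In practice, encoders often output log σ² rather than σ for numerical stability. The formula is the same after substituting σ² = exp(log σ²).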
To build more complex architectures, such as Inception or ResNet, you have to use the
functional API instead of the Sequential model. Inception concatenates the outputs of
several different types of layers, and ResNet carries the outputs of earlier layers
forward to later layers. Neither can be expressed as a single sequential stack of
layers.

Autoencoder: Input → hidden layer(s) → encoding model → decoding model applied to the
encoded inputs → reconstructed image.

Variational autoencoders (VAEs) are neural networks that learn a representation of
data, like autoencoders. This time, however, the neural networks will learn the
parameters of a normal distribution, and observations drawn from that distribution
are transformed into images. This results in a two-part latent representation of the
data once the VAE is trained, where one part represents the mean and the other
represents the standard deviation.
VAE: The first neural network (encoder) predicts two vectors for each image, which
will then be interpreted as mean and standard deviation and transformed into a normal
distribution. The second neural network (decoder) takes the result of this operation
and tries to reconstruct the original image.

The entire system is trained with backpropagation. At each iteration two losses are
computed.

• One loss simply penalizes the system for producing images that do not match the
original images.
• The other loss penalizes the encoder model for not correctly producing statistics
from the image that match a standard normal distribution.

Keras backends

Keras is a model-level library, providing high-level building blocks for developing
deep learning models. It does not itself handle low-level operations such as tensor
products, convolutions and so on. Instead, it relies on a specialized, well-optimized
tensor manipulation library to do so, serving as the "backend engine" of Keras. Rather
than picking one single tensor library and making the implementation of Keras tied to
that library, Keras handles the problem in a modular way, and several different backend
engines can be plugged seamlessly into Keras. At this time, Keras has two backend
implementations available: the TensorFlow backend and the Theano backend.
Generative Adversarial Networks (GANs):
GANs are more sensitive to hyperparameters than normal neural networks.

As a broader example, consider learning a spam filter. Once a neural net has learned
what makes an email spam versus not spam, it becomes possible to use that same network
to design spam emails that look as much as possible like non-spam emails and so trick
the network. These are adversarial examples.

Story of origin of GANs:

The invention of GANs was connected to neural networks’ vulnerability to adversarial
examples. Researchers were going to run a speech synthesis contest, to see which
neural network could generate the most realistic-sounding speech.

A neural network - the “discriminator” - would judge whether the speech was real or not.

In the end, they decided not to run the contest, because they realized people would
generate speech to fool this particular network, rather than actually generating realistic
speech.

The researchers realized that they could solve this by having the discriminator
continually improve at distinguishing between real and fake speech. They could do this
by feeding it real speech alongside fake speech.
GANs provide a way of training two neural networks simultaneously.

• One of the neural networks, the generator, learns to map random noise to
images indistinguishable from those of the training set.

The generator network starts with an input that is just random noise. It then tries to
create an image that is indistinguishable from the training set images: not the same as
any particular image, but sharing similar properties of the image value distributions
in that training set. The image it produces is then fed through the discriminator.

These are the steps to train GANs:

• Randomly initialize the weights of the generator and discriminator networks.
• Randomly initialize a noise vector and generate an image using the generator.
• Predict the probability that the generated image is real using the discriminator.
• Compute the losses both assuming the image was fake one time and assuming
it was real another time.
• Train the discriminator to output whether the image is fake. We want to train
the discriminator to output 0 for generated (fake) images. We backpropagate
the loss that measures how far off the discriminator was from saying that the
image is not real, and update the weights of the discriminator only.
• Compute the penalty for the discriminator probability, without using it to train
the discriminator.
• Train the generator to generate images that the discriminator thinks are real.
• Use the discriminator to calculate the probability that a real image is real.
• Use L1 to train the discriminator to output 1 when it sees real images. Use
this L1 loss to train and update the weights appropriately within the discriminator.
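The losses computed in the steps above can be illustrated with a small sketch. The notes do not name the loss function, so binary cross-entropy is an assumption here, and the discriminator probabilities are made-up values rather than outputs of a real network.

```python
import math

def binary_cross_entropy(prob_real, target):
    """Loss for a predicted probability that an image is real, given target 0 or 1."""
    eps = 1e-12  # clip to avoid log(0)
    p = min(max(prob_real, eps), 1.0 - eps)
    return -(target * math.log(p) + (1 - target) * math.log(1.0 - p))

# Hypothetical discriminator outputs for one generated and one real image.
prob_fake_is_real = 0.3
prob_real_is_real = 0.8

# Discriminator update: generated images should score 0, real images should score 1.
d_loss_fake = binary_cross_entropy(prob_fake_is_real, target=0)
d_loss_real = binary_cross_entropy(prob_real_is_real, target=1)

# Generator update: the same prediction on the generated image is scored as if
# the image were real (target 1). Only the generator weights would be updated.
g_loss = binary_cross_entropy(prob_fake_is_real, target=1)
```

Here the discriminator is fairly confident that the generated image is fake, so its own loss on that image is small while the generator's loss is large. This opposition is what drives the two networks against each other.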
GANs training difficulties:
Training GANs is highly dependent on both generator and discriminator learning at the
same rate.
The ability of the two networks to learn is affected by:
• Network architectures
• Learning rate
• Loss functions
• Optimization techniques
GANs are more sensitive than traditional neural networks to choices on these
dimensions.
What to do to train a GAN?
Compared with building a neural network for a supervised learning problem such as
image classification or text generation, it is more important to read original papers and
examine code on GitHub to see how researchers trained their GANs.
Famous examples of GANs include:
1. Deepfakes
2. Age interpolation
3. Text to image
Additional Deep Learning models:

Local Interpretable Model-Agnostic Explanations (LIME):

• Deep learning models are difficult to interpret:
  o Many parameters, complex networks

One approach is to use LIME:

• LIME treats the model as a black box and focuses instead on the sensitivity of
outputs to small changes in the input.
• Analogous (comparable) to feature importance, LIME summarizes the sensitivity
of regression or classification outcomes to each variable.
• Non-linearities and variables that cannot be perturbed or changed present
challenges to this approach.
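The idea of probing a black box with small input changes can be sketched as follows. This is a simplified illustration in the spirit of LIME, not the LIME library itself: it assumes NumPy, Gaussian perturbations, a Gaussian proximity kernel, and a weighted linear fit. The function `explain_locally` and the toy `black_box` model are hypothetical.

```python
import numpy as np

def explain_locally(predict, x, n_samples=500, scale=0.1, seed=0):
    """Fit a weighted linear model to a black box around x; return per-feature slopes."""
    rng = np.random.default_rng(seed)
    # Perturb the input slightly and query the black-box model.
    perturbed = x + scale * rng.normal(size=(n_samples, x.size))
    outputs = np.array([predict(row) for row in perturbed])
    # Weight each sample by its proximity to x.
    distances = np.linalg.norm(perturbed - x, axis=1)
    weights = np.exp(-(distances ** 2) / (2.0 * scale ** 2))
    # Weighted least squares: outputs ~ intercept + slopes . (perturbed - x).
    design = np.hstack([np.ones((n_samples, 1)), perturbed - x])
    root_w = np.sqrt(weights)
    solution, *_ = np.linalg.lstsq(design * root_w[:, None], outputs * root_w, rcond=None)
    return solution[1:]  # drop the intercept, keep the local sensitivities

def black_box(v):
    """Toy model (assumption): the output ignores the third feature."""
    return 3.0 * v[0] - 2.0 * v[1] + 0.0 * v[2]

coefficients = explain_locally(black_box, np.array([1.0, 2.0, 3.0]))
```

For this linear toy model the recovered slopes match the true coefficients. For a strongly non-linear model the slopes only describe behaviour near the chosen input, which is the limitation noted in the last bullet.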
