Unit 5e - Autoencoders


Autoencoders

Chapter 14 from
The Deep Learning book
(Goodfellow et al)
1
Autoencoders
 Autoencoders are artificial neural networks
capable of learning efficient representations of
the input data, called codings (or latent
representation), without any supervision.
 These codings typically have a much lower
dimensionality than the input data, making
autoencoders useful for dimensionality reduction.
 Some autoencoders are generative models: they
are capable of randomly generating new data
that looks very similar to the training data.
 However, the generated images are usually fuzzy and
not entirely realistic.
2
Autoencoders
 Which of the following number sequences
do you find the easiest to memorize?
 40, 27, 25, 36, 81, 57, 10, 73, 19, 68
 50, 25, 76, 38, 19, 58, 29, 88, 44, 22, 11, 34,
17, 52, 26, 13, 40, 20

3
Autoencoders
 At first glance, it would seem that the first
sequence should be easier, since it is much
shorter.
 However, if you look carefully at the second
sequence, you may notice that it follows two
simple rules:
 Even numbers are followed by their half,
 and odd numbers are followed by their triple
plus one.
 This is a famous sequence known as the
hailstone sequence.
4
Autoencoders
 Once you notice this pattern, the second
sequence becomes much easier to memorize
than the first because you only need to memorize
 the first number,
 the length of the sequence, and
 the two rules.
 This leads to an efficient data representation.
 Autoencoders find efficient data representations
by recognizing the underlying patterns.
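The memorization argument can be sketched in a few lines of Python: the whole second sequence is rebuilt from its first number, its length, and the two rules. The function name `hailstone` is chosen here for illustration only.

```python
def hailstone(first, length):
    """Rebuild a hailstone sequence from its first number and its length."""
    seq = [first]
    while len(seq) < length:
        n = seq[-1]
        # Even numbers are followed by their half, odd numbers by 3n + 1.
        seq.append(n // 2 if n % 2 == 0 else 3 * n + 1)
    return seq

# The second sequence from the slide, written out in full.
second = [50, 25, 76, 38, 19, 58, 29, 88, 44, 22, 11, 34, 17, 52, 26, 13, 40, 20]

# The compact description (50, 18, rules) decodes to the same 18 numbers.
decoded = hailstone(50, 18)
print(decoded == second)
```

Here the pair (50, 18) plays the role of the code and `hailstone` plays the role of the decoder.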

5
Autoencoders

6
Autoencoders formalized
 An autoencoder consists of two parts: an encoder and a
decoder.
 The encoder f transforms the data x into a set of “factors” h,
i.e. $h = f(x)$.
 The decoder g decodes the encoded information and tries
to reconstruct the original data, i.e. $r = g(h) = g(f(x))$.
 The goal of the autoencoder is to minimize the difference
between the original data and the reconstructed data:
$L(x, g(f(x)))$.
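As a minimal sketch of this formalization, the NumPy code below writes an encoder f, a decoder g, and a squared-error loss for a batch of inputs. The layer sizes, the tanh nonlinearity, the random weights, and the choice of squared error are all assumptions made for illustration; the slides do not fix any of them.

```python
import numpy as np

rng = np.random.default_rng(0)
n_inputs, n_code = 8, 3          # hypothetical sizes: 8-dim input, 3-dim code

# Hypothetical parameters of a one-layer encoder and a one-layer decoder.
W_enc = rng.normal(scale=0.1, size=(n_inputs, n_code))
W_dec = rng.normal(scale=0.1, size=(n_code, n_inputs))

def f(x):
    """Encoder: maps inputs x to codes h = f(x)."""
    return np.tanh(x @ W_enc)

def g(h):
    """Decoder: maps codes h back to reconstructions r = g(h)."""
    return h @ W_dec

def loss(x):
    """Mean squared reconstruction error between x and g(f(x))."""
    r = g(f(x))
    return np.mean(np.sum((x - r) ** 2, axis=1))

x = rng.normal(size=(5, n_inputs))   # a batch of 5 made-up inputs
h = f(x)
print(h.shape, g(h).shape, loss(x))
```

Training would adjust `W_enc` and `W_dec` to reduce `loss`; that step is left out to keep the sketch short.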

7
Autoencoders formalized

[Figure: input x → encoder f → hidden layer (code) → decoder g → reconstruction]

8
Autoencoders
 A question that comes to the mind of every beginner
with autoencoders is “isn’t it just copying the data?”
 In practice, there are often constraints on the
encoder to make sure that it will NOT lead to a
solution that just copies the data.
 For example, the encoded information may be required
to have a lower dimension than the input, so that the
encoder can be viewed as a dimensionality reduction step.
 Although an autoencoder consists of two steps, sometimes
only the output of the encoder is of interest for
downstream analysis.

9
Autoencoders

[Figure: undercomplete autoencoder vs. overcomplete autoencoder]

10
Autoencoders
 Autoencoders may be thought of as being a special
case of feedforward networks and may be trained
with all the same techniques, typically minibatch SGD.
 Unlike general feedforward networks, autoencoders
may also be trained using recirculation (Hinton and
McClelland, 1988), a learning algorithm based on
comparing the activations of the network on the
original input to the activations on the reconstructed
input.
 Recirculation is regarded as more biologically
plausible than back-propagation but is rarely used for
machine learning applications.
11
Undercomplete Autoencoders
 Learning an undercomplete representation forces
the autoencoder to capture the most salient
features of the training data.
 When the decoder is linear and L is the mean
squared error, an undercomplete autoencoder
learns to span the same subspace as PCA.
 Autoencoders with nonlinear encoder function f and
nonlinear decoder function g can thus learn a more
powerful nonlinear generalization of PCA.
 But, …
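The PCA connection can be checked numerically. The sketch below does not train anything: it treats an orthonormal basis as a linear encoder (h = x · basis) with a tied linear decoder, and compares the mean squared reconstruction error of the PCA basis with that of a random basis of the same size. The data, sizes, and tied-weight setup are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Made-up data: 500 points in 5-D that mostly vary along 2 directions.
X = rng.normal(size=(500, 2)) @ rng.normal(size=(2, 5))
X = X + 0.1 * rng.normal(size=X.shape)
X = X - X.mean(axis=0)

def reconstruction_error(X, basis):
    """MSE of a linear autoencoder: encoder x @ basis, decoder h @ basis.T."""
    r = (X @ basis) @ basis.T
    return np.mean(np.sum((X - r) ** 2, axis=1))

k = 2
# PCA basis: top-k right singular vectors of the centered data.
_, S, Vt = np.linalg.svd(X, full_matrices=False)
pca_basis = Vt[:k].T

# A random orthonormal k-dimensional basis for comparison.
random_basis, _ = np.linalg.qr(rng.normal(size=(5, k)))

err_pca = reconstruction_error(X, pca_basis)
err_random = reconstruction_error(X, random_basis)
print(err_pca, err_random)
```

No k-dimensional linear code can do better than `err_pca` under squared error, which is why a linear undercomplete autoencoder trained with MSE ends up spanning the PCA subspace.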

12
Undercomplete Autoencoders
 Unfortunately, if the encoder and decoder are
allowed too much capacity, the autoencoder can
learn to perform the copying task without
extracting useful information about the
distribution of the data.
 E.g., an autoencoder with a one-dimensional code
and a very powerful nonlinear encoder can learn
to map each training example x(i) to the code i.
 The decoder can learn to map these integer indices
back to the values of specific training examples.

13
Regularized Autoencoders
 Ideally, choose the code size (dimension of h) and
the capacity of encoder f and decoder g based
on the complexity of the distribution being modeled.
 Regularized autoencoders: Rather than limiting
model capacity by keeping the encoder/decoder
shallow and the code size small, we can use a loss
function that encourages the model to have
properties besides the ability to copy its input to its output:
 Sparsity of the representation
 Smallness of the derivative of the representation
 Robustness to noise and errors in the data
14
Sparse Autoencoders
 A sparse autoencoder is an autoencoder whose training
criterion adds a sparsity penalty Ω(h) on the code h = f(x)
to the reconstruction error: $L(x, g(f(x))) + \Omega(h)$
 An autoencoder that has been regularized to be
sparse must respond to unique statistical
features of the dataset it has been trained on,
rather than simply acting as an identity function.
 In this way, training to perform the copying task with a
sparsity penalty can yield a model that has learned
useful features as a byproduct.
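A sparse training criterion can be sketched as a reconstruction error plus a penalty on the code. The L1 form Ω(h) = λ Σ|h_i| used below is one common choice and is an assumption here, as are the function name and the toy numbers.

```python
import numpy as np

def sparse_autoencoder_loss(x, h, r, lam=1e-3):
    """Reconstruction error plus an L1 sparsity penalty on the code.

    x: inputs, h: codes f(x), r: reconstructions g(h).
    lam is a hypothetical penalty weight.
    """
    reconstruction = np.mean(np.sum((x - r) ** 2, axis=1))
    sparsity = lam * np.mean(np.sum(np.abs(h), axis=1))
    return reconstruction + sparsity

x = np.array([[1.0, 2.0], [3.0, 4.0]])
r = x.copy()                                   # pretend reconstruction is perfect
dense_h = np.array([[0.5, 0.5, 0.5], [0.5, 0.5, 0.5]])
sparse_h = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])

loss_dense = sparse_autoencoder_loss(x, dense_h, r, lam=1.0)
loss_sparse = sparse_autoencoder_loss(x, sparse_h, r, lam=1.0)
print(loss_dense, loss_sparse)
```

With identical reconstructions, the code that activates fewer units gets the lower loss, which is the pressure that discourages a plain identity mapping.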

15
Denoising Autoencoders (DAEs)
 In addition to adding penalty terms, there are other
tricks that keep autoencoders from simply copying the data.
 One trick is to add some noise to the input data;
$\tilde{x}$ denotes a noisy version of the input x.
 The denoising autoencoder (DAE) seeks to
minimize $L(x, g(f(\tilde{x})))$
 Denoising training forces f and g to implicitly learn
the structure of $p_{\text{data}}(x)$.
 This is another example of how useful properties can emerge
as a by-product of minimizing reconstruction error.
16
Contractive Autoencoder (CAE)
 Another strategy for regularizing an
autoencoder is to use a penalty Ω, as in sparse
autoencoders: $L(x, g(f(x))) + \Omega(h, x)$,
but with a different form of Ω:
$\Omega(h, x) = \lambda \sum_i \lVert \nabla_x h_i \rVert^2$
 Forces the model to learn a function that
does not change much when x changes
slightly.
 An autoencoder regularized in this way is
called a contractive autoencoder, CAE.
17
Representational Power
 Autoencoders are often trained with only a single-layer
encoder and a single-layer decoder.
 However, using a deep encoder and decoder offers many
advantages:
 Deep autoencoders can approximate any mapping from input to code
arbitrarily well, given enough hidden units.
 They yield much better compression than
corresponding shallow autoencoders.
 Depth can exponentially reduce the computational
cost of representing some functions.
 Depth can also exponentially decrease the amount of
training data needed to learn some functions.

18
Stochastic Encoders and Decoders
 The general strategy for designing the output units
and loss function of a feedforward network is to
 define the output distribution p(y|x), and
 minimize the negative log-likelihood -log p(y|x).
 In this case y is a vector of targets, such as
class labels.
 In an autoencoder, x is the target as well as the
input.
 Yet we can apply the same machinery as
before.
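To make "the same machinery" concrete, the sketch below writes the negative log-likelihood of x under two common decoder output distributions. Both choices (a unit-variance Gaussian and independent Bernoulli outputs) and the function names are assumptions for illustration; the slides do not commit to a specific distribution.

```python
import numpy as np

def gaussian_nll(x, mean):
    """-log p(x | h) for a Gaussian decoder with the given mean and unit variance."""
    d = x.shape[-1]
    return 0.5 * np.sum((x - mean) ** 2, axis=-1) + 0.5 * d * np.log(2 * np.pi)

def bernoulli_nll(x, prob):
    """-log p(x | h) for independent Bernoulli outputs (x must be binary)."""
    return -np.sum(x * np.log(prob) + (1 - x) * np.log(1 - prob), axis=-1)

x = np.array([1.0, 0.0, 1.0])          # the input, which is also the target
good = np.array([0.9, 0.1, 0.8])       # decoder output close to x
bad = np.array([0.5, 0.5, 0.5])        # uninformative decoder output

print(gaussian_nll(x, good), gaussian_nll(x, bad))
print(bernoulli_nll(x, good), bernoulli_nll(x, bad))
```

Up to an additive constant, the Gaussian case is half the squared error, so minimizing it is the same as minimizing a squared-error reconstruction loss.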
19
Stochastic Encoders and Decoders

[Figure: input x → p_encoder(h | x) → hidden layer (code) h → p_decoder(x | h) → reconstruction]

20
Denoising Autoencoders (DAEs)
 Defined as an autoencoder that receives a corrupted
data point as input and is trained to predict the original,
uncorrupted data point as its output.
 Traditional autoencoders minimize $L(x, g(f(x)))$.
 A DAE instead minimizes $L(x, g(f(\tilde{x})))$, where $\tilde{x}$ is a copy of x
corrupted by some form of noise.
 The autoencoder must undo this corruption rather
than simply copying its input.

[Figure: noisy input → encoder → latent space representation → decoder → denoised input]
21
Denoising Autoencoders (DAEs)
 A DAE is trained to reconstruct the clean data point x from the
corrupted $\tilde{x}$ by minimizing the loss
$L = -\log p_{\text{decoder}}(x \mid h = f(\tilde{x}))$
 The autoencoder learns a reconstruction distribution
$p_{\text{reconstruct}}(x \mid \tilde{x})$ estimated from training pairs $(x, \tilde{x})$ as
follows:
1. Sample a training example x from the training data.
2. Sample a corrupted version $\tilde{x}$ from $C(\tilde{x} \mid \mathbf{x} = x)$.
3. Use $(x, \tilde{x})$ as a training example for estimating the
autoencoder distribution $p_{\text{reconstruct}}(x \mid \tilde{x}) = p_{\text{decoder}}(x \mid h)$,
with h the output of the encoder $f(\tilde{x})$ and $p_{\text{decoder}}$ typically
defined by a decoder g(h).
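The three steps can be sketched as a small training loop. To keep it short, the autoencoder is replaced by a single linear map W standing in for g(f(·)), the corruption process C is additive Gaussian noise, and the loss is squared error (which matches a Gaussian decoder up to a constant). All of these, along with the data and step size, are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[1.0, 0.0, 1.0, 0.0],
              [0.0, 1.0, 0.0, 1.0]])   # made-up data lies on a 2-D plane in 4-D
sigma = 0.3                             # hypothetical noise level of C

def sample_clean(n):
    """Step 1: sample training examples x."""
    return rng.normal(size=(n, 2)) @ A

def corrupt(x):
    """Step 2: sample a corrupted version of x (additive Gaussian noise)."""
    return x + sigma * rng.normal(size=x.shape)

def loss(W, x, x_tilde):
    """Squared error between clean x and the reconstruction of x_tilde."""
    return np.mean(np.sum((x_tilde @ W - x) ** 2, axis=1))

W = np.zeros((4, 4))                    # linear stand-in for g(f(.))
x_eval = sample_clean(1000)
x_eval_tilde = corrupt(x_eval)
loss_before = loss(W, x_eval, x_eval_tilde)

for _ in range(300):
    x = sample_clean(32)
    x_tilde = corrupt(x)
    # Step 3: use the pair (x, x_tilde) for one gradient step on the loss.
    grad = 2.0 * x_tilde.T @ (x_tilde @ W - x) / len(x)
    W -= 0.05 * grad

loss_after = loss(W, x_eval, x_eval_tilde)
print(loss_before, loss_after)
```

Note that the input to the model is always the corrupted sample while the target is always the clean one; that asymmetry is the only change relative to an ordinary autoencoder loop.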
22
Denoising Autoencoders (DAEs)
 Score matching is often employed to train
DAEs.
 Score Matching encourages the model to
have the same score as the data
distribution at every training point x.
 The score is a particular gradient field: $\nabla_x \log p(x)$
 A DAE estimates this score as $g(f(x)) - x$.
 See picture on the next slide.

23
Denoising Autoencoders (DAEs)
 Training examples x are red crosses.
 Gray circles show equiprobable corruptions.
 The vector field $g(f(x)) - x$, indicated by green
arrows, estimates the score $\nabla_x \log p(x)$, which is the
gradient of the log-density of the data.
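A one-dimensional Gaussian example gives a numerical feel for this claim. The sketch below fits the best linear denoiser by least squares and compares "reconstruction minus input", divided by the noise variance, with the score of the noise-smoothed density. The Gaussian data, the linear denoiser, and the scaling by the noise variance are assumptions of this toy setting, not something stated on the slides; for small noise the smoothed density is close to the data density.

```python
import numpy as np

rng = np.random.default_rng(0)
s, sigma = 2.0, 0.5                      # made-up data std and noise std
x = s * rng.normal(size=200_000)         # clean 1-D data, x ~ N(0, s^2)
x_tilde = x + sigma * rng.normal(size=x.shape)

# Best linear denoiser r(v) = a * v under squared error (least squares fit).
a = np.sum(x * x_tilde) / np.sum(x_tilde ** 2)

def reconstruction_minus_input(v):
    """The vector field r(v) - v of the fitted denoiser."""
    return a * v - v

def smoothed_score(v):
    """d/dv log p(v) for the noisy density N(0, s^2 + sigma^2)."""
    return -v / (s ** 2 + sigma ** 2)

v = np.array([-1.0, 0.5, 2.0])
print(reconstruction_minus_input(v) / sigma ** 2)
print(smoothed_score(v))
```

Both printed vectors point back toward the mode at zero and agree closely, which is the behavior the green arrows in the figure illustrate.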

24
Contractive Autoencoder (CAE)
 A contractive autoencoder has an explicit
regularizer on h=f(x), encouraging the derivatives
of f to be as small as possible:
$\Omega(h) = \lambda \left\lVert \frac{\partial f(x)}{\partial x} \right\rVert_F^2$
 The penalty Ω(h) is the squared Frobenius norm (sum
of squared elements) of the Jacobian matrix of
partial derivatives associated with the encoder
function.
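For a concrete case, the sketch below computes the squared Frobenius norm of the encoder Jacobian for a one-layer sigmoid encoder, where it has a simple closed form, and checks it against a finite-difference Jacobian. The sigmoid encoder, its sizes, and the function names are assumptions for illustration; the weight λ is left out.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def encoder(x, W, b):
    """Hypothetical one-layer sigmoid encoder h = f(x)."""
    return sigmoid(W @ x + b)

def contractive_penalty(x, W, b):
    """Squared Frobenius norm of the Jacobian dh/dx for the sigmoid encoder."""
    h = encoder(x, W, b)
    # For a sigmoid layer, dh_j/dx_i = h_j * (1 - h_j) * W[j, i].
    return np.sum((h * (1 - h)) ** 2 * np.sum(W ** 2, axis=1))

def numerical_penalty(x, W, b, eps=1e-6):
    """Same quantity from a central finite-difference Jacobian, as a check."""
    columns = [(encoder(x + eps * e, W, b) - encoder(x - eps * e, W, b)) / (2 * eps)
               for e in np.eye(len(x))]
    J = np.stack(columns, axis=1)
    return np.sum(J ** 2)

rng = np.random.default_rng(0)
W, b, x = rng.normal(size=(3, 4)), rng.normal(size=3), rng.normal(size=4)
print(contractive_penalty(x, W, b), numerical_penalty(x, W, b))
```

The closed form avoids building the Jacobian explicitly, but it only holds for this particular encoder; deeper encoders need the full Jacobian or automatic differentiation.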

25
DAEs vs. CAEs
 DAEs make the reconstruction function
resist small, finite-sized perturbations of the
input.
 CAEs make the feature encoding function
resist infinitesimal perturbations of the
input.
 Both denoising and contractive autoencoders
perform well!
 Both are overcomplete.
26
DAEs vs. CAEs
 Advantage of DAE: simpler to implement
 Requires adding one or two lines of code to
regular AE.
 No need to compute Jacobian of hidden layer.
 Advantage of CAE: gradient is
deterministic.
 Might be more stable than DAE, which uses a
sampled gradient.
 One fewer hyperparameter to tune (the noise factor).
27
Recurrent Autoencoders
 In a recurrent autoencoder, the encoder is
typically a sequence-to-vector RNN which
compresses the input sequence down to a
single vector.
 The decoder is a vector-to-sequence RNN
that does the reverse.

28
Convolutional autoencoders
 Convolutional neural networks are far better
suited than dense networks to work with images.
 Convolutional autoencoder: The encoder is a
regular CNN composed of convolutional layers
and pooling layers.
 It typically reduces the spatial dimensionality of the
inputs (i.e., height and width) while increasing the
depth (i.e., the number of feature maps).
 The decoder does the reverse using transpose
convolutional layers.
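The shrinking and growing of the spatial dimensions can be traced with the standard output-size formulas for convolutions and transpose convolutions. The layer settings below are hypothetical, and strided convolutions are used in place of the pooling layers mentioned above to keep the arithmetic to one formula per layer.

```python
def conv_out(size, kernel, stride, padding):
    """Output height/width of a convolution (floor convention)."""
    return (size + 2 * padding - kernel) // stride + 1

def conv_transpose_out(size, kernel, stride, padding, output_padding=0):
    """Output height/width of a transpose convolution."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding

# Hypothetical encoder for a 28x28 single-channel image:
# each entry is (feature maps, kernel, stride, padding).
encoder_layers = [(16, 3, 2, 1), (32, 3, 2, 1)]
# Hypothetical mirrored decoder:
# each entry is (feature maps, kernel, stride, padding, output_padding).
decoder_layers = [(16, 3, 2, 1, 1), (1, 3, 2, 1, 1)]

size = 28
shapes = [(1, size, size)]
for maps, k, s, p in encoder_layers:
    size = conv_out(size, k, s, p)
    shapes.append((maps, size, size))
for maps, k, s, p, op in decoder_layers:
    size = conv_transpose_out(size, k, s, p, op)
    shapes.append((maps, size, size))

for shape in shapes:
    print(shape)
```

The encoder halves the height and width at each layer while the number of feature maps grows, and the decoder retraces the same shapes in reverse.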

29
Applications of Autoencoders
 Data compression
 Dimensionality reduction
 Information retrieval
 Image denoising
 Feature extraction
 Removing watermarks from Images

30
Applications of Autoencoders
 Autoencoders have been successfully applied to
dimensionality reduction and information retrieval
tasks.
 Dimensionality reduction is one of the early
motivations for studying autoencoders.
 Hinton and Salakhutdinov (2006) trained a deep autoencoder
whose code yielded less reconstruction error than PCA.
 If we can produce a code that is low-dimensional
and binary, then we can store all database
entries in a hash table that maps binary code
vectors to entries. This approach is called semantic hashing.
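The hash-table idea can be sketched with a plain Python dictionary. The codes below are made up by hand rather than produced by a trained encoder, and the 0.5 threshold, entry names, and function name are assumptions for illustration.

```python
import numpy as np
from collections import defaultdict

def binarize(codes, threshold=0.5):
    """Turn real-valued codes in [0, 1] into tuples of bits usable as dict keys."""
    return [tuple(int(v) for v in row) for row in (codes > threshold)]

# Made-up 4-dimensional codes for five database entries.
entries = ["doc_a", "doc_b", "doc_c", "doc_d", "doc_e"]
codes = np.array([[0.9, 0.1, 0.8, 0.2],
                  [0.8, 0.2, 0.9, 0.1],
                  [0.1, 0.9, 0.2, 0.7],
                  [0.2, 0.8, 0.1, 0.9],
                  [0.9, 0.9, 0.1, 0.1]])

# Hash table mapping each binary code vector to the entries that share it.
table = defaultdict(list)
for entry, key in zip(entries, binarize(codes)):
    table[key].append(entry)

# A query is encoded the same way and looked up in constant time.
query_key = binarize(np.array([[0.7, 0.3, 0.6, 0.4]]))[0]
print(table[query_key])
```

Entries whose codes binarize to the same bit pattern land in the same bucket, so retrieval returns all entries with the same code as the query; near matches can be found by also probing keys that differ in a few bits.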

31
Chapter Summary
 Autoencoders motivated.
 Sparse autoencoders
 Denoising autoencoders
 Contractive autoencoder
 Recurrent/Convolutional autoencoders
 Applications of Autoencoders

32
