Module 3 – Deep Learning
Introduction
In both Statistics and Machine Learning, the number of attributes, features or
input variables of a dataset is referred to as its dimensionality.
For example, let’s take a very simple dataset containing 2 attributes called
Height and Weight. This is a 2-dimensional dataset and any observation of
this dataset can be plotted in a 2D plot.
Introduction to Autoencoders
1. Encoder: A module that compresses the train-validate-test set input data into an
encoded representation that is typically several orders of magnitude smaller than the
input data.
2. Bottleneck: The layer that holds the compressed knowledge representation (the code);
it constrains how much information can flow through the network.
3. Decoder: A module that decompresses the knowledge representation and reconstructs
the data back from its encoded form.
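These three components can be sketched in a few lines of Keras; the layer sizes below (784 → 32 → 784) are illustrative assumptions, not values taken from these slides:

import tensorflow as tf
from tensorflow.keras import layers

# Encoder: compresses a 784-dimensional input down to a much smaller code.
encoder = tf.keras.Sequential([
    layers.Input(shape=(784,)),
    layers.Dense(128, activation='relu'),
    layers.Dense(32, activation='relu'),      # bottleneck: the encoded representation
])

# Decoder: reconstructs the 784-dimensional input from the bottleneck code.
decoder = tf.keras.Sequential([
    layers.Input(shape=(32,)),
    layers.Dense(128, activation='relu'),
    layers.Dense(784, activation='sigmoid'),
])

# The autoencoder is simply the encoder followed by the decoder.
autoencoder = tf.keras.Sequential([encoder, decoder])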
[Figure: Autoencoder architecture – Encoder → Bottleneck (latent space, encoded data z1) → Decoder, illustrated with encoded example inputs (a table, a rabbit).]
Training Objectives
The autoencoder is trained so that its output reconstructs its input. The reconstruction loss compares the input x with the output (reconstruction) y:

L(x, y) = (1/n) Σ_{i=1}^{n} (x_i − y_i)^2

[Figure: input images and their reconstructions at the initial epoch.]
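As a quick sketch, this loss is just the mean squared error between the input and its reconstruction; the example tensors below are made-up values:

import tensorflow as tf

x = tf.constant([[0.0, 0.5, 1.0]])        # input (illustrative values)
y = tf.constant([[0.1, 0.4, 0.9]])        # reconstruction produced by the decoder

# L(x, y) = (1/n) * sum_i (x_i - y_i)^2, the same objective as losses.MeanSquaredError().
loss = tf.reduce_mean(tf.square(x - y))
print(loss.numpy())                       # ~0.01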
Why use autoencoders?
Dimensionality Reduction
[Figure: an encoder compresses two input neurons (Neuron 1, Neuron 2) through a bottleneck into a 2-D latent space.]
[Figure: the latent space of the digit classes 0–9 reorganizing over training epochs 1–20.]
[Figure: reconstructions of digit images for different latent dimensions (1, 2, 9) – larger latent vectors preserve more detail.]
Why do we care about reconstruction quality?
[Figure: 2-D projections of the latent space (latent dimension = 5) showing how the digit classes cluster along Dimension 1 and Dimension 2.]
Application Example
[Figure: a latent space in which Male and Female examples separate along Dimension 1 and Dimension 2.]
Limitations
[Figure: two latent-space plots of the digit classes with overlapping clusters, and a noisy image compared with the original image.]
Regularize the latent space
Variational Auto-Encoders
Training of AE
• Code size: the number of nodes in the bottleneck. A smaller size results in more compression.
• The decoder decodes the input again from the hidden representation: x̂_i = f(W* h + c).
• The model is trained to minimize the reconstruction error between the input x_i and the reconstruction x̂_i.
Undercomplete Autoencoder
❑ One of the simplest types of autoencoders
❑ Undercomplete autoencoder takes in an image and tries to predict the same image as output
❑ Thus reconstructing the image from the compressed bottleneck region
h = g(W x_i + b)        (encoder)
x̂_i = f(W* h + c)       (decoder)
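A small NumPy sketch of these two maps; sigmoid is assumed for both g and f, and the sizes 784 → 32 are illustrative:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
d, k = 784, 32                                  # input size and (smaller) bottleneck size
W = rng.normal(scale=0.01, size=(k, d))         # encoder weights
b = np.zeros(k)                                 # encoder bias
W_star = rng.normal(scale=0.01, size=(d, k))    # decoder weights
c = np.zeros(d)                                 # decoder bias

x_i = rng.random(d)                             # one input example
h = sigmoid(W @ x_i + b)                        # h = g(W x_i + b)
x_hat = sigmoid(W_star @ h + c)                 # x̂_i = f(W* h + c)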
Link between PCA and Autoencoders (PCA vs. Autoencoder)
An undercomplete autoencoder with a linear encoder and decoder and squared-error loss learns the same subspace as PCA; with non-linear activations, the autoencoder can capture more complex, non-linear structure than PCA.
Regularization in Autoencoders
The simplest solution is to add an L2-regularization term to the objective function:

min over θ = {W, W*, b, c} of  (1/m) Σ_{i=1}^{m} Σ_{j=1}^{n} (x̂_ij − x_ij)^2 + λ ||θ||^2

This is very easy to implement: it just adds a term λW to the gradient of the objective with respect to W (and similarly for the other parameters).

Another trick is to tie the weights of the encoder and decoder, i.e., W* = Wᵀ. This effectively reduces the capacity of the autoencoder and acts as a regularizer.
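Both tricks can be sketched in Keras as below; the layer size, λ = 1e-4, and the custom tied-weights layer are illustrative assumptions rather than a prescribed implementation:

import tensorflow as tf
from tensorflow.keras import layers, regularizers

# Trick 1: an L2 penalty on the weights, adding lambda * ||theta||^2 to the loss.
encoder_layer = layers.Dense(32, activation='sigmoid',
                             kernel_regularizer=regularizers.l2(1e-4))

# Trick 2: tie the decoder weights to the encoder, W* = W^T. This custom layer reuses
# the encoder's kernel (transposed) and learns only its own bias.
class DenseTranspose(layers.Layer):
    def __init__(self, dense, activation=None, **kwargs):
        super().__init__(**kwargs)
        self.dense = dense
        self.activation = tf.keras.activations.get(activation)

    def build(self, input_shape):
        self.bias = self.add_weight(name='bias', initializer='zeros',
                                    shape=(self.dense.kernel.shape[0],))
        super().build(input_shape)

    def call(self, inputs):
        return self.activation(inputs @ tf.transpose(self.dense.kernel) + self.bias)

inputs = tf.keras.Input(shape=(784,))
h = encoder_layer(inputs)                                        # h = g(W x + b)
x_hat = DenseTranspose(encoder_layer, activation='sigmoid')(h)   # x̂ = f(Wᵀ h + c)
tied_ae = tf.keras.Model(inputs, x_hat)
tied_ae.compile(optimizer='adam', loss='mse')

Tying the weights roughly halves the number of parameters, which is what reduces the capacity of the autoencoder.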
Denoising Autoencoder (DAE)
▪ Denoising autoencoders are autoencoders that remove noise from an image.
▪ We feed the network a noisy version of the image, where noise has been added via digital
alterations.
▪ The noisy image is fed to the encoder-decoder architecture, and the output
is compared with the ground truth image.
▪ The denoising autoencoder gets rid of noise by learning a representation of
the input where the noise can be filtered out easily.
▪ While removing noise directly from the image seems difficult, the autoencoder
performs this by mapping the input data into a lower-dimensional manifold (like
in undercomplete autoencoders), where filtering of noise becomes much easier.
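A minimal denoising setup is sketched below, assuming MNIST inputs, Gaussian noise with factor 0.2, and a small dense autoencoder; the key point is that the noisy images are the inputs while the clean images are the targets:

import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

# Load and normalize MNIST (illustrative dataset choice), flattened to 784-dim vectors.
(x_train, _), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype('float32') / 255.

# Corrupt the inputs with additive Gaussian noise, then clip back to [0, 1].
noise_factor = 0.2
x_train_noisy = np.clip(x_train + noise_factor * np.random.normal(size=x_train.shape), 0., 1.)

# A small dense autoencoder (layer sizes are illustrative).
dae = tf.keras.Sequential([
    layers.Dense(64, activation='relu', input_shape=(784,)),
    layers.Dense(784, activation='sigmoid'),
])
dae.compile(optimizer='adam', loss='mse')

# Noisy images in, clean images as targets.
dae.fit(x_train_noisy, x_train, epochs=1, batch_size=256)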
Denoising Autoencoder (DAE)
Topics: denoising autoencoder
• Idea: the representation should be robust to the introduction of noise:
  – random assignment of a subset of the inputs to 0, with some probability
  – Gaussian additive noise
• The reconstruction x̂ is computed from the corrupted input x̃, i.e., from h(x̃).
• The loss function compares the reconstruction x̂ with the noiseless input x.
• The corruption is described by a noise process p(x̃ | x).
• Tied weights W* = Wᵀ can be used, as before.
Sparse Autoencoder – sparsity constraint
The average value of the activation of a neuron l over the training set is

ρ̂_l = (1/m) Σ_{i=1}^{m} h(x_i)_l

We would like ρ̂_l to stay close to a small sparsity level ρ (for example, ρ = 0.2). This is encouraged by adding a sparsity term to the objective:

L̂(θ) = L(θ) + Ω(θ),   where   Ω(θ) = Σ_l [ ρ log(ρ / ρ̂_l) + (1 − ρ) log((1 − ρ) / (1 − ρ̂_l)) ]

Here L(θ) is the squared error loss or cross-entropy loss and Ω(θ) is the sparsity constraint. The function will reach its minimum value(s) when ρ̂_l = ρ.
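A sketch of this sparsity constraint Ω(θ) in TensorFlow; ρ = 0.2 and the random activations are illustrative:

import tensorflow as tf

def sparsity_penalty(h, rho=0.2, eps=1e-8):
    # h: hidden activations for a batch, shape (m, k), values in (0, 1) (e.g., sigmoid).
    rho_hat = tf.reduce_mean(h, axis=0)          # average activation of each neuron l
    kl = rho * tf.math.log(rho / (rho_hat + eps)) + \
         (1 - rho) * tf.math.log((1 - rho) / (1 - rho_hat + eps))
    return tf.reduce_sum(kl)                     # Omega(theta), summed over hidden neurons

# Omega(theta) is added to the reconstruction loss: L_hat(theta) = L(theta) + Omega(theta).
h = tf.random.uniform((64, 32), minval=0.01, maxval=0.99)   # illustrative activations
omega = sparsity_penalty(h)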
Sparse Autoencoder – sparsity constraint
▪ While undercomplete autoencoders are regulated and fine-tuned by regulating
the size of the bottleneck, the sparse autoencoder is regulated by changing the
number of nodes at each hidden layer.
▪ Since it is not possible to design a neural network that has a flexible number of
nodes at its hidden layers, sparse autoencoders work by penalizing the activation
of some neurons in hidden layers.
▪This penalty, called the sparsity function, prevents the neural network from
activating more neurons and serves as a regularizer.
❑ Mathematically, the gradient of the hidden representation with respect to the input is summed over all training samples, and the Frobenius norm of the result is taken (this Frobenius-norm construction is the penalty used by the contractive autoencoder below).
Contractive AE – penalizing derivatives
Topics: contractive autoencoder
• New loss function:

  l(f(x^(t))) + λ ||∇_x h(x^(t))||^2_F

  The first term is the autoencoder reconstruction loss (for binary observations, a cross-entropy loss) and the second term is the squared Frobenius norm of the Jacobian of the encoder.
• The two terms pull in opposite directions: the reconstruction loss forces the encoder to keep the information needed to reconstruct well, while the Jacobian penalty pushes the encoder to throw away information, so the encoder keeps only the good information.
• Illustration: the encoder doesn't need to be sensitive to variations that are not observed in the training set, but it must be sensitive to variations along the training data in order to reconstruct well.
Contractive Autoencoder
1. The objective of a contractive autoencoder is to have a robust learned
representation which is less sensitive to small variation in the data.
2. Robustness of the representation for the data is done by applying a penalty
term to the loss function.
3. Contractive autoencoder is another regularization technique just like sparse
and denoising autoencoders.
4. However, this regularizer corresponds to the Frobenius norm of the Jacobian
matrix of the encoder activations with respect to the input.
5. Frobenius norm of the Jacobian matrix for the hidden layer is calculated with
respect to input and it is basically the sum of square of all elements.
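A sketch of this Jacobian-based penalty using tf.GradientTape; the encoder size and the weight λ = 1e-3 are illustrative assumptions:

import tensorflow as tf
from tensorflow.keras import layers

encoder = layers.Dense(32, activation='sigmoid')    # h = encoder(x), illustrative size
lam = 1e-3                                          # penalty weight (assumed value)

def contractive_penalty(x):
    # lam * ||J||_F^2, where J = dh/dx is the Jacobian of the encoder.
    with tf.GradientTape() as tape:
        tape.watch(x)
        h = encoder(x)
    # batch_jacobian returns, for each example, the (hidden_dim, input_dim) Jacobian.
    J = tape.batch_jacobian(h, x)
    return lam * tf.reduce_sum(tf.square(J))        # sum of squares of all elements

x = tf.random.uniform((8, 784))                     # a small illustrative batch
penalty = contractive_penalty(x)                    # added to the reconstruction loss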
Convolutional Autoencoder
1. Autoencoders in their traditional formulation do not take into account the fact that a signal can be seen as a sum of other signals.
2. Convolutional Autoencoders use the convolution operator to exploit this observation.
3. They learn to encode the input in a set of simple signals and then try to reconstruct the
input from them, modify the geometry or the reflectance of the image.
4. They are the state-of-art tools for unsupervised learning of convolutional filters.
5. Once these filters have been learned, they can be applied to any input in order to extract
features.
6. These features, then, can be used to do any task that requires a compact representation
of the input, like classification.
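A minimal convolutional autoencoder sketch in Keras for 28×28 grayscale images; the filter counts and depths are illustrative choices:

import tensorflow as tf
from tensorflow.keras import layers

conv_ae = tf.keras.Sequential([
    # Encoder: convolutional filters learn simple signals; strided convolutions downsample.
    layers.Input(shape=(28, 28, 1)),
    layers.Conv2D(16, 3, strides=2, padding='same', activation='relu'),    # 14x14x16
    layers.Conv2D(8, 3, strides=2, padding='same', activation='relu'),     # 7x7x8
    # Decoder: transposed convolutions reconstruct the input from those signals.
    layers.Conv2DTranspose(8, 3, strides=2, padding='same', activation='relu'),   # 14x14x8
    layers.Conv2DTranspose(16, 3, strides=2, padding='same', activation='relu'),  # 28x28x16
    layers.Conv2D(1, 3, padding='same', activation='sigmoid'),             # 28x28x1
])
conv_ae.compile(optimizer='adam', loss='mse')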
Variational Autoencoder
1. Variational autoencoder models make strong assumptions concerning the distribution of
latent variables.
2. They use a variational approach for latent representation learning, which results in an
additional loss component and a specific estimator for the training algorithm called the
Stochastic Gradient Variational Bayes estimator.
3. It assumes that the data is generated by a directed graphical model and that the encoder is
learning an approximation to the posterior distribution where Ф and θ denote the
parameters of the encoder (recognition model) and decoder (generative model)
respectively.
4. The probability distribution of the latent vector of a variational autoencoder typically
matches that of the training data much closer than a standard autoencoder.
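A sketch of the core VAE machinery, the reparameterized sampling of the latent vector z and the extra KL loss term; the 784-dimensional input and 2-dimensional latent space are illustrative assumptions:

import tensorflow as tf
from tensorflow.keras import layers

latent_dim = 2   # illustrative latent dimensionality

class Sampling(layers.Layer):
    # Reparameterization trick: z = mean + exp(0.5 * log_var) * eps, eps ~ N(0, I).
    # Also adds the KL divergence between q(z|x) and the N(0, I) prior to the model loss.
    def call(self, inputs):
        z_mean, z_log_var = inputs
        kl = -0.5 * tf.reduce_mean(
            tf.reduce_sum(1 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var), axis=1))
        self.add_loss(kl)
        eps = tf.random.normal(shape=tf.shape(z_mean))
        return z_mean + tf.exp(0.5 * z_log_var) * eps

# Encoder (recognition model, parameters Ф): predicts the mean and log-variance of q(z|x).
inputs = tf.keras.Input(shape=(784,))
h = layers.Dense(256, activation='relu')(inputs)
z_mean = layers.Dense(latent_dim)(h)
z_log_var = layers.Dense(latent_dim)(h)
z = Sampling()([z_mean, z_log_var])

# Decoder (generative model, parameters θ): reconstructs x from the latent vector z.
x_hat = layers.Dense(784, activation='sigmoid')(layers.Dense(256, activation='relu')(z))

vae = tf.keras.Model(inputs, x_hat)
vae.compile(optimizer='adam', loss='binary_crossentropy')   # reconstruction term + KL term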
Applications of AE
1) Dimensionality Reduction
2) Feature Extraction
3) Image Denoising
4) Image Compression
5) Image Search
6) Anomaly Detection
7) Missing Value Imputation
Denoising: input a clean image plus noise and train the network to reproduce the clean image.
Image colorization: input black-and-white images and train the network to produce color images.
Watermark removal.
Image Compression
❑ The raw input image is passed to the encoder network to obtain a compressed, encoded representation.
❑ The autoencoder network weights are learned by reconstructing the image from this compressed encoding using a decoder network.
Implementation of an Autoencoder for Image Compression
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow.keras import layers, losses
from tensorflow.keras.models import Model

# Load the MNIST digits and scale pixel values to [0, 1].
(x_train, _), (x_test, _) = tf.keras.datasets.mnist.load_data()
x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.

# Hold out the last 10,000 training images as a validation set.
x_train, x_val = x_train[:-10000], x_train[-10000:]

print(x_train.shape)
print(x_test.shape)
print(x_val.shape)
n = 10
plt.figure(figsize=(20, 4))
for i in range(n):
    # display original
    ax = plt.subplot(2, n, i + 1)
    plt.imshow(x_test[i])
    plt.title("original")
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
plt.show()
[Output: the first 10 original test digits.]
Define Autoencoder Class
latent_dim = 64

class Autoencoder(Model):
    def __init__(self, latent_dim):
        super(Autoencoder, self).__init__()
        self.latent_dim = latent_dim
        # Encoder: flatten the 28x28 image and compress it to latent_dim units.
        self.encoder = tf.keras.Sequential([
            layers.Flatten(),
            layers.Dense(latent_dim, activation='relu'),
        ])
        # Decoder: expand back to 784 pixels and reshape to 28x28.
        self.decoder = tf.keras.Sequential([
            layers.Dense(784, activation='sigmoid'),
            layers.Reshape((28, 28))
        ])

    def call(self, x):
        encoded = self.encoder(x)
        decoded = self.decoder(encoded)
        return decoded

autoencoder = Autoencoder(latent_dim)
autoencoder.compile(optimizer='adam', loss=losses.MeanSquaredError())

# Train the autoencoder to reconstruct its own input.
autoencoder.fit(x_train, x_train,
                epochs=10,
                shuffle=True,
                validation_data=(x_val, x_val))
Epoch 1/10
1563/1563 [==============================] - 6s 3ms/step - loss: 0.0267 - val_loss: 0.0108
Epoch 2/10
1563/1563 [==============================] - 5s 3ms/step - loss: 0.0079 - val_loss: 0.0061
Epoch 3/10
1563/1563 [==============================] - 5s 3ms/step - loss: 0.0054 - val_loss: 0.0050
Epoch 4/10
1563/1563 [==============================] - 5s 3ms/step - loss: 0.0047 - val_loss: 0.0046
autoencoder.encoder.summary()
autoencoder.decoder.summary()

[Output: layer-by-layer summaries of the encoder ("sequential_6") and decoder ("sequential_7") models.]
encoded_imgs = autoencoder.encoder(x_test).numpy()
decoded_imgs = autoencoder.decoder(encoded_imgs).numpy()
n = 10
plt.figure(figsize=(20, 4))
for i in range(n):
    # display original
    ax = plt.subplot(2, n, i + 1)
    plt.imshow(x_test[i])
    plt.title("original")
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)

    # display reconstruction
    ax = plt.subplot(2, n, i + 1 + n)
    plt.imshow(decoded_imgs[i])
    plt.title("reconstructed")
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
plt.show()
[Output: the first 10 test digits (top row, "original") and their reconstructions (bottom row, "reconstructed").]