
Autoencoders: Unsupervised Learning

Module-3
Introduction
In both Statistics and Machine Learning, the number of attributes, features or
input variables of a dataset is referred to as its dimensionality.

For example, let’s take a very simple dataset containing 2 attributes called
Height and Weight. This is a 2-dimensional dataset and any observation of
this dataset can be plotted in a 2D plot.
Introduction
If we add another dimension called Age to the same dataset, it becomes a 3-dimensional dataset and any observation lies in the 3-dimensional space.
Introduction
Likewise, real-world datasets have many attributes. The observations of those datasets lie in a high-dimensional space which is hard to imagine.
Issues with high-dimensional data

▪ decreases the overall performance of learning algorithms
▪ increases the risk of overfitting
▪ makes data visualization difficult
▪ introduces multicollinearity among features
▪ adds noise to the data
▪ linear methods are unable to capture non-linear structure in the data
Dimensionality reduction
Dimensionality reduction simply refers to the process of reducing the number of attributes in a dataset while keeping as much of the variation in the original dataset as possible.
Dimensionality reduction methods

Feature Selection
• Subset of input variables
• Ranking of variables
• Selecting the most relevant features
• Discarding less important features (noise)

Feature Extraction
• New features constructed from the existing ones
• Reduced number of features
• Summarizes most of the information
• Linear and non-linear transformations
Dimensionality reduction
Data is not about adding more to your plate; it is about making sure you have the right things on your plate.
Autoencoders
▪ A specific type of deep learning architecture.
▪ Used for learning representations of data,
▪ typically for the purpose of dimensionality reduction.
▪ It learns a compressed, distributed representation of the data for the purpose of dimensionality reduction.
▪ An Autoencoder (AE) is an unsupervised neural network used for data compression.

An autoencoder is a type of artificial neural network used to learn efficient data codings in an unsupervised manner. The aim of an autoencoder is to learn a representation (encoding) for a set of data, typically for dimensionality reduction, by training the network to ignore signal "noise".
Deep Autoencoders
Architecture of AE
Autoencoders consist of 3 parts:

1. Encoder: A module that compresses the train-validate-test set input data into an encoded representation that is typically several orders of magnitude smaller than the input data.

2. Bottleneck: A module that contains the compressed knowledge representation and is therefore the most important part of the network.

3. Decoder: A module that helps the network "decompress" the knowledge representation and reconstructs the data back from its encoded form. The output is then compared with a ground truth.
[Figure: Encoder → Bottleneck → Decoder. The bottleneck holds the encoded data (a latent vector z1, e.g. [0.11, 0.78, 0.45, 0.52]) in the latent space; different inputs (e.g. an encoded table, an encoded rabbit) map to different points in that space, and the decoder reconstructs the input from its code.]
Training Objectives

Input → Encoder → Bottleneck → Decoder → Output

How to compare images? Use a reconstruction loss that measures the pixel-wise difference between the input x and the reconstruction x̂:

L(x, x̂) = (1/N) Σ_{i=1..N} (xi − x̂i)²

[Figure: reconstructions at the initial epoch, before training has progressed.]
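As a concrete illustration of this reconstruction loss, the following minimal NumPy sketch (added here for clarity; the array shapes and variable names are illustrative assumptions, not from the slides) averages the per-pixel squared differences:

import numpy as np

def reconstruction_loss(x, x_hat):
    # Mean squared error between the original image x and its reconstruction x_hat
    x = x.reshape(-1).astype('float32')
    x_hat = x_hat.reshape(-1).astype('float32')
    return np.mean((x - x_hat) ** 2)

# Example with random 28x28 "images"
x = np.random.rand(28, 28)
x_hat = x + 0.05 * np.random.randn(28, 28)   # a slightly imperfect reconstruction
print(reconstruction_loss(x, x_hat))         # small value -> good reconstruction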
Why use autoencoders?

Dimensionality Reduction

[Figure series: Encoder → Bottleneck → Decoder feeding a 2-D latent space (axes: Neuron 1 and Neuron 2 of the bottleneck). Successive frames from Epoch 1 through Epoch 20 show the encoded digits, initially scattered, gradually spreading out and organizing into separate clusters in the latent space as training proceeds.]
Latent vector

[Figure: The encoder maps each input image to a latent vector; the bottleneck can have any number of neurons. Examples are shown for latent dimension = 1, 2 and 9.]
Why do we care about reconstruction quality?

Reconstruction quality = Latent space quality

[Figure: 2-D views of the latent space (Dimension 1 vs. Dimension 2) for latent dimension = 2 and latent dimension = 5; with the larger latent dimension the digit classes form better separated clusters, matching the better reconstruction quality.]
Application Example

[Figure: a latent space (Dimension 1 vs. Dimension 2) in which the encodings labelled "Male" and "Female" fall into separate regions, so the learned representation captures this attribute.]
Limitations

[Figure: latent-space plots (Dimension 1 vs. Dimension 2) illustrating that the plain autoencoder's latent space is irregular and not well structured, which motivates regularizing it.]

[Figure: noisy image vs. original image pairs.]
Regularize the latent space

Variational Auto-Encoders
Training of AE
Before training an autoencoder, the following hyperparameters need to be set:

• Code size: the number of nodes in the bottleneck; a smaller size results in more compression.
• Number of layers: the depth of the encoder and decoder networks.
• Number of nodes per layer: typically decreases with each subsequent layer in the encoder and increases with each subsequent layer in the decoder.
• Loss function: the reconstruction loss (e.g. mean squared error or binary cross-entropy). A configuration sketch follows below.
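A minimal sketch of how these choices appear in code (assuming TensorFlow/Keras, as in the implementation at the end of this module; the specific layer sizes and code size are illustrative, not prescribed by the slides):

import tensorflow as tf
from tensorflow.keras import layers

code_size = 32          # nodes in the bottleneck (smaller -> more compression)

encoder = tf.keras.Sequential([
    layers.Flatten(input_shape=(28, 28)),
    layers.Dense(128, activation='relu'),        # nodes per layer decrease ...
    layers.Dense(64, activation='relu'),
    layers.Dense(code_size, activation='relu'),  # bottleneck
])

decoder = tf.keras.Sequential([
    layers.Dense(64, activation='relu'),         # ... and increase again in the decoder
    layers.Dense(128, activation='relu'),
    layers.Dense(784, activation='sigmoid'),
    layers.Reshape((28, 28)),
])

autoencoder = tf.keras.Sequential([encoder, decoder])
autoencoder.compile(optimizer='adam', loss='mse')   # reconstruction loss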
Properties of Autoencoders

Data-specific: Autoencoders are only able to compress data similar to what they have been trained on.

Lossy: The decompressed outputs will be degraded compared to the original inputs.

Learned automatically from examples (unsupervised): It is easy to train specialized instances of the algorithm that will perform well on a specific type of input.
Types of Autoencoder
1. Linear Autoencoder
2. Undercomplete Autoencoder
3. Overcomplete Autoencoder
4. Denoising Autoencoder
5. Sparse Autoencoder
6. Contractive Autoencoder
Linear Autoencoder
▪ The simplest form of autoencoder can be trained with only a single layer
encoder and a single layer decoder.
▪It has
▪one hidden layer,
▪linear activations and
▪squared error loss function.
An autoencoder is a special type of feed-forward neural network which does the following:

• Encodes its input xi into a hidden representation h
• Decodes the input again from this hidden representation

The model is trained to minimize a certain loss function which will ensure that x̂i is close to xi (we will see some such loss functions soon).

h = g(W xi + b)
x̂i = f(W* h + c)
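A minimal Keras sketch of such a linear autoencoder (an added illustration; the 784-pixel input and the 32-unit code size are assumed values, not from the slides):

import tensorflow as tf
from tensorflow.keras import layers, losses

inputs = tf.keras.Input(shape=(784,))
h = layers.Dense(32, activation='linear')(inputs)      # h = W x + b
outputs = layers.Dense(784, activation='linear')(h)    # x_hat = W* h + c

linear_ae = tf.keras.Model(inputs, outputs)
linear_ae.compile(optimizer='adam', loss=losses.MeanSquaredError())

With linear activations and squared-error loss, the code learned by such a network spans the same subspace as the top principal components, which is the link to PCA discussed later.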
Undercomplete Autoencoder
❑ One of the simplest types of autoencoders.
❑ An undercomplete autoencoder takes in an image and tries to predict the same image as output,
❑ thus reconstructing the image from the compressed bottleneck region.
❑ The primary use is the generation of the latent space or bottleneck, which forms a compressed substitute of the input data and can be easily decompressed back with the help of the network when needed.
❑ The loss function used to train an undercomplete autoencoder is called the reconstruction loss, as it is a check of how well the image has been reconstructed from the input.
Let us consider the case where dim(h) < dim(xi).

If we are still able to reconstruct x̂i perfectly from h, then what does it say about h?

h is a loss-free encoding of xi. It captures all the important characteristics of xi.

Do you see an analogy with PCA?

h = g(W xi + b)
x̂i = f(W* h + c)

An autoencoder where dim(h) < dim(xi) is called an undercomplete autoencoder.
Topics: undercomplete representation
• The hidden layer is undercomplete if it is smaller than the input layer:
  – the hidden layer "compresses" the input
  – it will compress well only for the training distribution
• The hidden units will be good features for the training distribution, but bad for other types of input.
• (Tied weights: W* = Wᵀ)
Overcomplete Autoencoder
▪ In an overcomplete autoencoder, the hidden layer has more neurons than the input.
▪ It has the potential to learn the identity function (output equals the input) and become useless.
▪ The hidden layer being overcomplete means it is larger than the input layer.
▪ There is no compression in the hidden layer.
▪ Each hidden unit could copy a different input component.
▪ There is no guarantee that the hidden units will extract meaningful structure.
Let us consider the case when dim(h) ≥ dim(xi).

In such a case the autoencoder could learn a trivial encoding by simply copying xi into h and then copying h into x̂i.

Such an identity encoding is useless in practice as it does not really tell us anything about the important characteristics of the data.

h = g(W xi + b)
x̂i = f(W* h + c)

An autoencoder where dim(h) ≥ dim(xi) is called an overcomplete autoencoder.
Topics: overcomplete representation
• The hidden layer is overcomplete if it is larger than the input layer:
  – no compression in the hidden layer
  – each hidden unit could copy a different input component
• There is no guarantee that the hidden units will extract meaningful structure.
• (Tied weights: W* = Wᵀ)
Link between PCA and Autoencoders
PCA VS Autoencoder
Regularization in Autoencoders

The simplest solution is to add an L2-regularization term to the objective function:

min over θ, W, W*, b, c of  (1/m) Σ_{i=1..m} Σ_{j=1..n} (x̂ij − xij)² + λ ||θ||²

This is very easy to implement and just adds a term λW to the gradient ∂Ω(θ)/∂W (and similarly for the other parameters).

Another trick is to tie the weights of the encoder and decoder, i.e. W* = Wᵀ. This effectively reduces the capacity of the autoencoder and acts as a regularizer.
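A sketch of how such L2 (weight-decay) regularization can be added in Keras (an added illustration; λ = 1e-4 and the layer sizes are assumed values):

import tensorflow as tf
from tensorflow.keras import layers, regularizers

l2 = regularizers.l2(1e-4)   # lambda, illustrative value

encoder = tf.keras.Sequential([
    layers.Flatten(input_shape=(28, 28)),
    layers.Dense(64, activation='relu', kernel_regularizer=l2),
])
decoder = tf.keras.Sequential([
    layers.Dense(784, activation='sigmoid', kernel_regularizer=l2),
    layers.Reshape((28, 28)),
])
autoencoder = tf.keras.Sequential([encoder, decoder])
autoencoder.compile(optimizer='adam', loss='mse')
# Keras adds the lambda * ||W||^2 penalty to the loss, which contributes the
# extra lambda-proportional term to the weight gradients described above.

Weight tying (W* = Wᵀ) is not shown here; in Keras it would require a custom decoder layer that reuses the transposed encoder kernel.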
Denoising Autoencoder (DAE)
▪ Autoencoders that remove noise from an image.
▪ We feed a noisy version of the image, where noise has been added via digital alterations.
▪ The noisy image is fed to the encoder-decoder architecture, and the output is compared with the ground-truth (clean) image.
▪ The denoising autoencoder gets rid of noise by learning a representation of the input from which the noise can be filtered out easily.
▪ While removing noise directly from the image seems difficult, the autoencoder performs this by mapping the input data onto a lower-dimensional manifold (as in undercomplete autoencoders), where filtering out the noise becomes much easier. A code sketch follows below.
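A minimal sketch of this idea (an added illustration, assuming the MNIST setup used later in this module; the noise level and layer sizes are arbitrary choices): the inputs are corrupted with Gaussian noise, while the clean images are used as the training targets.

import numpy as np
import tensorflow as tf

(x_train, _), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train.astype('float32') / 255.

noise_factor = 0.2                                        # illustrative value
x_train_noisy = x_train + noise_factor * np.random.normal(size=x_train.shape)
x_train_noisy = np.clip(x_train_noisy, 0., 1.)            # keep pixels in [0, 1]

denoiser = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(64, activation='relu'),         # bottleneck
    tf.keras.layers.Dense(784, activation='sigmoid'),
    tf.keras.layers.Reshape((28, 28)),
])
denoiser.compile(optimizer='adam', loss='mse')
denoiser.fit(x_train_noisy, x_train, epochs=10, shuffle=True)   # noisy in, clean out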
Topics: denoising autoencoder

• Idea: the representation should be robust to the introduction of noise:
  – random assignment of a subset of the inputs to 0, with some probability
  – Gaussian additive noise
• The reconstruction x̂ is computed from the corrupted input x̃, where x̃ is sampled from the noise process p(x̃ | x).
• The loss function compares the reconstruction x̂ with the noiseless input x.
• With tied weights (W* = Wᵀ):

x̂ = sigm(c + W* h(x̃))

FILTERS (DENOISING AUTOENCODER)
[Figure: filters learned with no corrupted inputs (cross-entropy loss), with 25% corrupted inputs, and with 50% corrupted inputs.]
Sparse Autoencoder – sparsity constraint

The average activation of hidden neuron l over the training set is

ρ̂l = (1/m) Σ_{i=1..m} h(xi)l

If neuron l is sparse (i.e. mostly inactive) then ρ̂l ≈ 0.

A sparse autoencoder uses a sparsity parameter ρ (typically very close to 0, say 0.005) and tries to enforce the constraint ρ̂l = ρ.

One way of ensuring this is to add the following term to the objective function:

Ω(θ) = Σ_l [ ρ log(ρ / ρ̂l) + (1 − ρ) log((1 − ρ) / (1 − ρ̂l)) ]

so that the total objective becomes L̂(θ) = L(θ) + Ω(θ), where L(θ) is the squared error loss or cross-entropy loss and Ω(θ) is the sparsity constraint.

When will this term reach its minimum value, and what is the minimum value? Plot it and check (e.g. for ρ = 0.2): Ω(θ) reaches its minimum value of 0 when ρ̂l = ρ.
Sparse Autoencoder – sparsity constraint
▪ While undercomplete autoencoders are regulated and fine-tuned by controlling the size of the bottleneck, the sparse autoencoder is regulated by controlling the number of active nodes at each hidden layer.
▪ Since it is not possible to design a neural network with a flexible number of nodes at its hidden layers, sparse autoencoders work by penalizing the activation of some neurons in the hidden layers.
▪ This penalty, called the sparsity function, prevents the neural network from activating too many neurons at once and serves as a regularizer.
▪ In effect, for any given input most hidden-layer neurons stay inactive ("dropped").
Sparse Autoencoder
▪ While typical regularizers work by creating a penalty on the size of the weights at the nodes, the sparsity regularizer works by creating a penalty on the number of nodes activated.
▪ This form of regularization allows the network to have nodes in hidden layers dedicated to finding specific features in images during training, treating the regularization problem as separate from the latent space problem.
Sparse Autoencoder
1. Sparse autoencoders have more hidden nodes than input nodes.
2. They can still discover important features from the data.
3. In a generic sparse autoencoder visualization, the obscurity of a node corresponds to its level of activation; a sparsity constraint is introduced on the hidden layer.
4. This is to prevent the output layer from simply copying the input data.
5. Sparsity may be obtained by additional terms in the loss function during the training process, either by comparing the probability distribution of the hidden unit activations with some low desired value, or by manually zeroing all but the strongest hidden unit activations (see the sketch after this list).
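A minimal Keras sketch of a sparse autoencoder (an added illustration). The slides describe a KL-divergence sparsity penalty; a simple and commonly used stand-in is an L1 activity regularizer on the hidden activations, which likewise pushes most activations toward zero. The layer sizes and penalty weight are assumed values.

import tensorflow as tf
from tensorflow.keras import layers, regularizers

sparse_encoder = tf.keras.Sequential([
    layers.Flatten(input_shape=(28, 28)),
    # more hidden units (1024) than inputs (784), kept useful by the sparsity penalty
    layers.Dense(1024, activation='relu',
                 activity_regularizer=regularizers.l1(1e-4)),
])
sparse_decoder = tf.keras.Sequential([
    layers.Dense(784, activation='sigmoid'),
    layers.Reshape((28, 28)),
])
sparse_ae = tf.keras.Sequential([sparse_encoder, sparse_decoder])
sparse_ae.compile(optimizer='adam', loss='mse')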
Sparse Autoencoder
Contractive Autoencoder – penalizing derivatives
❑ It works on the basis that similar inputs should have similar encodings and a similar latent space representation. It means that the latent space should not vary by a huge amount for minor variations in the input.
❑ To train a model that works along with this constraint, we have to ensure that the derivatives of the hidden layer activations are small with respect to the input.
❑ Mathematically: the Frobenius norm of the Jacobian of the hidden activations with respect to the input, ||∂h/∂x||²_F, should be as small as possible.
Contractive Autoencoder
❑ While the reconstruction loss wants the model to tell differences between two inputs and observe variations in the data, the Frobenius norm of the derivatives says that the model should be able to ignore variations in the input data.
❑ Putting these two contradictory conditions into one loss function enables us to train a network where the hidden layers now capture only the most essential information.
❑ This information is necessary to separate images and to ignore information that is non-discriminatory in nature and therefore not important.
❑ The total loss function can be mathematically expressed as:

L(θ) = Σ_t [ l(x(t), x̂(t)) + λ ||J_h(x(t))||²_F ]

❑ The loss is summed over all training samples, and the Frobenius norm of the Jacobian is taken for each of them.
Contractive AE – penalizing derivatives
Topics: contractive autoencoder

• New loss function:

l(f(x(t))) + λ ||∇x h(x(t))||²_F

where l(f(x(t))) is the autoencoder reconstruction loss (for binary observations, the cross-entropy loss) and ∇x h(x(t)) is the Jacobian of the encoder, with

||∇x h(x(t))||²_F = Σ_j Σ_k ( ∂h(x(t))j / ∂x(t)k )²

• The reconstruction term wants the encoder to keep all information, while the contractive term wants the encoder to throw away all information; together, the encoder keeps only the good information.

• Illustration: the encoder does not need to be sensitive to variations not observed in the training set, but it must be sensitive to the variations needed to reconstruct the training data well.
Contractive Autoencoder
1. The objective of a contractive autoencoder is to have a robust learned
representation which is less sensitive to small variation in the data.
2. Robustness of the representation for the data is done by applying a penalty
term to the loss function.
3. Contractive autoencoder is another regularization technique just like sparse
and denoising autoencoders.
4. However, this regularizer corresponds to the Frobenius norm of the Jacobian
matrix of the encoder activations with respect to the input.
5. The Frobenius norm of the Jacobian matrix of the hidden layer is calculated with respect to the input; it is simply the sum of the squares of all its elements (a training-step sketch follows below).
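A sketch of one training step of a contractive autoencoder (an added illustration, assuming a single sigmoid encoder layer so the Frobenius norm has its standard closed form; the layer sizes and penalty weight are illustrative):

import tensorflow as tf
from tensorflow.keras import layers

encoder = tf.keras.Sequential([layers.Dense(64, activation='sigmoid')])   # h = sigm(Wx + b)
decoder = tf.keras.Sequential([layers.Dense(784, activation='sigmoid')])
optimizer = tf.keras.optimizers.Adam()
lam = 1e-3   # weight of the contractive penalty (illustrative value)

def train_step(x):                 # x: tf.Tensor of flattened images, shape (batch, 784)
    with tf.GradientTape() as tape:
        h = encoder(x)
        x_hat = decoder(h)
        recon = tf.reduce_mean(tf.square(x - x_hat))          # reconstruction loss
        # For a sigmoid encoder, dh_j/dx_k = h_j (1 - h_j) W_jk, so the squared
        # Frobenius norm of the Jacobian has the closed form below.
        W = encoder.layers[0].kernel                          # shape (784, 64)
        w_sq = tf.reduce_sum(tf.square(W), axis=0)            # sum_k W_jk^2, shape (64,)
        frob = tf.reduce_sum(tf.square(h * (1.0 - h)) * w_sq, axis=1)
        loss = recon + lam * tf.reduce_mean(frob)
    variables = encoder.trainable_variables + decoder.trainable_variables
    optimizer.apply_gradients(zip(tape.gradient(loss, variables), variables))
    return loss

# Example usage on one random batch; real training would loop over the data.
print(float(train_step(tf.random.uniform((32, 784)))))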
Convolutional Autoencoder
1. Autoencoders in their traditional formulation do not take into account the fact that a signal can be seen as a sum of other signals.
2. Convolutional autoencoders use the convolution operator to exploit this observation.
3. They learn to encode the input as a set of simple signals and then try to reconstruct the input from them, possibly modifying the geometry or the reflectance of the image.
4. They are state-of-the-art tools for unsupervised learning of convolutional filters.
5. Once these filters have been learned, they can be applied to any input in order to extract features.
6. These features can then be used for any task that requires a compact representation of the input, such as classification (a sketch follows below).
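A minimal sketch of a convolutional autoencoder for 28x28 grayscale images (an added illustration; the filter counts and kernel sizes are assumed values): Conv2D layers compress the image in the encoder and Conv2DTranspose layers upsample it back in the decoder.

import tensorflow as tf
from tensorflow.keras import layers

conv_encoder = tf.keras.Sequential([
    layers.Conv2D(16, 3, strides=2, padding='same', activation='relu',
                  input_shape=(28, 28, 1)),                                # 14x14x16
    layers.Conv2D(8, 3, strides=2, padding='same', activation='relu'),     # 7x7x8
])
conv_decoder = tf.keras.Sequential([
    layers.Conv2DTranspose(8, 3, strides=2, padding='same', activation='relu'),   # 14x14x8
    layers.Conv2DTranspose(16, 3, strides=2, padding='same', activation='relu'),  # 28x28x16
    layers.Conv2D(1, 3, padding='same', activation='sigmoid'),                    # 28x28x1
])
conv_ae = tf.keras.Sequential([conv_encoder, conv_decoder])
conv_ae.compile(optimizer='adam', loss='mse')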
Convolutional Autoencoder
Variational Autoencoder
1. Variational autoencoder models make strong assumptions concerning the distribution of latent variables.
2. They use a variational approach for latent representation learning, which results in an additional loss component and a specific estimator for the training algorithm called the Stochastic Gradient Variational Bayes (SGVB) estimator.
3. It assumes that the data is generated by a directed graphical model p_θ(x|z) and that the encoder is learning an approximation q_Φ(z|x) to the posterior distribution, where Φ and θ denote the parameters of the encoder (recognition model) and decoder (generative model) respectively.
4. The probability distribution of the latent vector of a variational autoencoder typically matches that of the training data much more closely than that of a standard autoencoder (a sketch follows below).
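A minimal sketch of the standard VAE formulation (an added illustration; the latent dimension, layer sizes, and loss weighting are assumed values): the encoder outputs a mean and log-variance, a latent vector is sampled with the reparameterization trick, and a KL-divergence term is added to the reconstruction loss.

import tensorflow as tf
from tensorflow.keras import layers

latent_dim = 2

class Sampling(layers.Layer):
    # Reparameterization trick: z = mean + sigma * epsilon, with epsilon ~ N(0, I)
    def call(self, inputs):
        z_mean, z_log_var = inputs
        eps = tf.random.normal(shape=tf.shape(z_mean))
        return z_mean + tf.exp(0.5 * z_log_var) * eps

class VAE(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.hidden = layers.Dense(256, activation='relu')
        self.z_mean = layers.Dense(latent_dim)
        self.z_log_var = layers.Dense(latent_dim)
        self.sampling = Sampling()
        self.decoder = tf.keras.Sequential([
            layers.Dense(256, activation='relu'),
            layers.Dense(784, activation='sigmoid'),
        ])

    def call(self, x):
        h = self.hidden(x)
        z_mean, z_log_var = self.z_mean(h), self.z_log_var(h)
        z = self.sampling([z_mean, z_log_var])
        # KL divergence between q(z|x) = N(z_mean, exp(z_log_var)) and the prior N(0, I)
        kl = -0.5 * tf.reduce_mean(
            tf.reduce_sum(1. + z_log_var - tf.square(z_mean) - tf.exp(z_log_var), axis=1))
        self.add_loss(kl)                  # added on top of the reconstruction loss
        return self.decoder(z)

vae = VAE()
vae.compile(optimizer='adam', loss='binary_crossentropy')
# vae.fit(x_flat, x_flat, epochs=10)      # x_flat: images flattened to (n, 784), scaled to [0, 1]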
Variational Autoencoder
Applications of AE
1) Dimensionality Reduction
2) Feature Extraction
3) Image Denoising
4) Image Compression
5) Image Search
6) Anomaly Detection
7) Missing Value Imputation
Denoising: input clean image + noise and train to
reproduce the clean image.
Image colorization: input black and white and train to
produce color images
Watermark removal
Image Compression
❑ The raw input image can be passed through the encoder network to obtain a compressed (encoded) representation of lower dimension.
❑ The autoencoder network weights are learned by reconstructing the image from this compressed encoding using the decoder network.
Implementation of Autoencoder for Image compression
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import tensorflow as tf

from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from tensorflow.keras import layers, losses
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Model

Load Data Set

(x_train, _), (x_test, _) = mnist.load_data()

# hold out the last 10,000 training images as a validation set
x_train, x_val = x_train[:-10000], x_train[-10000:]

# scale pixel values to [0, 1]
x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.
x_val = x_val.astype('float32') / 255.

print(x_train.shape)
print(x_test.shape)
print(x_val.shape)

Output:
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
(50000, 28, 28)
(10000, 28, 28)
(10000, 28, 28)
Visualize

n = 10
plt.figure(figsize=(20, 4))
for i in range(n):
    # display original
    ax = plt.subplot(2, n, i + 1)
    plt.imshow(x_test[i])
    plt.title("original")
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
plt.show()

Output: a row of 10 original MNIST digit images, each titled "original".
Define Autoencoder Class

latent_dim = 64

class Autoencoder(Model):
    def __init__(self, latent_dim):
        super(Autoencoder, self).__init__()
        self.latent_dim = latent_dim
        # encoder: flatten the 28x28 image and compress it to latent_dim values
        self.encoder = tf.keras.Sequential([
            layers.Flatten(),
            layers.Dense(latent_dim, activation='relu'),
        ])
        # decoder: expand the code back to 784 values and reshape to 28x28
        self.decoder = tf.keras.Sequential([
            layers.Dense(784, activation='sigmoid'),
            layers.Reshape((28, 28)),
        ])

    def call(self, x):
        encoded = self.encoder(x)
        decoded = self.decoder(encoded)
        return decoded

autoencoder = Autoencoder(latent_dim)
autoencoder.compile(optimizer='adam', loss=losses.MeanSquaredError())

autoencoder.fit(x_train, x_train,
                epochs=10,
                shuffle=True,
                validation_data=(x_val, x_val))

Output (abridged):
Epoch 1/10
1563/1563 [==============================] - 6s 3ms/step - loss: 0.0267 - val_loss: 0.0108
Epoch 2/10
1563/1563 [==============================] - 5s 3ms/step - loss: 0.0079 - val_loss: 0.0061
Epoch 3/10
1563/1563 [==============================] - 5s 3ms/step - loss: 0.0054 - val_loss: 0.0050
Epoch 4/10
1563/1563 [==============================] - 5s 3ms/step - loss: 0.0047 - val_loss: 0.0046
print(autoencoder.encoder.summary())

Model: "sequential_6"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
flatten_3 (Flatten)          (None, 784)               0
dense_6 (Dense)              (None, 64)                50240
=================================================================
Total params: 50,240
Trainable params: 50,240
Non-trainable params: 0
_________________________________________________________________
None

print(autoencoder.decoder.summary())

Model: "sequential_7"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
dense_7 (Dense)              (None, 784)               50960
reshape_3 (Reshape)          (None, 28, 28)            0
=================================================================
Total params: 50,960
Trainable params: 50,960
Non-trainable params: 0
_________________________________________________________________
None
encoded_imgs = autoencoder.encoder(x_test).numpy()
decoded_imgs = autoencoder.decoder(encoded_imgs).numpy()

n = 10
plt.figure(figsize=(20, 4))
for i in range(n):
    # display original
    ax = plt.subplot(2, n, i + 1)
    plt.imshow(x_test[i])
    plt.title("original")
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
    # display reconstruction
    ax = plt.subplot(2, n, i + 1 + n)
    plt.imshow(decoded_imgs[i])
    plt.title("reconstructed")
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
plt.show()

Output: the first row shows 10 original test digits; the second row shows their reconstructions from the 64-dimensional code.
