Unit-V Deep Generative Models Part-01

Deep Generative Models

• Deep Generative Models: Boltzmann Machines, Deep Belief
Networks, Deep Boltzmann Machines, Generative Stochastic
Networks, Generative Adversarial Networks, Evaluating
Generative Models.
Introduction
• A generative model is a way of learning a data
distribution, typically with unsupervised learning, and
the approach has been highly successful.

• All types of generative models aim to learn the true data
distribution of the training set so as to generate new data
points with some variations.

• Deep generative models (DGMs) are neural networks with
many hidden layers trained to approximate complicated,
high-dimensional probability distributions.
Introduction
• These models have gained significant attention in
recent years due to their ability to learn complex data
distributions and generate new samples from those
distributions.

• When trained successfully, DGMs can be used to estimate
the likelihood of each observation and to create new samples
from the underlying distribution.

• The two most popular approaches for deep generative
modeling are:

1. Variational Autoencoders (VAE)

2. Generative Adversarial Networks (GAN).


Introduction
1. Variational Autoencoders (VAE):

• VAEs are probabilistic graphical models rooted in
Bayesian inference. VAEs aim to learn a low-dimensional
latent representation of training data, which can be used
to generate new data points.

• VAEs combine an encoder and a decoder network.

• The encoder maps input data to a latent space, and the
decoder generates samples from this latent space.

• VAEs are commonly used for generative tasks and
representation learning.
Introduction
2. Generative Adversarial Networks (GAN): GANs
consist of a generator and a discriminator.

• The generator produces data samples, while the
discriminator evaluates whether a given sample is
real or generated.

• The training process involves an adversarial game
between the generator and discriminator, leading to
the generator learning to produce realistic data
samples.
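
The adversarial game described above is commonly written as a minimax problem. The expression below is the usual value function from the original GAN formulation, shown here for reference only. The symbols are not defined in these slides and are assumptions of this sketch: $D$ is the discriminator, $G$ the generator, $p_{\text{data}}$ the data distribution, and $p_z$ the noise prior fed to the generator.

```latex
\min_{G} \max_{D} \; V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]
```

The discriminator tries to increase $V$ by scoring real samples high and generated samples low, while the generator tries to decrease it by producing samples that the discriminator scores as real.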
Differentiate between deterministic and probabilistic neural networks

Feature                | Deterministic Neural Networks         | Probabilistic Neural Networks
Output Type            | Fixed values                          | Probability distributions
Weight Nature          | Fixed weights after training          | Probabilistic weights
Prediction Consistency | Consistent for the same input         | Varies for the same input
Uncertainty Estimation | Not available                         | Provides uncertainty estimation
Implementation         | Simpler                               | More complex
Example Models         | FNNs, CNNs, RNNs                      | BNNs, VAEs, Dropout-based models
Use Cases              | General prediction and classification | Tasks requiring uncertainty handling
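
The "Prediction Consistency" row can be illustrated with a single unit. The sketch below is a toy illustration, not taken from the slides: the function names, weights, and inputs are all hypothetical. A deterministic unit returns the same activation every time, whereas a stochastic unit samples a 0/1 state whose probability is that activation.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def deterministic_unit(inputs, weights, bias):
    """Always returns the same activation for the same input."""
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)

def stochastic_unit(inputs, weights, bias, rng):
    """Returns 0 or 1, sampled with probability equal to the activation."""
    p = deterministic_unit(inputs, weights, bias)
    return 1 if rng.random() < p else 0

rng = random.Random(0)  # fixed seed so the run is repeatable
x, w, b = [1.0, 0.0, 1.0], [0.5, -0.3, 0.2], -0.1  # hypothetical values

fixed_outputs = [deterministic_unit(x, w, b) for _ in range(5)]
sampled_outputs = [stochastic_unit(x, w, b, rng) for _ in range(1000)]
mean_sampled = sum(sampled_outputs) / len(sampled_outputs)
```

The five deterministic outputs are identical, while the sampled outputs vary between 0 and 1; their average approaches the deterministic activation as more samples are drawn.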
Boltzmann machine
• Boltzmann machine is designed to learn
probability distributions over its set of inputs.

• There are three key concepts to know about the
Boltzmann machine:

1. Stochasticity: Unlike traditional deterministic
neural networks, Boltzmann machines incorporate
randomness.

• The state of each neuron (node) in the network is
determined probabilistically based on the states of
the neighboring neurons and a temperature parameter.
Boltzmann machine
2. Energy Function: The Boltzmann machine assigns an
energy to each possible state of the system.

• Lower energy states are more probable. The energy
function typically involves weights between nodes and
biases.

3. Equilibrium: The machine aims to reach a thermal
equilibrium where the distribution of states follows the
Boltzmann distribution.

• This distribution specifies that the probability of a system
being in a certain state decreases exponentially with the
energy of that state.
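
For reference, the Boltzmann distribution mentioned above is usually written as follows. The notation is an assumption of this note rather than something defined in the slides: $s$ is a joint state of all units, $E(s)$ its energy, $T$ the temperature, and $Z$ the normalising constant (partition function).

```latex
P(s) = \frac{e^{-E(s)/T}}{Z},
\qquad
Z = \sum_{s'} e^{-E(s')/T}
```

Because the energy appears in a negative exponent, lower-energy states receive higher probability, and raising $T$ flattens the distribution.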
Boltzmann machine
• A Boltzmann machine is essentially a fully connected,
two-layer neural network.

• These two layers are referred to as the visible and
hidden layers.

• The visible layer is analogous to the input layer in
feedforward neural networks.

• Although a Boltzmann machine has a hidden layer, it
functions more as an output layer.

• The Boltzmann machine has no hidden layer between
the input and output layers.
Boltzmann machine
• The basic units of a Boltzmann machine are binary
neurons that can be in one of two states: on (1) or off (0).

• There are two types of units in a Boltzmann machine:

• Visible units: Correspond to the input data.

• Hidden units: Capture dependencies and abstract
features that are not directly observed.

• Weights: Represent connections between pairs of units.
These are symmetric (i.e., the weight from unit i to
unit j is the same as from unit j to unit i).

• Biases: Represent the threshold for each unit.


Boltzmann machine
• The figure below shows the very simple structure of a
Boltzmann machine:

• The above Boltzmann machine has three hidden neurons
and four visible neurons.

• A Boltzmann machine is fully connected because every
neuron has a connection to every other neuron. However,
no neuron is connected to itself.
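
A minimal sketch of such a fully connected machine, assuming the common energy form E(s) = -sum_{i<j} s_i W_ij s_j - sum_i b_i s_i (the slides do not give the general Boltzmann machine energy explicitly). The random weights, the temperature T = 1, and all variable names are illustrative choices. With only seven binary units, all 128 states can be enumerated to check that the lowest-energy state is also the most probable one.

```python
import itertools
import math
import random

rng = random.Random(0)
n_visible, n_hidden = 4, 3          # sizes taken from the figure description
n = n_visible + n_hidden

# Symmetric weights with a zero diagonal: no unit is connected to itself.
W = [[0.0] * n for _ in range(n)]
for i in range(n):
    for j in range(i + 1, n):
        W[i][j] = W[j][i] = rng.gauss(0.0, 0.5)
b = [rng.gauss(0.0, 0.5) for _ in range(n)]

def energy(s):
    """E(s) = -sum_{i<j} s_i W_ij s_j - sum_i b_i s_i for a binary state s."""
    pair = sum(W[i][j] * s[i] * s[j] for i in range(n) for j in range(i + 1, n))
    return -pair - sum(b[i] * s[i] for i in range(n))

# Enumerate every joint state and turn energies into Boltzmann probabilities.
states = list(itertools.product([0, 1], repeat=n))
T = 1.0
unnormalised = [math.exp(-energy(s) / T) for s in states]
Z = sum(unnormalised)
probs = [u / Z for u in unnormalised]

lowest = min(range(len(states)), key=lambda k: energy(states[k]))
most_probable = max(range(len(states)), key=lambda k: probs[k])
```

Exhaustive enumeration is only feasible for toy sizes; the number of states doubles with every added unit, which is why practical training relies on sampling.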
Boltzmann machine
• Types of Boltzmann Machines

1. Restricted Boltzmann Machines (RBMs): A simplified
version of the Boltzmann machine where the network is
restricted to a bipartite graph, meaning there are no
connections within the visible units or the hidden units.

• The figure below shows an RBM, which is not fully
connected. Every hidden neuron is connected to every
visible neuron.
Boltzmann machine
• There are no connections among the hidden neurons, nor
are there connections among the visible neurons.

2. Deep Belief Networks (DBNs): Composed of multiple
layers of RBMs. These networks can learn hierarchical
representations of the data.

3. Deep Boltzmann Machines (DBMs): A Deep Boltzmann
Machine (DBM) is an advanced type of Boltzmann machine
designed to model complex, high-dimensional data.

• It extends the idea of a Restricted Boltzmann Machine (RBM)
by stacking multiple layers of hidden units, creating a deep
architecture that can capture intricate patterns and
dependencies in data.
Restricted Boltzmann machine (RBM)
• A Restricted Boltzmann Machine (RBM) is a simplified
version of a Boltzmann machine with certain
restrictions that make it easier to train and more
practical for many applications.

• Structure of Restricted Boltzmann Machines

• A Restricted Boltzmann Machine (RBM) is a generative,
stochastic, two-layer artificial neural network that can
learn a probability distribution over its set of inputs.
Restricted Boltzmann machine (RBM)
• Visible Units (V): These units represent the input
data. The number of visible units corresponds to the
number of features in the input data.

• Hidden Units (H): These units capture the
dependencies and patterns in the input data. The
number of hidden units is a hyper-parameter that
can be tuned.

• Weights (W): Each visible unit is connected to
every hidden unit with a symmetric weight. The
weight matrix W defines these connections.
Restricted Boltzmann machine (RBM)
• Biases: There are bias terms for both visible units (a)
and hidden units (b). These biases help in adjusting the
activation thresholds of the units.

• The restriction in a Restricted Boltzmann Machine is
that there is no intra-layer communication (nodes of
the same layer are not connected).

• Visible units are not connected to other visible units,
and hidden units are not connected to other hidden
units.

• This restriction allows for more efficient training
algorithms than are available for the general class of
Boltzmann machines.
Restricted Boltzmann machine (RBM)
• Energy function in RBM

• The energy of a configuration (a state of visible and hidden
units) in an RBM is defined as:

• $E(v, h) = -\sum_i a_i v_i - \sum_j b_j h_j - \sum_{i,j} v_i W_{ij} h_j$

• where:

• $v_i$ is the state of visible unit $i$,

• $h_j$ is the state of hidden unit $j$,

• $a_i$ is the bias of visible unit $i$,

• $b_j$ is the bias of hidden unit $j$,

• $W_{ij}$ is the weight between visible unit $i$ and hidden unit $j$.
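
The energy definition above translates directly into code. This is a minimal sketch with a hypothetical function name and hand-picked toy parameters (3 visible units, 2 hidden units); it only evaluates the formula and does no training.

```python
def rbm_energy(v, h, W, a, b):
    """E(v, h) = -sum_i a_i v_i - sum_j b_j h_j - sum_{i,j} v_i W_ij h_j."""
    visible_term = sum(a_i * v_i for a_i, v_i in zip(a, v))
    hidden_term = sum(b_j * h_j for b_j, h_j in zip(b, h))
    interaction = sum(
        v[i] * W[i][j] * h[j] for i in range(len(v)) for j in range(len(h))
    )
    return -visible_term - hidden_term - interaction

# Toy parameters chosen by hand: W has one row per visible unit
# and one column per hidden unit.
W = [[1.0, -1.0],
     [0.5, 0.0],
     [0.0, 2.0]]
a = [0.1, 0.2, 0.3]   # visible biases
b = [-0.5, 0.5]       # hidden biases

e_off = rbm_energy([0, 0, 0], [0, 0], W, a, b)  # all units off
e_on = rbm_energy([1, 0, 1], [1, 1], W, a, b)   # bias terms 0.4 + 0.0, interaction 2.0
```

With all units off the energy is zero; the second configuration has energy -(0.4) - 0.0 - 2.0 = -2.4, so the model considers it more probable than the all-off state.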


Restricted Boltzmann machine (RBM)
• Probabilistic Activation

• The states of the units are binary (0 or 1) and are
activated probabilistically based on their energies.

• The probability that a hidden unit $h_j$ is activated (i.e.,
set to 1) given the visible units $v$ is:

• $P(h_j = 1 \mid v) = \sigma\left(b_j + \sum_i v_i W_{ij}\right)$

• Similarly, the probability that a visible unit $v_i$ is
activated given the hidden units $h$ is:

• $P(v_i = 1 \mid h) = \sigma\left(a_i + \sum_j h_j W_{ij}\right)$


Restricted Boltzmann machine (RBM)
• where $\sigma(x)$ is the logistic sigmoid function:

• $\sigma(x) = \dfrac{1}{1 + e^{-x}}$
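
The two conditional probabilities and the sigmoid can be sketched as below. Function names and the toy parameters are hypothetical; the code computes the activation probabilities for one layer given the other and then samples binary states from them.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def p_hidden_given_visible(v, W, b):
    """P(h_j = 1 | v) = sigmoid(b_j + sum_i v_i W_ij) for every hidden unit j."""
    return [sigmoid(b[j] + sum(v[i] * W[i][j] for i in range(len(v))))
            for j in range(len(b))]

def p_visible_given_hidden(h, W, a):
    """P(v_i = 1 | h) = sigmoid(a_i + sum_j h_j W_ij) for every visible unit i."""
    return [sigmoid(a[i] + sum(h[j] * W[i][j] for j in range(len(h))))
            for i in range(len(a))]

def sample_binary(probs, rng):
    """Draw independent 0/1 states with the given activation probabilities."""
    return [1 if rng.random() < p else 0 for p in probs]

rng = random.Random(0)
W = [[1.0, -1.0], [0.5, 0.0], [0.0, 2.0]]  # 3 visible x 2 hidden, toy values
a = [0.0, 0.0, 0.0]                        # visible biases
b = [0.0, 0.0]                             # hidden biases

v = [1, 0, 1]
ph = p_hidden_given_visible(v, W, b)  # both pre-activations equal 1.0 here
h = sample_binary(ph, rng)
pv = p_visible_given_hidden(h, W, a)
```

Because there are no intra-layer connections, every unit in a layer can be sampled independently once the other layer is fixed, which is what makes the list comprehension in `sample_binary` valid.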
Training RBMs

• Training an RBM involves adjusting the weights and
biases to minimize the difference between the
observed data distribution and the distribution
modeled by the RBM.

• The primary algorithm used for this purpose is
Contrastive Divergence (CD).
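
The slides name Contrastive Divergence without giving its steps, so the following is a sketch of the commonly used single-step variant (CD-1), not necessarily the exact procedure the author has in mind. The function names, learning rate, layer sizes, and toy data are assumptions. Each update compares statistics from the data (positive phase) with statistics from a one-step reconstruction (negative phase).

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def hidden_probs(v, W, b):
    return [sigmoid(b[j] + sum(v[i] * W[i][j] for i in range(len(v))))
            for j in range(len(b))]

def visible_probs(h, W, a):
    return [sigmoid(a[i] + sum(h[j] * W[i][j] for j in range(len(h))))
            for i in range(len(a))]

def cd1_update(v0, W, a, b, lr, rng):
    """One CD-1 step on a single binary vector v0; updates W, a, b in place."""
    ph0 = hidden_probs(v0, W, b)                      # positive phase
    h0 = [1 if rng.random() < p else 0 for p in ph0]  # sample hidden states
    pv1 = visible_probs(h0, W, a)                     # reconstruction probabilities
    v1 = [1 if rng.random() < p else 0 for p in pv1]  # sampled reconstruction
    ph1 = hidden_probs(v1, W, b)                      # negative phase
    for i in range(len(a)):
        for j in range(len(b)):
            W[i][j] += lr * (v0[i] * ph0[j] - v1[i] * ph1[j])
        a[i] += lr * (v0[i] - v1[i])
    for j in range(len(b)):
        b[j] += lr * (ph0[j] - ph1[j])
    # Squared reconstruction error, returned for monitoring only.
    return sum((x - p) ** 2 for x, p in zip(v0, pv1))

rng = random.Random(0)
n_visible, n_hidden = 6, 3
W = [[rng.gauss(0.0, 0.1) for _ in range(n_hidden)] for _ in range(n_visible)]
a = [0.0] * n_visible
b = [0.0] * n_hidden
data = [[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]]  # two toy patterns

errors = []
for epoch in range(500):
    errors.append(sum(cd1_update(v, W, a, b, 0.1, rng) for v in data))
```

The recorded reconstruction error is a convenient progress signal, but it is not the quantity CD optimises; CD approximates the gradient of the log-likelihood.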
Restricted Boltzmann machine (RBM)
• Working of Restricted Boltzmann Machine

• An RBM uses two sets of biases:

• The hidden biases help the RBM produce the
activations on the forward pass, while

• The visible layer's biases help the RBM learn the
reconstructions on the backward pass.

• Forward pass

• The following figure shows the working of an RBM in
the forward pass.
Restricted Boltzmann machine (RBM)

• The forward pass is the first step in training an RBM with
multiple inputs.

• The inputs are multiplied by the weights and then added to
the bias.


Restricted Boltzmann machine (RBM)

• The result is then passed through a sigmoid
activation function, and the output determines whether
the hidden state is activated or not.

• The weights form a matrix with the number of input
nodes as the number of rows and the number of
hidden nodes as the number of columns.

• The first hidden node receives the dot product of
the inputs with the first column of weights, after
which the corresponding bias term is added to it.
Restricted Boltzmann machine (RBM)
• The sigmoid function is given by:

• $\sigma(x) = \dfrac{1}{1 + e^{-x}}$

• So the equation that we get in this step would be,

• $h^{(1)} = \sigma\left(W^{T} v^{(0)} + b\right)$

• where $h^{(1)}$ and $v^{(0)}$ are the corresponding vectors
(column matrices) for the hidden and the visible
layers, with the superscript as the iteration ($v^{(0)}$
means the input that we provide to the network), and
$b$ is the hidden layer bias vector.
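
The forward pass can be sketched as below, assuming the weight layout stated earlier (one row per input node, one column per hidden node) and writing the hidden bias as `b`, in line with the RBM bias definitions. The function name and numbers are hypothetical. Each hidden node takes the dot product of the inputs with its own column of weights, adds its bias, and applies the sigmoid.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward_pass(v0, W, b):
    """Hidden activations: sigmoid(W^T v0 + b), one value per hidden node."""
    n_visible, n_hidden = len(W), len(W[0])
    assert len(v0) == n_visible and len(b) == n_hidden
    h1 = []
    for j in range(n_hidden):
        column_j = [W[i][j] for i in range(n_visible)]  # weights into hidden node j
        pre_activation = sum(x * w for x, w in zip(v0, column_j)) + b[j]
        h1.append(sigmoid(pre_activation))
    return h1

# 3 input nodes (rows) x 2 hidden nodes (columns), toy values.
W = [[0.2, -0.4],
     [0.0, 0.6],
     [-0.2, 0.8]]
b = [0.0, -1.0]     # hidden biases
v0 = [1, 1, 0]      # input vector

h1 = forward_pass(v0, W, b)  # pre-activations are 0.2 and -0.8
```

The returned values are activation probabilities; whether a hidden unit actually switches on is then decided by sampling against that probability.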
Restricted Boltzmann machine (RBM)
• Backward pass

• The backward pass is the reverse or the
reconstruction phase.

• It is similar to the first pass but in the opposite
direction, as shown below:
Restricted Boltzmann machine (RBM)

• $v^{(1)} = \sigma\left(W h^{(1)} + a\right)$

• where $v^{(1)}$ and $h^{(1)}$ are the corresponding vectors
(column matrices) for the visible and the hidden layers,
with the superscript as the iteration, and $a$ is the
visible layer bias vector.
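
A matching sketch of the backward (reconstruction) pass, under the same assumed weight layout as the forward pass: the same matrix is reused in the opposite direction and the visible bias `a` is added. The function name, the hand-picked hidden state, and the error measure are illustrative.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def backward_pass(h1, W, a):
    """Reconstruction: sigmoid(W h1 + a), one value per visible node."""
    return [sigmoid(sum(W[i][j] * h1[j] for j in range(len(h1))) + a[i])
            for i in range(len(a))]

W = [[0.2, -0.4],
     [0.0, 0.6],
     [-0.2, 0.8]]         # same 3 x 2 layout as in the forward pass
a = [0.0, 0.0, 0.0]       # visible biases
h1 = [1, 0]               # hidden state chosen by hand for illustration

v1 = backward_pass(h1, W, a)   # pre-activations are 0.2, 0.0 and -0.2
v0 = [1, 1, 0]                 # the original input
reconstruction_error = sum((x - y) ** 2 for x, y in zip(v0, v1))
```

Comparing `v1` with the original input `v0` gives a simple measure of how well the current weights reconstruct the data.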
Applications of RBM
• RBMs have been used in various applications,
including:

1. Dimensionality Reduction: Learning compact
representations of data.

2. Feature Learning: Extracting useful features from
raw data.

3. Collaborative Filtering: Building
recommendation systems.

4. Pre-training Deep Networks: Initializing the
weights of deep networks in a layer-wise manner.
Deep Belief Neural Networks
• A Restricted Boltzmann Machine (RBM) is a type of
generative stochastic artificial neural network that
can learn a probability distribution from its inputs.

• Deep belief networks, in particular, can be created
by “stacking” RBMs and fine-tuning the resulting
deep network via gradient descent and
backpropagation.

• DBNs belong to the family of unsupervised learning
algorithms and are known for their ability to learn
hierarchical representations from data.
Deep Belief Neural Networks
• DBNs differ in operation from autoencoders and RBMs:
those work directly with raw input data, whereas a DBN
starts from an input layer with one neuron for each
component of the input vector and passes through
numerous layers before arriving at the final layer.

• The final outputs are produced using probabilities
acquired from earlier layers.
Deep Belief Neural Networks
• The Architecture of DBN

• The top two layers are the associative memory, and
the bottom layer is the visible units.

• The arrows pointing towards the layer closest to the
data indicate the relationships between all lower
layers.
Deep Belief Neural Networks
• Directed acyclic connections in the lower layers
translate associative memory to observable
variables.

• The lowest layer of visible units receives input data
as binary or real-valued data.

• As in an RBM, there are no intra-layer connections in a DBN.

• The hidden units represent features that encapsulate
the data’s correlations.

• A matrix of symmetric weights W connects two
layers.
Deep Belief Neural Networks

• The “Input Layer” represents the initial layer, which has one
neuron for each component of the input vector.

• “Hidden Layer 1” is the first layer of Restricted Boltzmann
Machine (RBM), which learns the fundamental structure of the
data.
Deep Belief Neural Networks
• “Hidden Layer 2” and subsequent layers are additional
RBMs that learn higher-level features as we move through
the network.

• We can have multiple hidden layers depending on the
complexity of the task.

• “Output Layer” is used for supervised learning tasks like
classification or regression.

• The arrows indicate the flow of information from one layer
to the next, and the connections between neurons in
adjacent layers represent the weights that are learned
during training.
Deep Belief Neural Networks
• Training the RBMs:

• One of the unique aspects of DBNs is that each RBM
is trained independently using a technique
called contrastive divergence.

• This method allows us to approximate the gradient
of the log-likelihood of the data with respect to the
RBM’s parameters.

• After training, the output of one RBM becomes
the input for the next, creating a stacked
structure of RBMs.
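
The stacking procedure described above can be sketched as follows. This is an illustrative outline rather than a reference implementation: the helper names, layer sizes, learning rate, and toy data are assumptions, each RBM is trained with a simple CD-1 loop that uses a mean-field reconstruction, and hidden activation probabilities (rather than sampled states) are passed to the next layer, which is one common choice among several.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def hidden_probs(v, W, b):
    return [sigmoid(b[j] + sum(v[i] * W[i][j] for i in range(len(v))))
            for j in range(len(b))]

def visible_probs(h, W, a):
    return [sigmoid(a[i] + sum(h[j] * W[i][j] for j in range(len(h))))
            for i in range(len(a))]

def train_rbm(data, n_hidden, epochs, lr, rng):
    """Train one RBM with CD-1 and return its parameters (W, a, b)."""
    n_visible = len(data[0])
    W = [[rng.gauss(0.0, 0.1) for _ in range(n_hidden)] for _ in range(n_visible)]
    a, b = [0.0] * n_visible, [0.0] * n_hidden
    for _ in range(epochs):
        for v0 in data:
            ph0 = hidden_probs(v0, W, b)
            h0 = [1 if rng.random() < p else 0 for p in ph0]
            v1 = visible_probs(h0, W, a)   # mean-field reconstruction
            ph1 = hidden_probs(v1, W, b)
            for i in range(n_visible):
                for j in range(n_hidden):
                    W[i][j] += lr * (v0[i] * ph0[j] - v1[i] * ph1[j])
                a[i] += lr * (v0[i] - v1[i])
            for j in range(n_hidden):
                b[j] += lr * (ph0[j] - ph1[j])
    return W, a, b

def train_dbn(data, layer_sizes, epochs=50, lr=0.1, seed=0):
    """Greedy layer-wise training: each RBM learns from the previous RBM's output."""
    rng = random.Random(seed)
    layers, layer_input = [], data
    for n_hidden in layer_sizes:
        W, a, b = train_rbm(layer_input, n_hidden, epochs, lr, rng)
        layers.append((W, a, b))
        # The hidden activations of this RBM become the next RBM's training data.
        layer_input = [hidden_probs(v, W, b) for v in layer_input]
    return layers, layer_input

data = [[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1], [1, 1, 0, 0, 0, 0]]
layers, top_features = train_dbn(data, layer_sizes=[4, 2])
```

Here `layers` holds one `(W, a, b)` tuple per RBM and `top_features` is the representation of each training vector at the top of the stack, which is what a supervised output layer would be attached to during fine-tuning.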
Deep Belief Neural Networks
• Fine-Tuning for Supervised Learning:

• After the DBN has been assembled through the training of
its RBMs, it can be fine-tuned for supervised learning tasks.

• This fine-tuning process entails adjusting the
weights of the final layer using supervised learning
techniques like backpropagation.

• DBNs gained popularity for their strong
performance across various applications.

• They have been applied to image recognition, speech
recognition, and natural language processing, with
strong results in each area.
Deep Belief Neural Networks
• One of the main advantages of DBNs is their ability to
learn features from the data in an unsupervised manner.
Other advantages include:

1. DBNs can learn a hierarchical representation of the
data, with each layer learning increasingly
sophisticated features from lower layers to higher layers.

2. DBNs have proven resistant to overfitting, owing to
model regularisation and to needing only a small
amount of labelled data during the fine-tuning phase.

3. DBNs can manage missing data, which matters because
in many real-world applications some data is
corrupted or absent.
Deep Boltzmann Machine
• A Deep Boltzmann Machine (DBM) is a type of
generative stochastic neural network that is used in
deep learning to model complex distributions over
high-dimensional data.

• It is an extension of the Boltzmann Machine and
Restricted Boltzmann Machine (RBM) that introduces
multiple layers of hidden units, allowing it to capture
intricate patterns and dependencies in the data.

• A DBM analyzes data and learns how to produce new
examples that are similar to the original data.
Architecture of DBM
• The key concepts are:

• Architecture:

1. Visible Layer: This layer represents the observed data.

2. Hidden Layers: These are multiple layers of hidden units
(neurons) that interact with each other and with the visible
layer.
Architecture of DBM
• Energy-Based Model:

• DBMs define a joint probability distribution over
visible and hidden variables using an energy function.

• The energy of a state (combination of visible and
hidden units) determines its probability.

• The energy function for a DBM with two hidden layers
can be written as:

• $E(v, h^1, h^2) = -\sum_i v_i b_i - \sum_j h^1_j b^1_j - \sum_k h^2_k b^2_k - \sum_{i,j} v_i W_{ij} h^1_j - \sum_{j,k} h^1_j W'_{jk} h^2_k$
Architecture of DBM
• where $v$ is the visible layer, $h^1$ and $h^2$ are the hidden
layers, $b$, $b^1$, and $b^2$ are biases, and $W$ and $W'$ are
the weights connecting the layers.
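
The two-hidden-layer energy function can be evaluated directly, as in the sketch below. The function name and the toy parameters are hypothetical, and `W2` stands in for W′ because a prime is not valid in a Python identifier.

```python
def dbm_energy(v, h1, h2, W, W2, b, b1, b2):
    """Energy of a DBM with two hidden layers; W2 plays the role of W'."""
    bias_terms = (sum(x * c for x, c in zip(v, b))
                  + sum(x * c for x, c in zip(h1, b1))
                  + sum(x * c for x, c in zip(h2, b2)))
    visible_hidden = sum(v[i] * W[i][j] * h1[j]
                         for i in range(len(v)) for j in range(len(h1)))
    hidden_hidden = sum(h1[j] * W2[j][k] * h2[k]
                        for j in range(len(h1)) for k in range(len(h2)))
    return -bias_terms - visible_hidden - hidden_hidden

W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # 3 visible x 2 first-hidden units
W2 = [[2.0], [-1.0]]                       # 2 first-hidden x 1 second-hidden unit
b, b1, b2 = [0.5, 0.0, 0.0], [0.0, 0.0], [1.0]

e_top_on = dbm_energy([1, 0, 1], [1, 1], [1], W, W2, b, b1, b2)
e_top_off = dbm_energy([1, 0, 1], [1, 1], [0], W, W2, b, b1, b2)
```

For these values the bias terms sum to 1.5, the visible-to-hidden term to 3.0, and the hidden-to-hidden term to 1.0, giving an energy of -5.5 when the top unit is on and -3.5 when it is off. Note that there is no direct term coupling the visible layer to the second hidden layer.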

• Training:

• Greedy Layer-Wise Training: Initially, each layer is
trained as an RBM, layer by layer. This simplifies the
learning process.

• Fine-Tuning: After pre-training, the entire network is
fine-tuned using algorithms like Stochastic Gradient
Descent (SGD) to adjust weights and minimize the
Architecture of DBM
• Contrastive Divergence: Often used in training, it
approximates the gradients needed to update the
weights.

• Advantages:

• Representation Power: With multiple layers, DBMs
can learn deep hierarchical representations of data,
capturing complex dependencies.

• Generative Capability: DBMs can generate new
samples from the learned distribution, making them
useful for tasks like data generation and reconstruction.
Architecture of DBM
• Feature Learning: They can learn useful features
for tasks such as classification, making them
versatile in various applications.
