DEEP LEARNING
1. Problem of long-term dependencies in RNNs
Introduction to RNNs:
Recurrent Neural Networks (RNNs) are a type of neural network specifically designed for
handling sequential data, such as text, speech, or time series. Unlike feedforward
networks, RNNs have a feedback loop that allows information to persist across time steps
through a hidden state.
What are Long-Term Dependencies?
Long-term dependencies refer to scenarios where the output at a certain time step
depends on information from many steps earlier in the sequence.
Example:
Consider the sentence:
"The car that was parked outside the house yesterday was blue."
To correctly interpret or generate the word "blue," the model must remember the subject
"car" introduced much earlier. This requires remembering context from multiple steps
back — a long-term dependency.
Why Do RNNs Struggle with Long-Term Dependencies?
The difficulty arises during training through Backpropagation Through Time (BPTT).
1. Vanishing Gradient Problem:
• As the gradient is backpropagated through many time steps, it is multiplied repeatedly by
the weights and the derivatives of activation functions (like tanh or sigmoid).
• If these values are less than 1, the gradient shrinks exponentially.
• Eventually, the gradient becomes so small that the earliest time steps contribute almost
nothing to the weight updates; the model effectively "forgets" earlier information.
2. Exploding Gradient Problem:
• If the weights are large, the gradient can grow exponentially, leading to unstable weight
updates and failure to converge.
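A toy illustration (not part of the original notes): backpropagation through T time steps multiplies the gradient by a per-step factor of roughly w · f'(h), so a factor slightly below or above 1 makes the gradient collapse or blow up:

public class GradientDecayDemo {
    public static void main(String[] args) {
        double grad = 1.0;
        for (int t = 0; t < 100; t++) grad *= 0.9;   // per-step factor |w * f'(h)| < 1
        System.out.println("factor 0.9, 100 steps: " + grad);   // ~2.7e-5 (vanishing)

        grad = 1.0;
        for (int t = 0; t < 100; t++) grad *= 1.1;   // per-step factor |w * f'(h)| > 1
        System.out.println("factor 1.1, 100 steps: " + grad);   // ~1.4e4 (exploding)
    }
}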
Consequences:
• RNNs become unable to learn relationships between distant words or time steps.
• In tasks like language translation, speech recognition, or long document classification,
this leads to:
o Loss of context
o Incorrect outputs
o Poor overall performance
Solutions to the Problem:
1. LSTM (Long Short-Term Memory):
• LSTM introduces a memory cell and three gates:
o Input Gate: decides what information to store.
o Forget Gate: decides what to discard.
o Output Gate: decides what to output.
• This architecture allows important information to flow largely unchanged across many
time steps, mitigating the vanishing gradient problem (see the gate equations after this list).
2. GRU (Gated Recurrent Unit):
• A simpler variant of LSTM with fewer gates (update and reset).
• Performs similarly in many tasks and is computationally more efficient.
3. Transformer Models:
• Use self-attention mechanisms to directly connect every input to every other input,
regardless of distance.
• Do not rely on recurrence, thus completely avoiding the issues faced by RNNs.
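For reference, the standard LSTM cell equations (a textbook formulation, not taken from these notes) make the gating explicit:

f_t = \sigma(W_f [h_{t-1}, x_t] + b_f)          (forget gate)
i_t = \sigma(W_i [h_{t-1}, x_t] + b_i)          (input gate)
\tilde{c}_t = \tanh(W_c [h_{t-1}, x_t] + b_c)   (candidate cell state)
c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t (cell state update)
o_t = \sigma(W_o [h_{t-1}, x_t] + b_o)          (output gate)
h_t = o_t \odot \tanh(c_t)                      (hidden state)

Because the cell state c_t is updated additively rather than through repeated multiplication, gradients can flow across many time steps without vanishing.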
Detailed Example (with and without Long-Term Memory):
Let’s compare how a standard RNN and an LSTM handle a long sentence:
"After John went to the restaurant, he ordered a pizza."
• Task: Identify what "he" refers to.
Using a Standard RNN:
• The word "John" occurred many steps before "he."
• Due to vanishing gradients, the RNN may fail to retain "John" in memory.
• Result: the model might guess incorrectly who "he" is.
Using an LSTM:
• The model retains the memory of "John" using its cell state.
• When "he" appears, the model can correctly associate it with "John."
• Result: accurate understanding and output.
Conclusion:
RNNs are powerful for sequential tasks but suffer from long-term dependency problems
due to vanishing and exploding gradients. This makes them ineffective at remembering
inputs over long sequences. Advanced architectures like LSTM, GRU, and especially
Transformers have been developed to address these issues and enable learning over
longer time spans.
2. Gated architectures in RNNs – Gated Recurrent Unit Networks (see GeeksforGeeks)
3. Long Short-Term Memory (LSTM) networks in RNNs – What is LSTM: Long Short Term
Memory? (see GeeksforGeeks)
4. DL4J suite of tools
Introduction to DL4J (Deeplearning4j):
DL4J (Deeplearning4j) is an open-source, distributed deep learning library written for the
Java Virtual Machine (JVM). It is specifically designed for the Java and Scala ecosystems
and is suitable for production environments.
DL4J supports building and training neural networks and deep learning models such as:
• Feedforward Networks
• Convolutional Neural Networks (CNNs)
• Recurrent Neural Networks (RNNs)
• Autoencoders, GANs, Word2Vec, and more
Key Features of DL4J:
• Written in Java and Scala
• Supports distributed training via Apache Spark
• Integrates with Hadoop, Kubernetes, and Java-based enterprise applications
• Provides GPU support with CUDA
• Compatible with Keras model import
• Offers visualization tools and ND4J (n-dimensional arrays) as its numerical computing
foundation
DL4J Suite of Tools:
DL4J is not just a single library — it's a suite of integrated tools designed to handle every
stage of the deep learning pipeline. Here's a breakdown:
1. ND4J (N-Dimensional Arrays for Java)
• Equivalent to NumPy in Python
• Handles numerical operations like matrix multiplication, broadcasting, slicing
• Forms the computational backbone of DL4J
• Supports both CPU and GPU computation
Example: Used for handling input tensors, activations, and gradients during forward
and backward passes
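A small ND4J sketch (values are illustrative; assumes ND4J on the classpath):

INDArray a = Nd4j.create(new float[]{1, 2, 3, 4}, new int[]{2, 2}); // 2x2 matrix
INDArray b = Nd4j.ones(2, 2);                                       // 2x2 matrix of ones
INDArray product = a.mmul(b);    // matrix multiplication
INDArray scaled  = a.mul(0.5);   // element-wise scaling (returns a new array)
System.out.println(product);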
2. DL4J Core
• The central library for building deep neural networks
• Supports standard layers, loss functions, optimizers, and training workflows
• Offers model configuration via JSON or Java API
Example: Used to create and train a CNN for image classification
3. DataVec
• A library for data transformation and preprocessing
• Converts raw data (CSV, images, audio, text) into structured format suitable for training
• Includes tools for normalization, tokenization, vectorization
Example: Converts CSV files into numerical features for regression or classification
tasks
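A minimal DataVec sketch (the file name, column indices, and sizes are illustrative assumptions):

RecordReader rr = new CSVRecordReader(1, ',');               // skip 1 header line
rr.initialize(new FileSplit(new File("customers.csv")));     // hypothetical CSV file
int labelIndex = 4, numClasses = 2, batchSize = 32;
DataSetIterator trainIter =
        new RecordReaderDataSetIterator(rr, batchSize, labelIndex, numClasses);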
4. Arbiter
• Tool for hyperparameter optimization
• Supports grid search, random search, and genetic algorithms
• Visualizes parameter tuning results
Example: Automatically finds the best learning rate and number of layers for a neural
network
5. RL4J (Reinforcement Learning for Java)
• A reinforcement learning library within DL4J
• Supports DQN, A3C, and other RL algorithms
• Integrates with OpenAI Gym via Java wrappers
Example: Training a bot to play a game or make stock trading decisions
6. SameDiff
• A symbolic automatic differentiation library, like TensorFlow's graph mode
• Allows defining dynamic and static computation graphs
• Useful for building custom operations and gradients
Example: Build and optimize custom loss functions or advanced neural architectures
7. Deeplearning4j UI
• Interactive visualization dashboard to monitor:
o Training accuracy
o Loss
o Weights
o Activations
• Helpful for debugging and understanding model behavior
Example: Monitor CNN layer outputs while training on image data
Real-World Example: Predicting Loan Approvals
Imagine you're building a loan approval system using DL4J:
1. DataVec: Preprocess customer data from a CSV (normalize age, income, encode
categorical features like job type)
2. ND4J: Store the processed feature matrix
3. DL4J Core: Build a deep feedforward neural network with input layer, hidden layers, and
output layer for classification (approve/deny)
4. Arbiter: Automatically search for the best configuration (e.g., best learning rate and
number of neurons)
5. SameDiff: Add a custom penalty to discourage overfitting
6. Deeplearning4j UI: Monitor model accuracy and loss over time
7. (Optional) RL4J: Implement reinforcement learning for real-time credit scoring
adjustments
Conclusion:
DL4J is a powerful deep learning framework for the JVM that comes with a comprehensive
suite of tools like ND4J, DataVec, Arbiter, and RL4J. These tools cover every part of the
deep learning workflow — from data preprocessing to model training, optimization, and
visualization.
Its seamless integration with enterprise systems, support for GPU and distributed training,
and Java compatibility make it an excellent choice for real-world, production-grade AI
applications.
---
5. Concepts of the DL4J API
Introduction to DL4J API:
The DL4J (Deeplearning4j) API is designed for building and training deep learning models
in Java and Scala environments. It is modular, flexible, and built to integrate seamlessly
with enterprise Java ecosystems. The API provides tools for defining models, configuring
networks, processing data, and training neural networks.
Core Concepts of the DL4J API:
1. MultiLayerConfiguration / ComputationGraphConfiguration
These classes define the structure and architecture of a neural network.
• MultiLayerConfiguration: Used for sequential models (like standard feedforward
networks, CNNs, RNNs).
• ComputationGraphConfiguration: Used for complex models with multiple inputs,
outputs, or non-linear architectures (like encoder-decoder, Siamese networks).
Example:

MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
    .list()
    .layer(new DenseLayer.Builder().nIn(100).nOut(50).build())
    .layer(new OutputLayer.Builder().nIn(50).nOut(2)
        .activation(Activation.SOFTMAX).build())
    .build();
2. NeuralNetConfiguration.Builder
Used to define settings for the model such as:
• Learning rate
• Optimizer (e.g., Adam, SGD, Nesterovs)
• Weight initialization (Xavier, HeNormal, etc.)
• Activation functions (ReLU, tanh, sigmoid)
Purpose: Centralized configuration for layers, regularization, updaters.
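A minimal sketch of these global settings (values are illustrative):

NeuralNetConfiguration.Builder builder = new NeuralNetConfiguration.Builder()
        .seed(42)                          // reproducibility
        .updater(new Adam(1e-3))           // optimizer and learning rate
        .weightInit(WeightInit.XAVIER)     // weight initialization scheme
        .activation(Activation.RELU);      // default activation for all layers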
3. Layers and Layer Types
DL4J provides various predefined layers:
• DenseLayer: Fully connected layer
• ConvolutionLayer: For image data and spatial input
• SubsamplingLayer: Pooling layers
• LSTM, GravesLSTM: For sequence modeling
• OutputLayer: Final layer with loss function (e.g., softmax, MSE)
Concept: Each layer is defined with its input/output size, activation function, and
weight initializer.
4. INDArray (ND4J)
The core data structure in DL4J, similar to NumPy's ndarray.
• Used to store inputs, outputs, weights, gradients
• Supports CPU and GPU backends
• Includes operations for slicing, reshaping, broadcasting, etc.
Example:

INDArray input = Nd4j.create(new float[]{1, 2, 3, 4}, new int[]{2, 2});
5. DataSet and DataSetIterator
• DataSet: Combines features (inputs) and labels (targets)
• DataSetIterator: Interface for batch-wise data feeding during training
DL4J provides several iterators:
• RecordReaderDataSetIterator: For CSV, text, or image data
• ListDataSetIterator: For custom lists of datasets
Purpose: Efficiently load and feed data to the model in training loops
6. Model Training and Evaluation
Once the model is defined:
• .fit() method is used for training
• .evaluate() method calculates metrics like accuracy, precision, recall
• Early stopping and model listeners can be configured
Example:

model.fit(dataIterator);
Evaluation eval = model.evaluate(testIterator);
System.out.println(eval.stats());
7. Transfer Learning API
DL4J supports transfer learning, allowing you to load pretrained models (e.g., VGG16) and
modify them.
• You can "freeze" layers and add new ones
• Useful for tasks like image classification, NLP with fewer training resources
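A hedged sketch of the transfer learning API (the pretrained network, frozen layer index, and sizes are illustrative assumptions):

FineTuneConfiguration ftc = new FineTuneConfiguration.Builder()
        .updater(new Adam(1e-4))           // smaller learning rate for fine-tuning
        .build();

MultiLayerNetwork tuned = new TransferLearning.Builder(pretrained)
        .fineTuneConfiguration(ftc)
        .setFeatureExtractor(1)            // freeze layers 0..1
        .removeOutputLayer()
        .addLayer(new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
                .nIn(50).nOut(3)           // new task with 3 classes
                .activation(Activation.SOFTMAX).build())
        .build();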
8. Listeners and UI Monitoring
• ScoreIterationListener: Logs score after each iteration
• StatsListener: Provides metrics to DL4J UI dashboard
• DL4J UI shows real-time training graphs and activations
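A typical wiring (a sketch; package locations vary slightly across DL4J versions):

UIServer uiServer = UIServer.getInstance();               // starts the web dashboard
StatsStorage statsStorage = new InMemoryStatsStorage();   // holds training stats in memory
uiServer.attach(statsStorage);                            // serve the stats on the UI
model.setListeners(new StatsListener(statsStorage),       // feeds the UI
                   new ScoreIterationListener(10));       // logs the score every 10 iterations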
9. Keras Model Import
DL4J can import models created in Keras/TensorFlow using .h5 files.
Benefit: Leverages existing Python-based model development and runs it in Java
environments.
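A minimal sketch ("model.h5" is a placeholder path; the import methods throw checked exceptions that a real caller must handle):

MultiLayerNetwork imported =
        KerasModelImport.importKerasSequentialModelAndWeights("model.h5");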
Summary Table:

Concept                    | Purpose
---------------------------|----------------------------------------
MultiLayerConfiguration    | Defines model structure
NeuralNetConfiguration     | Sets global training options
INDArray                   | Stores data and weights
DataSet / DataSetIterator  | Loads and feeds training data
Layers (Dense, CNN, LSTM)  | Defines computations per network layer
Training methods           | .fit(), .evaluate(), listeners
Transfer Learning API      | Reuses existing models
DL4J UI & visualization    | Monitors training in real time
Keras model import         | Cross-compatibility with Python tools
Conclusion:
The DL4J API is a full-featured deep learning framework for JVM users. Its well-structured
components — from model configuration and training to data pipelines and evaluation —
make it powerful for both research and production. Understanding these core concepts
enables developers to build sophisticated neural networks with high control and
performance.
6. Architecture of a Convolutional Neural Network (CNN) used for image classification
tasks – Introduction to Convolution Neural Network (see GeeksforGeeks).
7. Explain the implementation of a CNN model in DL4J for recognizing handwritten digits
(e.g., the MNIST dataset). Include the roles of convolution, pooling, and dense layers.
Introduction to CNN for Handwritten Digit Recognition
A Convolutional Neural Network (CNN) is a deep learning architecture specifically
designed to process image data. It is particularly effective for recognizing handwritten
digits in datasets like MNIST, which contains 28x28 grayscale images of digits (0–9).
DL4J (Deeplearning4j) provides support for implementing CNNs using Java. This model
uses layers such as convolution, pooling, and dense (fully connected) layers to learn spatial
features and classify images.
Roles of CNN Components
1. Convolution Layer:
• Applies filters (kernels) to the input image to detect patterns like edges, curves, etc.
• Each filter extracts a feature map that activates when a specific pattern is found.
• Learns low-level features in early layers and high-level features in deeper layers.
Example in DL4J:

.layer(new ConvolutionLayer.Builder(5, 5)   // kernel size
    .nIn(1)                                 // input depth (1 for grayscale)
    .nOut(20)                               // number of filters
    .stride(1, 1)
    .activation(Activation.RELU)
    .build())
2. Subsampling (Pooling) Layer:
• Reduces the spatial size of the feature maps.
• Commonly uses max pooling to keep the most important features.
• Helps in reducing computation, controlling overfitting, and making the model translation-invariant.
Example in DL4J:

.layer(new SubsamplingLayer.Builder(PoolingType.MAX)
    .kernelSize(2, 2)
    .stride(2, 2)
    .build())
3. Dense (Fully Connected) Layer:
• Takes the flattened feature maps and performs classification.
• Connects every neuron from the previous layer to the next layer.
• The final layer is typically a softmax layer for multi-class classification.
Example in DL4J:

.layer(new DenseLayer.Builder()
    .nOut(100)
    .activation(Activation.RELU)
    .build())
.layer(new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
    .nOut(10)                               // for digits 0-9
    .activation(Activation.SOFTMAX)
    .build())
Implementation Workflow in DL4J
Step 1: Load MNIST Data
DataSetIterator mnistTrain = new MnistDataSetIterator(64, true, 123);
DataSetIterator mnistTest  = new MnistDataSetIterator(64, false, 123);
Step 2: Define CNN Model
MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
    .seed(123)
    .updater(new Adam(0.001))
    .weightInit(WeightInit.XAVIER)
    .list()
    .layer(new ConvolutionLayer.Builder(5, 5)
        .nIn(1).nOut(20).stride(1, 1)
        .activation(Activation.RELU).build())
    .layer(new SubsamplingLayer.Builder(PoolingType.MAX)
        .kernelSize(2, 2).stride(2, 2).build())
    .layer(new ConvolutionLayer.Builder(5, 5)
        .nOut(50).stride(1, 1)
        .activation(Activation.RELU).build())
    .layer(new SubsamplingLayer.Builder(PoolingType.MAX)
        .kernelSize(2, 2).stride(2, 2).build())
    .layer(new DenseLayer.Builder().nOut(100)
        .activation(Activation.RELU).build())
    .layer(new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
        .nOut(10).activation(Activation.SOFTMAX).build())
    .setInputType(InputType.convolutionalFlat(28, 28, 1)) // for 28x28x1 MNIST images
    .build();

MultiLayerNetwork model = new MultiLayerNetwork(conf);
model.init();
Step 3: Train the Model
model.fit(mnistTrain, 10); // train for 10 epochs
Step 4: Evaluate the Model
Evaluation eval = model.evaluate(mnistTest);
System.out.println(eval.stats());
Result:
After training, the CNN achieves high accuracy (~98–99%) on MNIST due to the
effectiveness of convolution in feature extraction and pooling in dimensionality
reduction.
Conclusion:
DL4J provides a clean and powerful API to implement CNNs for image recognition tasks.
Using convolution layers to detect patterns, pooling layers to reduce complexity, and
dense layers to classify, a CNN in DL4J can accurately recognize handwritten digits from
the MNIST dataset.
8. Explain the architecture and working of an Autoencoder. What is the objective function
used in training an autoencoder?
Introduction:
An Autoencoder is a type of unsupervised neural network used primarily for:
• Dimensionality reduction
• Feature extraction
• Data denoising
• Anomaly detection
It learns to reconstruct its input, forcing the model to learn important features by
compressing data through a bottleneck structure.
Architecture of an Autoencoder:
An Autoencoder consists of three main parts:
1. Encoder:
• Maps the input data x to a latent representation h (also called the code or embedding).
• Reduces the input dimension.

h = f(W_e x + b_e)

Where:
• W_e: weights of the encoder
• b_e: bias of the encoder
• f: activation function (e.g., ReLU, sigmoid)
2. Latent Space (Code):
• A compressed, dense representation of the original input.
• Captures the most informative features in fewer dimensions.
3. Decoder:
• Reconstructs the input from the latent representation.
x' = g(W_d h + b_d)

Where:
• W_d: weights of the decoder
• b_d: bias of the decoder
• g: activation function
Working of an Autoencoder:
1. Input Layer: Receives raw input data (e.g., image, text vector).
2. Encoder Network: Compresses the input to a lower dimension.
3. Code (Bottleneck): Intermediate compact representation.
4. Decoder Network: Reconstructs the input from the compressed code.
5. Output Layer: Should be as close as possible to the input.
Objective Function (Loss Function):
The objective of an autoencoder is to minimize the reconstruction error, i.e., how
different the output is from the input.
Common Loss Functions:
a) Mean Squared Error (MSE):
\mathcal{L}(x, x') = \|x - x'\|^2 = \sum_{i=1}^{n} (x_i - x'_i)^2
Used when input data is continuous (e.g., images, sensor data).
b) Binary Cross-Entropy:
\mathcal{L}(x, x') = -\sum_{i=1}^{n} \left[ x_i \log(x'_i) + (1 - x_i) \log(1 - x'_i) \right]
Used for binary or normalized inputs between 0 and 1.
Types of Autoencoders:
• Vanilla Autoencoder: Basic encoder-decoder structure
• Sparse Autoencoder: Adds sparsity constraint on code
• Denoising Autoencoder: Learns to reconstruct input from noisy versions
• Variational Autoencoder (VAE): Learns probabilistic latent representations
• Convolutional Autoencoder: Uses convolution layers for image data
Use Case Example:
Image Compression:
• Input: 28×28 grayscale image (784 pixels)
• Encoder compresses to 32 neurons (latent space)
• Decoder reconstructs image back to 784 neurons
• Trained to minimize MSE between input and output
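A minimal DL4J sketch of this 784 → 32 → 784 autoencoder (a sketch under the stated assumptions, with pixel values normalized to [0, 1]):

MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
        .seed(123)
        .updater(new Adam(1e-3))
        .weightInit(WeightInit.XAVIER)
        .list()
        .layer(new DenseLayer.Builder().nIn(784).nOut(32)       // encoder: 784 -> 32
                .activation(Activation.RELU).build())
        .layer(new OutputLayer.Builder(LossFunctions.LossFunction.MSE)
                .nIn(32).nOut(784)                              // decoder: 32 -> 784
                .activation(Activation.SIGMOID).build())
        .build();

MultiLayerNetwork autoencoder = new MultiLayerNetwork(conf);
autoencoder.init();
// Train with the input as its own target, e.g. autoencoder.fit(features, features);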
Conclusion:
An Autoencoder is a powerful unsupervised learning model that learns to reconstruct its
inputs using an encoder-decoder architecture. The key idea is to force the network to
learn a compressed representation. The objective function, typically Mean Squared Error
or Cross-Entropy, ensures the output is as close as possible to the original input, enabling
useful applications in compression, noise reduction, and anomaly detection.
9. Explain the concepts of Stochastic Encoders and Decoders. How are they used in
generative models?
1. Stochastic Encoder:
A stochastic encoder is responsible for mapping input data (e.g., images, text, etc.) into
a probabilistic latent space rather than a deterministic one. This means that instead of
producing a fixed latent code for a given input, the encoder outputs a distribution over
the latent space. This distribution is typically parameterized by a mean and variance (for
Gaussian distributions, for example).
• In a VAE, for example, the encoder outputs a mean vector and a log-variance vector for
each data point. These parameters define a Gaussian distribution from which we can
sample latent variables (e.g., z). The randomness in this latent variable ensures that the
model captures the variability inherent in the data and introduces diversity in the
generated samples.
• The use of a stochastic encoder allows the model to learn a probabilistic representation
of the data. This is crucial in generative models because it enables the generation of
new, diverse samples that were not seen during training.
• Example: Given an image of a cat, the encoder could output a distribution over the
latent space, where the latent variables correspond to different "features" or
characteristics of the cat (e.g., color, shape, size). The encoder does not output a single
point but a spread of possible values representing various possible variations of the cat.
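Concretely, a VAE encoder usually implements this with the reparameterization trick (a standard formulation, not from the original notes):

q_\phi(z \mid x) = \mathcal{N}\big(z;\ \mu_\phi(x),\ \operatorname{diag}(\sigma_\phi^2(x))\big)
z = \mu_\phi(x) + \sigma_\phi(x) \odot \epsilon, \quad \epsilon \sim \mathcal{N}(0, I)

Sampling is pushed into the noise variable ε, so gradients can flow back through μ_φ and σ_φ during training.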
2. Stochastic Decoder:
A stochastic decoder takes a sample from the latent space and generates new data. In
contrast to a deterministic decoder, which would always produce the same output for a
given latent vector, a stochastic decoder introduces randomness into the generation
process. This randomness allows the decoder to produce diverse outputs from the same
latent code, depending on the parameters learned during training.
• In VAEs, the decoder takes the latent vector sampled from the probabilistic distribution
(produced by the encoder) and tries to reconstruct the original data point, but with a
probabilistic output. For instance, if the input data is an image, the decoder might
output pixel values or even probabilities for each pixel (rather than a single deterministic
pixel value), thereby introducing randomness in the generated image.
• Example: After sampling a latent vector from the probabilistic distribution, the decoder
generates a new image of a cat. Depending on the randomness introduced by the
stochastic nature of the decoder, the generated image could vary in color, background,
or other features, even though the latent vector might come from the same region of
the latent space.
3. Role in Generative Models:
The combination of stochastic encoders and decoders is especially useful in generative
models because it allows these models to learn rich, complex distributions of data and
generate diverse, new samples that are realistic and varied.
• Variational Autoencoders (VAEs): In VAEs, the encoder and decoder are both stochastic.
The encoder learns to approximate a posterior distribution over the latent space, and
the decoder learns to generate new data points from this distribution. The stochastic
nature ensures that the model can generate novel and diverse samples by sampling from
the learned distribution.
• Generative Adversarial Networks (GANs): GANs do not explicitly have a stochastic
encoder and decoder in the same way as VAEs. However, the generator in a GAN can be
considered as a form of stochastic decoder, since it generates new samples by drawing
random noise (a latent variable) from a distribution and transforming it into data.
• Normalizing Flows: These models can also incorporate stochastic encoders and
decoders by modeling complex distributions with invertible transformations, allowing for
sampling from highly complex latent distributions.
4. Why Stochasticity Matters:
The introduction of stochasticity (randomness) into the encoding and decoding process
is important because it enables the model to:
• Generalize better: By learning a distribution over the latent space, the model can
represent multiple possible interpretations of the input data.
• Generate diverse outputs: Random sampling from the learned distribution allows the
model to generate new, varied samples that resemble the training data but are not
identical to any specific training example.
• Model uncertainty: The stochastic nature captures the inherent uncertainty in the data
and allows the model to learn a richer representation of the underlying data
distribution.
In summary, stochastic encoders and decoders enable generative models to model
complex data distributions and generate new data points by introducing variability and
randomness into the process. This is fundamental for tasks like image generation, text
generation, and other forms of creative or probabilistic data synthesis.
10. Explain the architecture and working of a Generative Adversarial Network (GAN). How
do the Generator and Discriminator interact? – Generative Adversarial Network (GAN)
(see GeeksforGeeks)
11. Derive the objective function of a GAN. Explain how Stochastic Gradient Descent is used
to train both the generator and the discriminator.
In Generative Adversarial Networks (GANs), the goal is to train two networks—the
generator (G) and the discriminator (D)—in a game-theoretic setting. The generator
creates synthetic data, and the discriminator tries to distinguish between real data (from
the true distribution) and fake data (produced by the generator). The ultimate objective
of training a GAN is for the generator to produce data that is indistinguishable from real
data, while the discriminator becomes better at identifying fake data.
1. Objective Function of a GAN:
The objective function of a GAN can be understood in terms of a min-max game
between the generator and discriminator:
• The discriminator tries to maximize the probability of correctly classifying real and fake
data.
• The generator tries to minimize the probability that the discriminator can distinguish
real data from fake data.
Let’s formalize this mathematically.
The Min-Max Game:
1. Discriminator’s Objective: The discriminator D(x) takes in a data point x and outputs
the probability that x is real (i.e., drawn from the true distribution p_data(x)):
o D(x) = P(real | x), where x can be either real data or fake data generated by G.
The discriminator's goal is to maximize the likelihood of correctly classifying real and
fake data. For a given real data point x, it wants to output a probability close to 1, and
for a generated (fake) data point G(z), it wants to output a probability close to 0.
2. Generator’s Objective: The generator G(z) produces synthetic data from a random
latent vector z. The generator's goal is to fool the discriminator into classifying fake data
as real. Therefore, the generator aims to minimize the probability that the discriminator
classifies generated data as fake.
The objective function can be written as:

\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]
Where:
• \mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D(x)] is the expected log-probability the
discriminator assigns to real data; the discriminator wants this term to be large.
• \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))] is the expected log-probability of correctly
rejecting fake data generated by the generator; the discriminator wants this term large,
while the generator wants it small.
• D(x) is the output of the discriminator for real data.
• D(G(z)) is the output of the discriminator for generated data.
2. Training the GAN:
The objective function is a min-max problem where the generator and discriminator
have opposite goals:
• The discriminator tries to maximize V(D, G), improving its ability to distinguish real
from fake data.
• The generator tries to minimize V(D, G), improving its ability to generate realistic data
that fools the discriminator.
Steps for training using Stochastic Gradient Descent (SGD):
1. Training the Discriminator: The discriminator D is trained to maximize the objective
function with respect to its parameters θ_D by updating them using the gradient of the
loss. This is typically done using Stochastic Gradient Descent (SGD) or its variants (like
Adam).
o The discriminator updates its weights by calculating the gradients of the
following terms:
▪ \mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D(x)]: the term for real data.
▪ \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]: the term for generated data.
The update rule for the discriminator (written as minimizing the negated objective) is:

\nabla_{\theta_D} \mathcal{L}_D = \nabla_{\theta_D} \left[ -\mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D(x)] - \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))] \right]
o The discriminator is trained using real data (with label 1) and fake data (with
label 0) generated by the current state of the generator.
o After updating the discriminator's weights, the generator's weights remain fixed
for this step.
2. Training the Generator: The generator G is trained to minimize the objective function
with respect to its parameters θ_G, updating them to maximize the discriminator's error
on fake data. In practice, the generator is trained to maximize \log D(G(z)), which
encourages the discriminator to classify fake data as real.
The generator's update rule is:

\nabla_{\theta_G} \mathcal{L}_G = \nabla_{\theta_G} \left[ -\mathbb{E}_{z \sim p_z(z)}[\log D(G(z))] \right]

o In the original minimax formulation the generator instead minimizes
\log(1 - D(G(z))); both objectives push D(G(z)) toward 1 (i.e., fake data classified
as real), but the non-saturating version above gives stronger gradients early in
training.
o After updating the generator's weights, the discriminator's weights remain fixed
for this step.
3. Stochastic Gradient Descent (SGD) in GANs:
• Stochastic Gradient Descent (SGD) is used to update both the discriminator’s and
generator’s weights based on the gradients of their respective loss functions.
• Mini-batch training is typically used, where a small batch of real data and a batch of fake
data are sampled to calculate the gradients.
• Adam Optimizer (a variant of SGD) is often used for better convergence in GAN training.
4. Training Procedure:
Here’s a simplified overview of the training procedure:
1. Sample a mini-batch of real data points \{x_1, x_2, \dots, x_m\} from the real data
distribution.
2. Sample a mini-batch of latent vectors \{z_1, z_2, \dots, z_m\} from a fixed prior
distribution p_z(z).
3. Generate fake data \{G(z_1), G(z_2), \dots, G(z_m)\} using the current generator.
4. Update the discriminator by minimizing the loss based on real and fake data (via
gradient descent).
5. Update the generator by minimizing the loss based on the discriminator’s feedback (via
gradient descent).
Repeat steps 1–5 for a fixed number of iterations or until the generator produces high-
quality data that is indistinguishable from real data.
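To make the alternating updates concrete, here is a deliberately tiny, self-contained Java sketch of this procedure for a one-dimensional toy problem (entirely illustrative and not from the notes: real data ~ N(4, 0.5), a linear generator G(z) = a·z + b, a logistic discriminator D(x) = sigmoid(w·x + c), hand-derived gradients, plain SGD):

import java.util.Random;

public class ToyGan {
    static double sigmoid(double u) { return 1.0 / (1.0 + Math.exp(-u)); }

    public static void main(String[] args) {
        Random rng = new Random(42);
        double a = 1.0, b = 0.0;   // generator parameters
        double w = 0.1, c = 0.0;   // discriminator parameters
        double lr = 0.01;

        for (int step = 0; step < 50_000; step++) {
            double xReal = 4.0 + 0.5 * rng.nextGaussian();   // sample real data
            double z = rng.nextGaussian();                   // sample latent vector
            double xFake = a * z + b;                        // G(z)

            // Discriminator step: minimize -log D(x_real) - log(1 - D(x_fake))
            double dReal = sigmoid(w * xReal + c);
            double dFake = sigmoid(w * xFake + c);
            double gwD = (dReal - 1.0) * xReal + dFake * xFake; // dL_D/dw
            double gcD = (dReal - 1.0) + dFake;                 // dL_D/dc
            w -= lr * gwD;
            c -= lr * gcD;

            // Generator step: minimize -log D(G(z)) (non-saturating loss)
            double dFake2 = sigmoid(w * (a * z + b) + c);    // re-evaluate with updated D
            double upstream = (dFake2 - 1.0) * w;            // dL_G/dG(z)
            a -= lr * upstream * z;                          // dG/da = z
            b -= lr * upstream;                              // dG/db = 1
        }
        // If training succeeds, G(z) = a*z + b roughly matches N(4, 0.5):
        System.out.printf("a ~ %.2f (target +/-0.5), b ~ %.2f (target 4.0)%n", a, b);
    }
}

Even at this scale the dynamics show the usual GAN instability, but the sketch demonstrates the key structure: each iteration alternates one SGD step on the discriminator's loss with one SGD step on the generator's loss, holding the other network fixed.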
Conclusion:
• The objective function of GANs defines a min-max game between the generator and
the discriminator, where the generator tries to fool the discriminator, and the
discriminator tries to correctly distinguish real from fake data.
• Stochastic Gradient Descent (SGD) is used to optimize both the generator and
discriminator. The discriminator is updated to maximize its ability to differentiate real
from fake data, while the generator is updated to minimize the discriminator’s ability to
classify generated data as fake.