Chapter-2 (Deep Learning)

Uploaded by

bhavani

Module-II Deep Learning

A Probabilistic Theory of Deep Learning

The history of deep learning is tied closely to the broader field of artificial intelligence (AI)
and machine learning (ML), which seeks to create machines that can learn from data. Deep
learning, specifically, refers to a subset of machine learning that uses neural networks with
many layers to model complex patterns in data. Here’s a breakdown of the key milestones in
the development of deep learning:

1940s–1960s: Early Neural Network Ideas

1. McCulloch-Pitts Neuron (1943): Warren McCulloch and Walter Pitts developed a
model of artificial neurons that could perform basic logical functions like AND, OR,
and NOT, inspired by the functioning of biological neurons.
2. Hebbian Learning (1949): Donald Hebb proposed a learning mechanism for
neurons, stating that "neurons that fire together, wire together." This principle
influenced the development of learning algorithms for neural networks.
3. Perceptron (1958): Frank Rosenblatt developed the perceptron, a simple model of a
single-layer neural network. It was capable of binary classification but could only
learn linearly separable functions, so it failed on non-linearly separable problems.
4. Limitations Highlighted (1969): Marvin Minsky and Seymour Papert's book
Perceptrons highlighted the limitations of single-layer neural networks, leading to a
decline in research on neural networks. They showed that simple neural networks
couldn't solve the XOR problem, which required more sophisticated techniques.

1980s: Backpropagation and Revival of Neural Networks

1. Backpropagation Algorithm (1986): David Rumelhart, Geoffrey Hinton, and
Ronald J. Williams popularized the backpropagation algorithm, which allowed for the
training of multi-layer neural networks (also called multi-layer perceptrons, or MLPs).
This was a significant breakthrough that allowed deep neural networks (with multiple
layers) to be trained effectively by adjusting the weights of neurons based on the error
of their predictions.
2. Neocognitron (1980): Kunihiko Fukushima developed the neocognitron, an early
model of a convolutional neural network (CNN). The neocognitron was inspired by
the visual cortex and laid the groundwork for modern CNNs used in image
recognition.

1990s–2000s: Slow Progress and New Insights

1. Support Vector Machines and Boosting: In the 1990s, simpler machine learning
methods like support vector machines (SVMs) and boosted decision trees became
more popular due to their success and computational efficiency compared to neural
networks, which were hard to train at the time.
2. Vanishing Gradient Problem: It was recognized that training deep networks was
difficult because of the vanishing gradient problem—where gradients used in
backpropagation became too small to update weights in earlier layers effectively.
3. Recurrent Neural Networks (RNNs): In the 1990s, RNNs, particularly Long Short-
Term Memory (LSTM) networks introduced by Sepp Hochreiter and Jürgen
Schmidhuber (1997), addressed the difficulty of learning long-term dependencies in
sequential data.

2006–2010: Deep Learning Renaissance

1. Deep Belief Networks (2006): Geoffrey Hinton and his collaborators introduced deep
belief networks (DBNs), which sparked renewed interest in deep learning. DBNs used
a layer-by-layer pretraining method to address some of the issues with training deep
neural networks, particularly the vanishing gradient problem.
2. Better Hardware – GPUs: Around this time, graphics processing units (GPUs),
initially designed for fast image rendering, were repurposed for deep learning because
they could efficiently perform the large-scale matrix operations required by neural
networks.
3. Rectified Linear Unit (ReLU): The ReLU activation function, which came into wide
use for deep networks around 2010, helped mitigate the vanishing gradient problem and
allowed deep networks to train faster: its gradient is exactly 1 for positive inputs, so
it does not shrink as it propagates back through active units.

2012–2015: Breakthroughs in Image and Speech Recognition

1. ImageNet (2012): Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton used a deep
convolutional neural network (AlexNet) to win the ImageNet competition,
significantly outperforming previous methods in image classification. This moment
marked a major public recognition of the power of deep learning.
2. Speech Recognition: Around the same time, deep learning methods began
outperforming traditional approaches in speech recognition, leading companies like
Google, Microsoft, and Baidu to adopt deep learning for their products.
3. Generative Models – GANs (2014): Ian Goodfellow introduced Generative
Adversarial Networks (GANs), a new class of generative models that could create
realistic data, such as images, by having two neural networks (a generator and a
discriminator) compete against each other.

2016–Present: Dominance of Deep Learning

1. AlphaGo (2016): DeepMind’s AlphaGo, based on deep reinforcement learning and
Monte Carlo tree search, defeated world champion Go player Lee Sedol. This
demonstrated the potential of deep learning in complex strategic tasks.
2. Transformers (2017): The introduction of the transformer architecture by Vaswani et
al. transformed natural language processing (NLP). Transformers, unlike RNNs,
allowed for more parallelization and became the foundation of state-of-the-art
language models like BERT (2018) and GPT (2018-present).
3. GPT and Large Language Models: OpenAI’s GPT models, in particular
GPT-2 (2019) and GPT-3 (2020), demonstrated the power of deep learning in
generating human-like text and revolutionized NLP applications like chatbots, writing
assistance, and more.
4. DALL·E and CLIP (2021): OpenAI introduced models like DALL·E, capable of
generating images from text descriptions, and CLIP, which can understand images
and text together, pushing the boundaries of multimodal deep learning.
Challenges and Future Directions

• Ethics and Bias: As deep learning becomes more pervasive, there are growing
concerns about the ethical implications, such as the biases embedded in the data these
models are trained on.
• Explainability: Deep learning models are often seen as "black boxes," and there's a
push towards creating more interpretable AI systems.
• Efficiency and Sustainability: Large deep learning models require massive amounts
of computational power and energy, raising concerns about their environmental
impact.

Deep learning continues to evolve, with innovations in areas such as self-supervised learning,
reinforcement learning, and neuromorphic computing pushing the boundaries of what these
models can achieve.

Backpropagation and Regularization

Backpropagation

Backpropagation (short for "backward propagation of errors") is the algorithm used to train
deep neural networks. It adjusts the weights of the connections between neurons in the
network to minimize the error in the network's predictions, improving its performance over
time. Here's a step-by-step breakdown of how backpropagation works:

How Backpropagation Works:

Need for Multilayer Networks

• Single-layer networks cannot solve linearly inseparable problems; they can only
solve linearly separable ones.
• Single-layer networks cannot solve complex problems.
• Single-layer networks are not suited to large input-output data sets.
• Single-layer networks cannot capture the complex information available in the
training pairs.
Multi-layer networks are used to overcome these limitations.
Multi-Layer Networks
• Any neural network that has at least one layer between the input and output layers is
called a multi-layer network.
• The layers between the input and output layers are called hidden layers.
• Input-layer units just collect the inputs and forward them to the next
layer.
• Hidden-layer and output-layer units process the information fed to them and
produce an appropriate output.
• Multi-layer networks with enough hidden units can represent solutions to arbitrary
classification problems.
• Multi-layer networks apply linear discriminants to non-linearly transformed inputs;
the hidden layers supply the non-linear transformation.

Back Propagation Networks (BPN)


Popularized by Rumelhart, Hinton, and Williams in 1986, a BPN is a multi-layer feedforward
network in which the error is propagated backward, hence the name Back Propagation
Network (BPN). It uses a supervised training process, has a systematic procedure for
training the network, and is used in error detection and correction. The network is trained
with the generalized delta law (also called the continuous perceptron law or gradient
descent law), which is an extension of the perceptron training law. The generalized delta
rule minimizes the mean squared error between the output computed by the network and
the target output, and it converges faster than the perceptron law. Its main limitation is
the local minima problem, which slows convergence, although it still performs better than
the perceptron law. Figure 1 represents a BPN network architecture. Although multilayer
perceptrons can also be used, BPN is considered more flexible and efficient. In Figure 1
the weights between the input and hidden layers are denoted Wij, and the weights between
the hidden layer and the next layer are denoted Vjk. The network is valid only for
differentiable activation functions. The training process used in backpropagation involves
three stages:
1. Feedforward of the input training pair
2. Calculation and backpropagation of the associated error
3. Adjustment of the weights

Figure 1: Back Propagation Network

BPN Algorithm

The BPN algorithm is divided into four major steps:
1. Initialization of weights and biases
2. Feedforward process
3. Backpropagation of errors
4. Updating of weights and biases
Algorithm:
I. Initialization of weights
Step 1: Initialize the weights to small random values near zero.
Step 2: While the stop condition is false, do steps 3 to 10.
Step 3: For each training pair, do steps 4 to 9.
II. Feedforward of inputs
Step 4: Each input unit receives $x_i$ and forwards it to the next (hidden) layer.
Step 5: Each hidden unit sums its weighted inputs,
$Z_{in,j} = W_{0j} + \sum_i x_i W_{ij}$,
and applies the activation function, $Z_j = f(Z_{in,j})$.
This value is passed to the output layer.
Step 6: Each output unit sums its weighted inputs,
$y_{in,k} = V_{0k} + \sum_j Z_j V_{jk}$,
and applies the activation function, $Y_k = f(y_{in,k})$.
III. Backpropagation of errors
Step 7: For each output unit with target $t_k$, compute the error term
$\delta_k = (t_k - Y_k)\, f'(y_{in,k})$.
Each hidden unit sums the error terms from the layer above,
$\delta_{in,j} = \sum_k \delta_k V_{jk}$,
and computes its own error term $\delta_j = \delta_{in,j}\, f'(Z_{in,j})$.
IV. Updating of weights and biases
Step 8: With learning rate $\alpha$, the weight corrections are
$\Delta V_{jk} = \alpha\, \delta_k Z_j$ and $\Delta W_{ij} = \alpha\, \delta_j x_i$;
the bias corrections are
$\Delta V_{0k} = \alpha\, \delta_k$ and $\Delta W_{0j} = \alpha\, \delta_j$.
Step 9: The new weights are
$W_{ij}(new) = W_{ij}(old) + \Delta W_{ij}$ and $V_{jk}(new) = V_{jk}(old) + \Delta V_{jk}$.
The new biases are
$W_{0j}(new) = W_{0j}(old) + \Delta W_{0j}$ and $V_{0k}(new) = V_{0k}(old) + \Delta V_{0k}$.

Step 10: Test for the stop condition.
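
The steps above can be sketched in NumPy as follows. This is a minimal illustration, not a reference implementation: it assumes a sigmoid activation (so f'(x) = f(x)(1 - f(x))), one hidden layer, per-pattern updates, and a fixed number of epochs as the stop condition. The function names, the hidden-layer size, and the learning rate are illustrative choices, not part of the notes.

```python
import numpy as np

def sigmoid(x):
    """Logistic activation f(x); its derivative is f(x) * (1 - f(x))."""
    return 1.0 / (1.0 + np.exp(-x))

def forward(x, W, W0, V, V0):
    """Steps 4-6: feed one input vector through the hidden and output layers."""
    z = sigmoid(W0 + x @ W)   # hidden activations Z_j
    y = sigmoid(V0 + z @ V)   # output activations Y_k
    return z, y

def train_bpn(X, T, n_hidden=4, alpha=0.5, epochs=5000, seed=0):
    """Train a one-hidden-layer BPN with per-pattern weight updates."""
    rng = np.random.default_rng(seed)
    n_in, n_out = X.shape[1], T.shape[1]
    # Step 1: small random weights near zero (biases start at zero here)
    W = rng.uniform(-0.5, 0.5, size=(n_in, n_hidden))
    W0 = np.zeros(n_hidden)
    V = rng.uniform(-0.5, 0.5, size=(n_hidden, n_out))
    V0 = np.zeros(n_out)
    for _ in range(epochs):                 # Step 2: assumed stop condition
        for x, t in zip(X, T):              # Step 3: each training pair
            z, y = forward(x, W, W0, V, V0)
            # Step 7: error terms for output and hidden units
            delta_k = (t - y) * y * (1.0 - y)
            delta_j = (V @ delta_k) * z * (1.0 - z)
            # Steps 8-9: corrections applied to weights and biases
            V += alpha * np.outer(z, delta_k)
            V0 += alpha * delta_k
            W += alpha * np.outer(x, delta_j)
            W0 += alpha * delta_j
    return W, W0, V, V0

def sse(X, T, params):
    """Sum of squared errors over all training pairs."""
    return sum(np.sum((t - forward(x, *params)[1]) ** 2) for x, t in zip(X, T))

if __name__ == "__main__":
    # XOR: the problem a single-layer network cannot solve
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    T = np.array([[0], [1], [1], [0]], dtype=float)
    params = train_bpn(X, T)
    print("sum of squared errors after training:", sse(X, T, params))
```

Note that the hidden error term is computed before V is updated, so the backward pass uses the same weights as the forward pass.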

Key Concepts in Backpropagation:

• Gradient Descent: The algorithm for minimizing the loss function by iteratively
updating the weights based on the gradient of the loss.
• Learning Rate: A hyperparameter that controls the size of the steps taken during
weight updates. If it’s too large, the model may fail to converge; if too small, training
can be too slow.
• Activation Functions: Functions like sigmoid, ReLU, or tanh, which add non-
linearity to the model and determine the output of each neuron.
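
The effect of the learning rate can be seen on a toy problem. The sketch below is an assumption-laden illustration: it minimizes the one-dimensional loss w^2 (gradient 2w), and the three learning rates are picked only to show slow progress, fast convergence, and divergence.

```python
def gradient_descent(grad, w0, learning_rate, steps):
    """Repeatedly apply w <- w - learning_rate * grad(w) and record every w."""
    w = w0
    history = [w]
    for _ in range(steps):
        w = w - learning_rate * grad(w)
        history.append(w)
    return history

def grad(w):
    """Gradient of the toy loss(w) = w**2."""
    return 2.0 * w

too_small = gradient_descent(grad, 1.0, learning_rate=0.01, steps=20)
suitable = gradient_descent(grad, 1.0, learning_rate=0.4, steps=20)
too_large = gradient_descent(grad, 1.0, learning_rate=1.1, steps=20)

print("lr=0.01:", too_small[-1])   # still far from the minimum at 0
print("lr=0.4: ", suitable[-1])    # very close to 0
print("lr=1.1: ", too_large[-1])   # magnitude grows: the iteration diverges
```

For this loss each step multiplies w by (1 - 2 * learning_rate), which is why a rate above 1.0 makes the magnitude grow.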

Regularization

Regularization is a set of techniques used to prevent overfitting in neural networks, where
the model learns to perform very well on the training data but poorly on new, unseen data.
Overfitting occurs when the model becomes too complex and captures noise in the data
instead of general patterns.

Several regularization techniques are used to improve the generalization of a model:

1. L1 and L2 Regularization:

These are the most common forms of regularization, which work by adding a penalty term to
the loss function that discourages large weight values.

• L2 Regularization (Ridge): Adds the sum of the squared weights to the loss
function:

$L_{total} = L + \lambda \sum_i w_i^2$

This leads to smaller weights, helping to reduce model complexity without
eliminating them altogether.

• L1 Regularization (Lasso): Adds the sum of the absolute values of the weights to the
loss function:

$L_{total} = L + \lambda \sum_i |w_i|$

This can lead to sparsity, meaning some weights become exactly zero, effectively
reducing the number of features used by the model.

• L1/L2 Combined (Elastic Net): A combination of both, striking a balance between
penalizing large weights and encouraging sparsity:

$L_{total} = L + \lambda_1 \sum_i |w_i| + \lambda_2 \sum_i w_i^2$

Here L is the unregularized loss and λ is the regularization strength. A larger value
increases the penalty on the weights.
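
The three penalties can be written in a few lines of NumPy. This is a small sketch: the weight vector, the data-loss value, and the regularization strengths are hypothetical numbers chosen for illustration.

```python
import numpy as np

def l2_penalty(w, lam):
    """Ridge penalty: lam times the sum of squared weights."""
    return lam * np.sum(w ** 2)

def l1_penalty(w, lam):
    """Lasso penalty: lam times the sum of absolute weights."""
    return lam * np.sum(np.abs(w))

def elastic_net_penalty(w, lam1, lam2):
    """Elastic net: an L1 term plus an L2 term, each with its own strength."""
    return l1_penalty(w, lam1) + l2_penalty(w, lam2)

w = np.array([0.5, -2.0, 0.0, 1.5])   # hypothetical weights
data_loss = 0.3                        # hypothetical unregularized loss

print("L2 total loss:", data_loss + l2_penalty(w, 0.1))
print("L1 total loss:", data_loss + l1_penalty(w, 0.1))
print("Elastic net total loss:", data_loss + elastic_net_penalty(w, 0.1, 0.1))
```

The weight of magnitude 2.0 contributes 4.0 to the L2 sum but only 2.0 to the L1 sum, which shows how L2 punishes large weights more heavily.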

2. Dropout:

Dropout is a technique that randomly “drops” or ignores a subset of neurons during each
forward and backward pass during training. This prevents the network from becoming too
reliant on any one particular neuron, promoting robustness and reducing overfitting.

• For each neuron in a given layer, a probability p (e.g., 0.5) is used to determine
whether it will be ignored (dropped) during training.
• During inference (when the model is being used for predictions), all neurons are used,
but their outputs are scaled by 1 − p to balance their contribution.
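
A minimal NumPy sketch of this scheme follows. The function names are illustrative. It implements the variant described above (scale by 1 - p at inference); many current libraries instead use "inverted" dropout, which divides the kept activations by 1 - p during training and leaves inference untouched.

```python
import numpy as np

def dropout_train(activations, p, rng):
    """Training pass: zero each activation independently with probability p."""
    keep_mask = rng.random(activations.shape) >= p   # True where the unit is kept
    return activations * keep_mask

def dropout_inference(activations, p):
    """Inference pass: keep every unit but scale outputs by (1 - p)."""
    return activations * (1.0 - p)

rng = np.random.default_rng(0)
hidden = np.ones(10_000)                 # hypothetical layer activations

dropped = dropout_train(hidden, p=0.5, rng=rng)
scaled = dropout_inference(hidden, p=0.5)

# The average activation seen downstream is similar in both modes.
print("training mean: ", dropped.mean())
print("inference mean:", scaled.mean())
```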

3. Early Stopping:

Early stopping is a form of regularization that monitors the model’s performance on a
validation set during training and stops training when performance on the validation set
begins to deteriorate. This helps to prevent overfitting by halting training before the model
becomes too specialized to the training data.
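
The stopping rule can be sketched independently of any training framework. The example below assumes a "patience" parameter (how many epochs without improvement are tolerated), which is a common convention rather than something the notes specify, and it uses a made-up list of validation losses.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return (stop_epoch, best_epoch), both 0-based.

    Training stops once the validation loss has failed to improve
    for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch

# Hypothetical validation losses: improving at first, then deteriorating
val_losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.53, 0.56, 0.60]
stop_epoch, best_epoch = early_stopping_epoch(val_losses, patience=3)
print("stopped at epoch", stop_epoch, "best epoch", best_epoch)
```

In practice the weights saved at the best epoch are the ones restored after stopping.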

4. Batch Normalization:

Batch normalization normalizes the inputs of each layer (specifically, the mini-batches)
during training. This can improve training speed and stability while also having a slight
regularization effect by introducing a small amount of noise in the mini-batch estimates.

5. Data Augmentation:

Instead of modifying the model, data augmentation involves modifying the data to introduce
variability, which reduces overfitting. For example, in image classification tasks,
transformations like rotation, flipping, or cropping can artificially increase the size of the
training set, making the model more robust.
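
Two of the transformations mentioned above, flipping and cropping, can be sketched with plain NumPy indexing. The image here is a hypothetical 6x6 array and the function names are illustrative; real pipelines usually rely on a library's augmentation utilities.

```python
import numpy as np

def random_flip(image, rng):
    """Flip the image left-to-right with probability 0.5."""
    return image[:, ::-1] if rng.random() < 0.5 else image

def random_crop(image, crop_h, crop_w, rng):
    """Cut a crop_h x crop_w window at a random position."""
    h, w = image.shape[:2]
    top = rng.integers(0, h - crop_h + 1)    # upper bound is exclusive
    left = rng.integers(0, w - crop_w + 1)
    return image[top:top + crop_h, left:left + crop_w]

def augment(image, rng, crop_h, crop_w):
    """Apply a random flip followed by a random crop."""
    return random_crop(random_flip(image, rng), crop_h, crop_w, rng)

rng = np.random.default_rng(0)
image = np.arange(36).reshape(6, 6)          # hypothetical grayscale image
samples = [augment(image, rng, 4, 4) for _ in range(3)]
for sample in samples:
    print(sample.shape)
```

Each call can return a different view of the same image, so the model rarely sees exactly the same input twice.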

6. Weight Constraints:

Constraints can be applied on the values of weights to keep them within a specific range
during training, limiting the capacity of the model and making it less prone to overfitting.
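
Two simple constraints are sketched below: clipping every weight into a fixed range, and a max-norm constraint that rescales the incoming weight vector of each unit. Both function names and the example matrix are assumptions for illustration. Such a constraint would typically be applied after each weight update.

```python
import numpy as np

def clip_weights(W, low, high):
    """Force every individual weight into the interval [low, high]."""
    return np.clip(W, low, high)

def max_norm(W, max_value):
    """Rescale each column (one unit's incoming weights) to L2 norm <= max_value."""
    norms = np.linalg.norm(W, axis=0, keepdims=True)
    scale = np.minimum(1.0, max_value / np.maximum(norms, 1e-12))
    return W * scale

# Hypothetical weight matrix: 2 inputs (rows) feeding 2 units (columns)
W = np.array([[3.0, 0.1],
              [4.0, 0.2]])

print(clip_weights(W, -1.0, 1.0))
print(max_norm(W, 1.0))   # first column is shrunk, second is left unchanged
```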

How Regularization and Backpropagation Work Together

• Backpropagation calculates gradients that guide how to update weights to minimize error,
while regularization modifies the loss function to penalize overly complex models (with
large weights), improving generalization.
• For example, in L2 regularization, backpropagation will not only minimize the prediction
error but also attempt to minimize the sum of squared weights. This keeps the model simpler
and less prone to overfitting.
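
The L2 case can be made concrete with a short derivation. The symbols are assumptions for illustration: L is the unregularized loss, λ the regularization strength, and α the learning rate.

```latex
\tilde{L}(w) = L(w) + \lambda \sum_i w_i^2
\qquad\Longrightarrow\qquad
\frac{\partial \tilde{L}}{\partial w_i} = \frac{\partial L}{\partial w_i} + 2\lambda w_i

w_i \leftarrow w_i - \alpha \left( \frac{\partial L}{\partial w_i} + 2\lambda w_i \right)
    = (1 - 2\alpha\lambda)\, w_i - \alpha \frac{\partial L}{\partial w_i}
```

Each update therefore first shrinks the weight by the factor (1 - 2αλ) and then applies the usual backpropagated gradient step, which is why L2 regularization is often called weight decay.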

Together, backpropagation and regularization form a powerful framework for training neural
networks that generalize well to unseen data.

Batch Normalization, VC Dimension and Neural Nets

Batch normalization is a technique used in training deep neural networks to improve both
speed and performance. It works by normalizing the inputs of each layer so that they have a
stable distribution throughout training, which addresses the problem of internal covariate
shift (where the distribution of layer inputs changes as the model learns).

How It Works:

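1. Normalization: For each mini-batch during training, the mean and variance of the
layer’s inputs are computed. The inputs are then normalized to have a mean of 0 and a
variance of 1.

Formula for normalization:

$\mu_B = \frac{1}{m} \sum_{i=1}^{m} x_i, \qquad \sigma_B^2 = \frac{1}{m} \sum_{i=1}^{m} (x_i - \mu_B)^2, \qquad \hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}$

where m is the mini-batch size and ε is a small constant added for numerical stability.

2. Scale and Shift: After normalization, the outputs are scaled and shifted by two learnable
parameters γ (scale) and β (shift): $y_i = \gamma \hat{x}_i + \beta$. These parameters allow
the model to adapt to different input distributions and recover any necessary information
that may have been lost during normalization.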
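
The training-time computation can be sketched in NumPy as below. This is a simplified illustration: it handles a 2D batch of shape (batch, features), uses an assumed stability constant eps, and omits the running averages that implementations keep for use at inference time.

```python
import numpy as np

def batch_norm_train(x, gamma, beta, eps=1e-5):
    """Normalize each feature over the mini-batch, then scale and shift."""
    mu = x.mean(axis=0)                       # per-feature mini-batch mean
    var = x.var(axis=0)                       # per-feature mini-batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)     # zero mean, unit variance
    return gamma * x_hat + beta               # learnable scale and shift

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(64, 4))   # hypothetical layer inputs

out = batch_norm_train(x, gamma=np.ones(4), beta=np.zeros(4))
print("mean per feature:", out.mean(axis=0))
print("variance per feature:", out.var(axis=0))
```

With gamma = 1 and beta = 0 the output has mean close to 0 and variance close to 1 for every feature; other values of gamma and beta move the output to mean beta and standard deviation close to gamma.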

Benefits of Batch Normalization:

• Faster Training: It allows for higher learning rates, accelerating the training process.
• Improved Performance: Models often generalize better and overfit less, partly due to a
regularization effect from batch normalization.
• Reduced Internal Covariate Shift: It stabilizes the learning process by keeping the input
distributions consistent throughout the network layers.

In practice, batch normalization is typically inserted after the linear transformation of each
layer but before the activation function.

VC Dimension (Vapnik-Chervonenkis Dimension)

The VC dimension is a concept from statistical learning theory that measures the capacity or
complexity of a model. Specifically, it quantifies the ability of a model (like a classifier) to
shatter a dataset, meaning the ability of the model to correctly classify any arbitrary labeling
of data points.

The VC dimension is important because it helps to assess the capacity of a learning model
to fit data, which impacts the model's ability to generalize.

Definition of VC Dimension:

• For a model (or hypothesis class H), the VC dimension is defined as the largest number d
such that there exists at least one set of d points that the model can shatter.
• "Shattering" means that for every possible binary labeling of the d points, the model can
perfectly classify them.
• A higher VC dimension means the model is more complex and has a greater capacity to fit
diverse patterns in the data.
Example:

• Consider a simple model like a linear classifier in 2D space (i.e., a line that separates data
points). The VC dimension of this model is 3, because a line can realize every possible
labeling of 3 points in a plane, provided the points are not collinear. However, no set of 4
points can be shattered: for the four corners of a square, for example, no line separates one
diagonal pair from the other (the XOR labeling).
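
The example can be checked by brute force. The sketch below enumerates every labeling of a point set and uses the perceptron learning rule to look for a separating line. It is a heuristic under a stated assumption: failing to converge within max_epochs is treated as "not separable", which is evidence rather than proof. The point sets are illustrative.

```python
from itertools import product

import numpy as np

def linearly_separable(points, labels, max_epochs=1000):
    """Return True if the perceptron finds a separating line within max_epochs."""
    X = np.hstack([points, np.ones((len(points), 1))])   # append a bias input
    w = np.zeros(X.shape[1])
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in zip(X, labels):
            if y * (w @ x) <= 0:      # misclassified (or on the boundary)
                w += y * x            # perceptron update
                mistakes += 1
        if mistakes == 0:
            return True
    return False

def shatters(points):
    """Check every +1/-1 labeling of the points for linear separability."""
    points = np.asarray(points, dtype=float)
    return all(linearly_separable(points, labels)
               for labels in product([-1, 1], repeat=len(points)))

three_points = [(0, 0), (1, 0), (0, 1)]            # not collinear
four_points = [(0, 0), (1, 0), (0, 1), (1, 1)]     # corners of a square

print("3 points shattered:", shatters(three_points))
print("4 points shattered:", shatters(four_points))
```

The square fails on the XOR-style labelings, where opposite corners share a label.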

VC Dimension in Neural Networks

In the context of neural networks, the VC dimension becomes more complex due to the
flexibility and high expressiveness of deep neural architectures. Neural networks are
powerful function approximators and can theoretically shatter very large sets of data points,
leading to a high VC dimension.

Factors Influencing VC Dimension in Neural Networks:

1. Number of Parameters (Weights):
o Neural networks have a large number of parameters (weights and biases),
especially in deep architectures. The more parameters a model has, the more
complex functions it can approximate, leading to a higher VC dimension.
2. Network Depth and Width:
o Deeper networks (with more hidden layers) and wider networks (with more
neurons per layer) typically have higher VC dimensions, because they can
represent more complex decision boundaries.
3. Activation Functions:
o The choice of activation function (e.g., ReLU, sigmoid, tanh) influences the
types of functions the network can learn, which also impacts its capacity and
VC dimension.

VC Dimension of Neural Networks:

While calculating the exact VC dimension for a neural network is challenging, it is known
that the VC dimension of a neural network grows with its number of parameters.

• For a neural network with W weights, known bounds on the VC dimension d are roughly
linear in W up to logarithmic factors (on the order of W log W for networks of threshold
units); for piecewise-linear activations such as ReLU the bounds also grow with depth.
• As a rule of thumb, the network can therefore shatter roughly as many data points as it
has weights.
• This suggests that large neural networks (e.g., deep networks) have a high VC
dimension and, thus, a high capacity to fit complex datasets.
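
Since these bounds are stated in terms of the number of parameters W, it helps to see how quickly W grows with layer sizes. The helper below is an illustrative sketch for fully connected networks only; it counts parameters and does not compute a VC dimension. The layer sizes are hypothetical.

```python
def count_parameters(layer_sizes):
    """Weights plus biases of a fully connected network.

    layer_sizes lists the number of units per layer, input first,
    e.g. [784, 128, 10].
    """
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

small_net = count_parameters([2, 4, 1])
larger_net = count_parameters([784, 128, 10])

print("2-4-1 network:", small_net, "parameters")
print("784-128-10 network:", larger_net, "parameters")
```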

Implications for Generalization:

• A high VC dimension means that a neural network has high capacity and can fit very
complex datasets. However, this also increases the risk of overfitting, where the
network fits noise in the training data rather than capturing true underlying patterns.
• Regularization techniques (like L2 regularization, dropout, early stopping) are often
used to reduce the effective capacity of the network and improve generalization.
Trade-off: Bias-Variance Trade-off

• The VC dimension is linked to the bias-variance trade-off. A model with a high VC
dimension may have low bias (able to fit complex functions) but high variance
(sensitive to noise in the training data).
• Models with a lower VC dimension might generalize better but could underfit the data
if the true underlying function is complex.

Shallow Network:

A shallow neural network typically consists of:

Input Layer: Receives the input features.

Hidden Layer(s): One or two layers where computations occur. Each hidden layer
consists of neurons that apply activation functions to the weighted sums of their
inputs.

Output Layer: Produces the final output, which can be a classification, regression
value, or another type of result.

Characteristics of Shallow Networks

Limited Depth: Generally has only one or two hidden layers.

Simplicity: Easier to understand and interpret compared to deep networks.

Faster Training: Requires less computational power and time to train.

Overfitting Risk: Less prone to overfitting with small datasets, but can underfit if the
problem is complex.

Applications

Shallow networks can be effective for:

Linear problems: Simple classification tasks (e.g., logistic regression).

Small datasets: Where the complexity of deep networks is unnecessary.

Feature extraction: When the features are well-defined and the relationships are not
too complex.

Limitations
Performance: Struggles with complex tasks where deep learning excels, like image
and speech recognition.

Capacity: May not capture intricate patterns in data due to limited layers.

Deep Networks

Deep networks, often referred to as deep neural networks (DNNs), consist of multiple layers
of neurons, allowing them to learn complex patterns in data. Here’s an overview of their key
features, structure, and applications:

Structure of Deep Networks

Input Layer: Accepts the input data, which could be images, text, audio, etc.

Hidden Layers: Composed of many layers (often dozens or hundreds). Each layer
transforms the input through learned weights and activation functions.

Output Layer: Produces the final output, such as class probabilities in classification
tasks or continuous values in regression.

Characteristics of Deep Networks

Depth: The primary feature is the number of hidden layers, enabling the network to
learn hierarchical representations.

Complexity: Can model intricate relationships and patterns in data due to the high
number of parameters.

Feature Learning: Automatically learns features at various levels of abstraction,
from low-level edges in images to high-level concepts like objects or emotions.

Advantages

Performance: Often achieves state-of-the-art results in complex tasks like image
recognition, natural language processing, and more.

Flexibility: Can be adapted to a wide range of applications by adjusting the
architecture (e.g., convolutional layers for images, recurrent layers for sequences).

Transfer Learning: Pre-trained deep networks can be fine-tuned for specific tasks,
saving time and resources.

Applications

Computer Vision: Image classification, object detection, and segmentation.


Natural Language Processing: Text classification, translation, and sentiment
analysis.

Speech Recognition: Voice assistants and transcription services.

Reinforcement Learning: Training agents to make decisions in dynamic
environments.

Challenges

Training Time: Requires significant computational resources and time to train
effectively, especially with large datasets.

Overfitting: Prone to overfitting, particularly with small datasets; regularization
techniques (like dropout) are often used.

Interpretability: More complex and harder to interpret than shallow networks, which
can be a drawback in critical applications.

Difference Between a Shallow Net & Deep Learning Net:

Sl. No. | Shallow Nets | Deep Learning Nets
1 | One hidden layer (or very few hidden layers) | Many hidden layers, with more neurons in each layer
2 | Takes input only as vectors | Can take raw data such as images and text as input
3 | Needs more parameters to achieve a good fit | Can fit functions better with fewer parameters than a shallow network
4 | With one hidden layer (and the same number of neurons as a deep net), cannot represent complex functions over the input space | Can compactly express highly complex functions over the input space
5 | The number of units grows exponentially with task complexity | Does not need to increase its size (neurons) in the same way for complex problems
6 | More difficult to train with current algorithms (e.g., issues with local minima) | Training is comparatively easy, with fewer issues from local minima

Convolution Networks
Convolutional Neural Networks (CNNs) are a specialized type of deep neural
network primarily designed for processing structured grid data, such as images. They
excel in tasks involving image recognition, object detection, and other visual tasks
due to their unique architecture. Here’s an overview of CNNs, including their
structure, characteristics, and applications:

Structure of Convolutional Networks

Input Layer: Accepts input images, typically represented as a 3D array (height,
width, channels).

Convolutional Layers: Apply convolution operations using small filters (kernels)
that slide over the input image. This process captures local patterns (like edges
or textures).

Activation Function: Convolutions are often followed by a nonlinear activation
function, such as ReLU (Rectified Linear Unit), to introduce non-linearity.

Pooling Layers: Downsampling: Reduce the spatial dimensions of the feature maps,
retaining important information while reducing computational load. Common types
include max pooling and average pooling.

Fully Connected Layers: After several convolutional and pooling layers, the high-
level reasoning in the neural network is performed by fully connected layers, where
every neuron is connected to every neuron in the previous layer.

Output Layer: Produces the final output, such as class probabilities for classification
tasks.
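
The first three stages (convolution, activation, pooling) can be sketched in NumPy on a single-channel image. This is a deliberately naive illustration with explicit loops. It assumes stride 1, no padding, and the cross-correlation form that deep learning libraries call convolution; the image and kernel are made up to show an edge detector.

```python
import numpy as np

def conv2d(image, kernel):
    """Slide the kernel over the image (stride 1, no padding)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    """Rectified Linear Unit: negative values become zero."""
    return np.maximum(x, 0.0)

def max_pool2d(x, size=2):
    """Non-overlapping max pooling; leftover rows and columns are dropped."""
    h, w = x.shape[0] // size, x.shape[1] // size
    x = x[:h * size, :w * size]
    return x.reshape(h, size, w, size).max(axis=(1, 3))

# Hypothetical 6x6 image: dark left half, bright right half
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# Kernel that responds where intensity increases from left to right
kernel = np.array([[-1.0, 1.0]])

features = max_pool2d(relu(conv2d(image, kernel)))
print(features)
```

The pooled feature map is nonzero only in the column that covers the vertical edge, which is the kind of local pattern a convolutional layer captures.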

Key Characteristics

Local Connectivity: Each neuron in a convolutional layer is connected only to a
small region of the input, allowing the network to focus on local features.

Parameter Sharing: Filters are applied across the entire input, reducing the number
of parameters and allowing the model to learn features that are spatially invariant.

Hierarchical Feature Learning: CNNs learn features at multiple levels of
abstraction, from simple edges to complex patterns like shapes and objects.
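
The saving from parameter sharing can be illustrated by counting parameters. The sizes below are hypothetical (a 32x32 RGB input, 64 units or 64 filters), and the two layers do not produce outputs of the same shape, so the comparison only shows the difference in order of magnitude.

```python
def dense_params(n_inputs, n_units):
    """Fully connected layer: one weight per input per unit, plus biases."""
    return n_inputs * n_units + n_units

def conv_params(kernel_h, kernel_w, in_channels, out_channels):
    """Convolutional layer: one small filter per output channel, plus biases."""
    return kernel_h * kernel_w * in_channels * out_channels + out_channels

height, width, channels = 32, 32, 3     # hypothetical input image size

dense = dense_params(height * width * channels, 64)
conv = conv_params(3, 3, channels, 64)

print("fully connected layer:", dense, "parameters")
print("3x3 convolutional layer:", conv, "parameters")
```

The convolutional count does not depend on the image height or width, because the same filters are reused at every position.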

Advantages

Efficiency: Fewer parameters compared to fully connected networks, making them
faster to train and less prone to overfitting.

Translation Invariance: Capable of recognizing objects in images regardless of their
position, thanks to pooling layers and convolutions.

Performance: Achieve state-of-the-art results in various computer vision tasks.

Applications

Image Classification: Identifying the main object in an image (e.g., cats vs. dogs).

Object Detection: Locating and identifying multiple objects within an image (e.g.,
YOLO, Faster R-CNN).

Semantic Segmentation: Classifying each pixel in an image to understand the scene
better (e.g., segmenting roads, cars, and pedestrians).

Facial Recognition: Identifying and verifying faces in images.

Generative Adversarial Networks (GANs)

Generative Adversarial Networks (GANs) are a class of machine learning frameworks
designed for generating new data samples that resemble a given dataset. Introduced by Ian
Goodfellow and his colleagues in 2014, GANs consist of two neural networks that compete
against each other in a game-theoretic setting. Here's a detailed overview:

Structure of GANs

1. Generator (G):
o The generator's role is to create fake data samples from random noise. It learns
to produce data that mimics the real data distribution.
o The generator starts with random input (often sampled from a Gaussian
distribution) and transforms it through several layers to produce an output
(e.g., an image).

2. Discriminator (D):
o The discriminator's role is to distinguish between real data samples (from the
training set) and fake samples generated by the generator.
o It outputs a probability score indicating whether a given sample is real or fake.

Training Process

1. Adversarial Game: The training process involves a two-player game:
o The generator tries to produce convincing fake data to fool the discriminator.
o The discriminator tries to accurately classify real and fake samples.

2. Loss Functions:
o The generator's loss is based on how well it can fool the discriminator.
o The discriminator's loss is based on how accurately it can distinguish between
real and fake data.
o These losses are typically computed using binary cross-entropy.

3. Iterative Updates:
o During training, the generator and discriminator are updated alternately. The
generator improves its ability to create realistic samples, while the
discriminator enhances its ability to identify fakes.
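
The two loss functions can be sketched with binary cross-entropy in NumPy. The discriminator outputs below are made-up probabilities rather than outputs of trained networks, and the generator loss shown is the commonly used "non-saturating" form (the generator is rewarded when the discriminator labels its samples as real); other formulations exist.

```python
import numpy as np

def bce(pred, target, eps=1e-12):
    """Binary cross-entropy between predicted probabilities and 0/1 targets."""
    pred = np.clip(pred, eps, 1.0 - eps)    # avoid log(0)
    return float(-np.mean(target * np.log(pred)
                          + (1.0 - target) * np.log(1.0 - pred)))

def discriminator_loss(d_real, d_fake):
    """Real samples should score 1, generated samples should score 0."""
    return bce(d_real, np.ones_like(d_real)) + bce(d_fake, np.zeros_like(d_fake))

def generator_loss(d_fake):
    """The generator wants the discriminator to score its samples as 1."""
    return bce(d_fake, np.ones_like(d_fake))

# Hypothetical discriminator outputs (probability that a sample is real)
d_real = np.array([0.9, 0.8])    # scores on real samples
d_fake = np.array([0.1, 0.2])    # scores on generated samples

print("discriminator loss:", discriminator_loss(d_real, d_fake))
print("generator loss:", generator_loss(d_fake))
```

In an alternating training loop, the discriminator step minimizes the first loss with the generator fixed, and the generator step minimizes the second loss with the discriminator fixed.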

Key Characteristics

• Unsupervised Learning: GANs are typically used in unsupervised learning settings,
where the model learns from unlabelled data.
• Mode Collapse: A potential issue where the generator produces a limited variety of
outputs instead of capturing the full diversity of the training data.
• Stability: Training GANs can be challenging due to instability; techniques like
Wasserstein GANs (WGANs) and Progressive Growing GANs have been proposed to
address these issues.

Applications

• Image Generation: Creating realistic images, art, or even enhancing low-resolution
images.
• Video Generation: Generating video frames or synthesizing motion.
• Data Augmentation: Generating additional training samples to improve model
performance, particularly in scenarios with limited data.
• Style Transfer: Applying the style of one image to the content of another (e.g., turning
photos into paintings).
• Text to Image Synthesis: Generating images based on textual descriptions.

Challenges

• Training Complexity: Achieving a balance between the generator and discriminator can
be difficult, often requiring careful tuning of hyperparameters.
• Evaluation: Assessing the quality of generated samples can be subjective; metrics like
Inception Score and Fréchet Inception Distance (FID) are used but have limitations.
• Ethical Concerns: The ability to generate realistic fake images can lead to misuse, such
as deepfakes and misinformation.

Semi-supervised learning

Semi-supervised learning is a machine learning approach that combines both labeled and
unlabeled data to improve the learning process. This method is particularly useful when
obtaining labeled data is expensive or time-consuming, while unlabeled data is more
abundant. Here’s a detailed overview:

Key Concepts

Labeled Data: Data that comes with corresponding output labels (e.g., images of cats labeled
as "cat").

Unlabeled Data: Data without associated labels (e.g., images of animals without any
annotations).
Learning Objective: The goal is to leverage the unlabeled data to enhance the model’s
performance on the task, often achieving better results than using labeled data alone.

How It Works

Training Process:

Initial Training: The model is initially trained on the available labeled data.

Utilization of Unlabeled Data: The model is then used to make predictions on the unlabeled
data, generating pseudo-labels based on its confidence.

Iterative Refinement: The model is retrained using a combination of labeled and pseudo-
labeled data, refining its ability to generalize.
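
The three stages above can be sketched with a very simple model. The example assumes a nearest-centroid classifier, a softmax over negative distances as a stand-in for "confidence", and a confidence threshold of 0.8. These are all illustrative choices, and the data points are made up.

```python
import numpy as np

def fit_centroids(X, y):
    """'Train' the model: one centroid (mean point) per class."""
    classes = np.unique(y)
    centroids = np.array([X[y == c].mean(axis=0) for c in classes])
    return classes, centroids

def predict_with_confidence(X, classes, centroids):
    """Predict the nearest centroid's class and a softmax-style confidence."""
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    scores = np.exp(-dists)
    probs = scores / scores.sum(axis=1, keepdims=True)
    return classes[probs.argmax(axis=1)], probs.max(axis=1)

def pseudo_label_round(X_lab, y_lab, X_unlab, threshold=0.8):
    """One round: initial training, pseudo-labeling, retraining."""
    classes, centroids = fit_centroids(X_lab, y_lab)
    pseudo, confidence = predict_with_confidence(X_unlab, classes, centroids)
    keep = confidence >= threshold            # only confident pseudo-labels
    X_new = np.vstack([X_lab, X_unlab[keep]])
    y_new = np.concatenate([y_lab, pseudo[keep]])
    return fit_centroids(X_new, y_new), int(keep.sum())

X_lab = np.array([[0.0, 0.0], [10.0, 10.0]])                 # labeled points
y_lab = np.array([0, 1])
X_unlab = np.array([[1.0, 1.0], [9.0, 9.0], [5.0, 5.0]])     # unlabeled points

(classes, centroids), n_added = pseudo_label_round(X_lab, y_lab, X_unlab)
print("pseudo-labeled points used:", n_added)
print("updated centroids:", centroids)
```

The point halfway between the two classes gets a confidence of 0.5 and is left out, which is how the threshold keeps unreliable pseudo-labels from being fed back into training.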

Techniques:

Consistency Regularization: Encourages the model to produce consistent outputs for the same
input under different perturbations (e.g., noise or transformations).

Generative Models: Some approaches involve using generative models (like GANs) to create
additional labeled data or to better understand the data distribution.

Graph-Based Methods: Use graph structures to model relationships between labeled and
unlabeled data points, leveraging their connectivity to propagate labels.

Advantages

Reduced Labeling Cost: Fewer labeled samples are needed, saving time and resources.

Better Generalization: Leveraging large amounts of unlabeled data can lead to improved
model generalization and performance.

Flexibility: Can be applied to various types of tasks, including classification, regression, and
clustering.

Applications

Image Classification: Using large sets of unlabeled images alongside a smaller set of labeled
images to improve classification accuracy.

Natural Language Processing: Tasks like sentiment analysis, where unlabeled text data can
enhance understanding.

Speech Recognition: Leveraging large amounts of unlabeled audio data to improve models
trained on smaller labeled datasets.
Medical Imaging: Where labeling can be costly and requires expert knowledge; semi-
supervised methods can help utilize vast amounts of unlabeled scans.
