
SRM INSTITUTE OF SCIENCE AND TECHNOLOGY

NCR CAMPUS, MODINAGAR


DEPARTMENT OF MCA
UDS21401J - DEEP LEARNING FOR ENTERPRISE
UNIT-V

Deep Learning Techniques



Eigenvectors, Eigenvalues, and Singular Value Decomposition


Eigenvalues and eigenvectors are concepts from linear algebra that are used to analyze and
understand linear transformations, particularly those represented by square matrices. They are
used in many different areas of mathematics, including machine learning and artificial
intelligence. In machine learning, eigenvalues and eigenvectors are used to represent data, to
perform operations on data, and to train machine learning models. In artificial intelligence,
eigenvalues and eigenvectors are used to develop algorithms for tasks such as image
recognition, natural language processing, and robotics.

1. Eigenvalue (λ): An eigenvalue of a square matrix A is a scalar (a single number) λ such that there exists a non-zero vector v (the eigenvector) for which the following equation holds:

Av = λv

In other words, when you multiply the matrix A by the eigenvector v, you get a new vector
that is just a scaled version of v (scaled by the eigenvalue λ).

2. Eigenvector: The vector v mentioned above is called an eigenvector corresponding to the eigenvalue λ. Eigenvectors only change in scale (magnitude) when multiplied by the matrix A; their direction remains the same. Mathematically, to find eigenvalues and eigenvectors, you typically solve the following equation for λ and v:

(A − λI)v = 0

Where

 A is the square matrix for which you want to find eigenvalues and
eigenvectors.
 λ is the eigenvalue you’re trying to find.
 I is the identity matrix (a diagonal matrix with 1s on the diagonal and 0s
elsewhere).
 v is the eigenvector you’re trying to find.

Solving this equation involves finding the values of λ that make the matrix (A − λI) singular (i.e., its determinant is zero), and then finding the corresponding vectors v.
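As a concrete illustration, here is a minimal NumPy sketch (the 2×2 matrix is arbitrary and chosen only for this example) that computes eigenvalues and eigenvectors and verifies the defining relation Av = λv:

import numpy as np

# An arbitrary 2x2 matrix, used only for illustration
A = np.array([[4.0, 2.0],
              [1.0, 3.0]])

# np.linalg.eig returns the eigenvalues and the eigenvectors (as columns)
eigenvalues, eigenvectors = np.linalg.eig(A)

for i in range(len(eigenvalues)):
    lam = eigenvalues[i]
    v = eigenvectors[:, i]
    # Av - lambda*v should be (numerically) the zero vector
    print("lambda =", lam, "residual =", A @ v - lam * v)

For this matrix the eigenvalues come out as 5 and 2, and each residual is zero up to floating-point error.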

Use of Eigenvalues and Eigenvectors in Machine Learning and AI:

 Dimensionality Reduction (PCA): In Principal Component Analysis (PCA), you calculate the eigenvectors and eigenvalues of the covariance matrix of your data. The eigenvectors (principal components) with the largest eigenvalues capture the most variance in the data and can be used to reduce the dimensionality of the dataset while preserving important information.
 Image Compression: Eigenvectors and eigenvalues are used in techniques like
Singular Value Decomposition (SVD) for image compression. By representing
images in terms of their eigenvectors and eigenvalues, you can reduce storage
requirements while retaining essential image features.
 Support vector machines: Support vector machines (SVMs) are a type of machine
learning algorithm that can be used for classification and regression tasks. SVMs
work by finding a hyperplane that separates the data into two classes. The eigenvalues
and eigenvectors of the kernel matrix of the SVM can be used to improve the
performance of the algorithm.
 Graph Theory: Eigenvectors play a role in analyzing networks and graphs. They can
be used to find important nodes or communities in social networks or other
interconnected systems.
 Natural Language Processing (NLP): In NLP, eigenvectors can help identify the most
relevant terms in a large document-term matrix, enabling techniques like Latent
Semantic Analysis (LSA) for document retrieval and text summarization.
 Machine Learning Algorithms: Eigenvalues and eigenvectors can be used to analyze
the stability and convergence properties of machine learning algorithms, especially in
deep learning when dealing with weight matrices in neural networks.

Example of Eigenvalues and Eigenvectors:

Principal Component Analysis (PCA): PCA is a widely used dimensionality reduction technique in machine learning and data analysis. It employs eigenvectors and eigenvalues to reduce the number of features while retaining as much information as possible. Imagine a dataset with two variables, X and Y, that must be reduced to one dimension. We calculate the covariance matrix of the data, find its eigenvectors and eigenvalues, and obtain the following:

 Eigenvalue 1 (λ₁) = 5
 Eigenvalue 2 (λ₂) = 1
 Eigenvector 1 (v₁) = [0.8, 0.6]
 Eigenvector 2 (v₂) = [-0.6, 0.8]
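To make the reduction step concrete, the following NumPy sketch (with made-up 2-D data standing in for X and Y; the numbers are illustrative only) computes the covariance matrix, takes the eigenvector with the largest eigenvalue, and projects the data onto that single direction:

import numpy as np

# Hypothetical 2-D data; columns play the roles of X and Y
data = np.array([[2.5, 2.4],
                 [0.5, 0.7],
                 [2.2, 2.9],
                 [1.9, 2.2],
                 [3.1, 3.0]])

centered = data - data.mean(axis=0)        # center each variable
cov = np.cov(centered, rowvar=False)       # 2x2 covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)     # eigh suits symmetric matrices

order = np.argsort(eigvals)[::-1]          # sort by decreasing eigenvalue
v1 = eigvecs[:, order[0]]                  # first principal component

reduced = centered @ v1                    # data projected to one dimension
print("variance retained:", eigvals[order[0]] / eigvals.sum())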


Singular Value Decomposition


The singular value decomposition is the process of factorizing a matrix A into the product of three matrices, A = UDVᵀ, where the columns of U and V are orthonormal and D is a diagonal matrix with non-negative entries (the singular values). The decomposition is useful for a variety of tasks. In particular, a data matrix A is often close to a low-rank matrix, and it is useful to locate a low-rank matrix that is a good approximation to the data matrix. From the singular value decomposition we can extract the matrix B of rank k that best approximates A, and we can do this for any rank k.
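A short NumPy sketch of this idea (the matrix and the rank k are arbitrary choices for illustration): keeping only the k largest singular values of A yields the best rank-k approximation B.

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))            # example data matrix

# Full SVD: A = U @ diag(s) @ Vt, with non-negative singular values in s
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2                                      # target rank for the approximation
B = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]  # best rank-k approximation of A

print("rank-%d approximation error: %.4f" % (k, np.linalg.norm(A - B)))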

Generative Adversarial Networks


Generative Adversarial Networks (GANs) are a powerful class of neural networks used for unsupervised learning. GANs are made up of two neural networks, a discriminator and a generator. They use adversarial training to produce artificial data that closely resembles actual data.

A generative adversarial network (GAN) is a deep learning architecture. It trains two neural networks
to compete against each other to generate more authentic new data from a given training dataset. For
instance, you can generate new images from an existing image database or original music from a
database of songs. A GAN is called adversarial because it trains two different networks and pits them
against each other. One network generates new data by taking an input data sample and modifying it
as much as possible. The other network tries to predict whether the generated data output belongs in
the original dataset. In other words, the predicting network determines whether the generated data is
fake or real. The system generates newer, improved versions of fake data values until the predicting
network can no longer distinguish fake from original.

 The Generator produces samples from random noise and attempts to fool the Discriminator, which is tasked with accurately distinguishing between produced and genuine data.
 Realistic, high-quality samples are produced as a result of this competitive interaction, which
drives both networks toward advancement.
 GANs are proving to be highly versatile artificial intelligence tools, as evidenced by their
extensive use in image synthesis, style transfer, and text-to-image synthesis.
 They have also revolutionized generative modeling.

Through adversarial training, these models engage in a competitive interplay until the generator
becomes adept at creating realistic samples, fooling the discriminator approximately half the time.

Generative Adversarial Networks (GANs) can be broken down into three parts:

Generative: To learn a generative model, which describes how data is generated in terms of a
probabilistic model.

Adversarial: The word adversarial refers to setting one thing up against another. This means that, in
the context of GANs, the generative result is compared with the actual images in the data set. A
mechanism known as a discriminator is used to apply a model that attempts to distinguish between
real and fake images.

Networks: Use deep neural networks as artificial intelligence (AI) algorithms for training purposes.

Types of GANs
Vanilla GAN: This is the simplest type of GAN. Here, the Generator and the Discriminator are simple basic multi-layer perceptrons. In a vanilla GAN, the algorithm is really simple: it tries to optimize the mathematical objective using stochastic gradient descent.

Conditional GAN (CGAN): CGAN can be described as a deep learning method in which some
conditional parameters are put into place. In CGAN, an additional parameter ‘y’ is added to the
Generator for generating the corresponding data. Labels are also put into the input to the
Discriminator in order for the Discriminator to help distinguish the real data from the fake generated
data.

Deep Convolutional GAN (DCGAN): DCGAN is one of the most popular and also the most
successful implementations of GAN. It is composed of ConvNets in place of multi-layer perceptrons.
The ConvNets are implemented without max pooling, which is in fact replaced by convolutional
stride. Also, the layers are not fully connected.

Laplacian Pyramid GAN (LAPGAN): The Laplacian pyramid is a linear invertible image representation consisting of a set of band-pass images, spaced an octave apart, plus a low-frequency residual. This approach uses multiple Generator and Discriminator networks at different levels of the Laplacian pyramid. It is mainly used because it produces very high-quality images. The image is first down-sampled at each layer of the pyramid and then up-scaled again at each layer in a backward pass, where the image acquires some noise from the Conditional GAN at these layers until it reaches its original size.

Super Resolution GAN (SRGAN): SRGAN, as the name suggests, is a way of designing a GAN in which a deep neural network is used along with an adversarial network in order to produce higher-resolution images. This type of GAN is particularly useful for optimally up-scaling native low-resolution images, enhancing their details while minimizing errors.

How does a generative adversarial network work?


A generative adversarial network system comprises two deep neural networks—the generator network
and the discriminator network. Both networks train in an adversarial game, where one tries to generate
new data and the other attempts to predict if the output is fake or real data.

Technically, the GAN works as follows. A complex mathematical equation forms the basis of the
entire computing process, but this is a simplistic overview:

1. The generator neural network analyzes the training set and identifies data attributes
2. The discriminator neural network also analyzes the initial training data and distinguishes
between the attributes independently
3. The generator modifies some data attributes by adding noise (or random changes) to certain
attributes
4. The generator passes the modified data to the discriminator
5. The discriminator calculates the probability that the generated output belongs to the original
dataset
6. The discriminator gives some guidance to the generator to reduce the noise vector
randomization in the next cycle

The generator attempts to maximize the probability that the discriminator makes a mistake, while the discriminator attempts to minimize its probability of error. In training iterations, both the
generator and discriminator evolve and confront each other continuously until they reach an
equilibrium state. In the equilibrium state, the discriminator can no longer recognize synthesized
data. At this point, the training process is over.
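The following heavily simplified PyTorch sketch illustrates this loop; the layer sizes, noise dimension, and the stand-in "real" data are placeholder assumptions, not part of the notes.

import torch
import torch.nn as nn

noise_dim, data_dim = 16, 2                  # placeholder sizes

# Generator: noise -> fake sample; Discriminator: sample -> probability "real"
G = nn.Sequential(nn.Linear(noise_dim, 32), nn.ReLU(), nn.Linear(32, data_dim))
D = nn.Sequential(nn.Linear(data_dim, 32), nn.ReLU(),
                  nn.Linear(32, 1), nn.Sigmoid())

opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

for step in range(1000):
    real = torch.randn(64, data_dim) * 0.5 + 2.0   # stand-in "real" data
    fake = G(torch.randn(64, noise_dim))           # generated data

    # Train the discriminator: label real as 1, fake as 0
    d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake.detach()),
                                                   torch.zeros(64, 1))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Train the generator: try to make the discriminator output 1 on fakes
    g_loss = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()

The detach() call keeps the discriminator update from sending gradients into the generator; the generator is then updated separately to push the discriminator's output on fake samples toward "real", mirroring the equilibrium-seeking game described above.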

Backward Propagation Techniques


Backward propagation (also known as backpropagation) is a key algorithm used in deep learning to
train neural networks. It involves computing the gradients of the loss function with respect to the
weights of the network, and then updating the weights using these gradients in order to minimize the
loss.

Here are some of the techniques used in backward propagation:

Chain rule: The chain rule is a fundamental concept in calculus that is used to compute the gradients
of a function composed of multiple nested functions. In the context of neural networks, the chain rule
is used to compute the gradients of the loss function with respect to the weights of the network.

Gradient descent: Gradient descent is a method used to update the weights of the network using the
gradients computed in the previous step. There are several variants of gradient descent, including
stochastic gradient descent, which uses random subsets of the training data to compute the gradients
and update the weights.
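A minimal sketch of these two ideas together (pure NumPy; a single linear neuron with a squared-error loss on made-up data): the chain rule gives the gradient, and gradient descent applies it.

import numpy as np

# Toy data: learn y = 3x from noisy samples (for illustration only)
x = np.linspace(-1, 1, 50)
y = 3.0 * x + 0.1 * np.random.randn(50)

w = 0.0     # single weight to learn
lr = 0.1    # learning rate

for epoch in range(100):
    y_hat = w * x                        # forward pass
    loss = np.mean((y_hat - y) ** 2)     # squared-error loss
    # Chain rule: dL/dw = dL/dy_hat * dy_hat/dw = 2*(y_hat - y) * x
    grad = np.mean(2 * (y_hat - y) * x)
    w -= lr * grad                       # gradient descent update

print("learned weight:", w)              # should approach 3.0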

Activation functions: Activation functions are used to introduce nonlinearity into the network and
make it capable of modeling complex, nonlinear relationships. Common activation functions include
sigmoid, tanh, and ReLU.

Regularization: Regularization techniques are used to prevent overfitting, which can occur when the
network becomes too complex and fits the training data too closely. Common regularization
techniques include L1 and L2 regularization, which add penalties to the loss function based on the
magnitudes of the weights.

Dropout: Dropout is a technique used to prevent overfitting by randomly dropping out (setting to
zero) some of the activations in the network during training. This helps to prevent the network from
becoming too reliant on any one set of features.

Batch normalization: Batch normalization is a technique used to improve the stability and speed of
training by normalizing the activations in each batch of data before they are input to the next layer.

By using these techniques, we can train deep neural networks that are capable of modeling complex,
high-dimensional relationships between inputs and outputs. These models can be used for a wide
range of applications, including image and speech recognition, natural language processing, and
recommendation systems.

Feed Forward Neural Networks


Feed-forward neural networks, also known as deep feedforward networks or multi-layer perceptrons, are the focus of this section. Convolutional and Recurrent Neural Networks (which are used extensively in computer vision applications) are based on these networks. Search engines, machine translation, and mobile applications all rely on deep learning technologies, which work by simulating the human brain's way of identifying and creating patterns from various types of input. A feedforward neural network is a key component of this technology, since it aids software developers with pattern recognition and classification, non-linear regression, and function approximation. A feedforward neural network is a type of artificial neural network in which nodes'
connections do not form a loop. Often referred to as a multi-layered network of neurons, feedforward
neural networks are so named because all information flows in a forward manner only. The data
enters the input nodes, travels through the hidden layers, and eventually exits the output nodes. The
network is devoid of links that would allow the information exiting the output node to be sent back
into the network.

The purpose of feedforward neural networks is to approximate functions

There is a classifier using the formula y = f*(x), which assigns an input x to a category y. The feedforward network maps y = f(x; θ) and learns the value of θ that most closely approximates the function.
Types of Neural Network Layers
The following are the components of a feedforward neural network:

Input layer: It contains the neurons that receive input. The data is subsequently passed on to the next layer. The input layer's total number of neurons is equal to the number of variables in the dataset.
Hidden layer: This is the intermediate layer, which is concealed between the input and output layers. This layer has a large number of neurons that perform transformations on the inputs and then communicate with the output layer.
Output layer: It is the last layer and depends on the model's construction. The output layer produces the predicted feature, the desired outcome of the model.
Neuron weights: Weights describe the strength of a connection between neurons; they are typically initialized to small random values and adjusted during training.

Cost Function in Feedforward Neural Network


The cost function is an important component of a feedforward neural network. Generally, minor adjustments to weights and biases have little effect on the classified data points. A smooth cost function is therefore needed to determine a method for improving performance through minor adjustments to weights and biases.

What is a Feed-Forward Neural Network and how does it work?

In its most basic form, a Feed-Forward Neural Network is a single-layer perceptron. A sequence of
inputs enters the layer and is multiplied by the weights in this model. The weighted input values are
then summed together to form a total. If the sum of the values is more than a predetermined threshold,
which is normally set at zero, the output value is usually 1, and if the sum is less than the threshold,
the output value is usually -1. The single-layer perceptron is a popular feed-forward neural network model that is frequently used for classification, and it can be extended with additional machine-learning features.
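A minimal sketch of the threshold rule just described (plain Python; the inputs and weights are arbitrary illustrations):

def perceptron(inputs, weights, threshold=0.0):
    # Weighted sum of the inputs, followed by a hard threshold
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total > threshold else -1

# Two hand-picked examples
print(perceptron([1.0, -0.5], [0.7, 0.3]))   # sum = 0.55 > 0  -> outputs 1
print(perceptron([-1.0, 0.2], [0.7, 0.3]))   # sum = -0.64 < 0 -> outputs -1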

Batch Normalization
Batch Normalization is a technique used to improve the training of deep neural networks. Introduced
by Sergey Ioffe and Christian Szegedy in 2015, batch normalization is used to normalize the inputs of
each layer in such a way that they have a mean output activation of zero and a standard deviation of
one. This normalization process helps to combat issues that deep neural networks face, such as
internal covariate shift, which can slow down training and affect the network's ability to generalize
from the training data.

Understanding Internal Covariate Shift


Internal covariate shift refers to the change in the distribution of network activations due to the update
of weights during training. As deeper layers depend on the outputs of earlier layers, even small
changes in the initial layers can amplify and lead to significant shifts in the distribution of inputs to
deeper layers. This can result in the need for lower learning rates and careful parameter initialization,
making the training process slow and less efficient.

How Batch Normalization Works

Batch normalization works by normalizing the output of a previous activation layer: subtracting the batch mean and dividing by the batch standard deviation. After this step, the result is scaled and shifted by two learnable parameters, gamma and beta, which are unique to each layer. This process allows the model to maintain the mean activation close to 0 and the activation standard deviation close to 1.

The normalization step is as follows:

1. Calculate the mean and variance of the activations for each feature in a mini-batch.
2. Normalize the activations of each feature by subtracting the mini-batch mean and dividing by
the mini-batch standard deviation.
3. Scale and shift the normalized values using the learnable parameters gamma and beta, which
allow the network to undo the normalization if that is what the learned behavior requires.

Batch normalization is typically applied before the activation function in a network layer, although
some variations may apply it after the activation function.
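The three steps above can be sketched directly in NumPy (the mini-batch here is random, and gamma and beta are initialized to 1 and 0, a common convention):

import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    # Step 1: per-feature mean and variance over the mini-batch
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    # Step 2: normalize (eps avoids division by zero)
    x_hat = (x - mean) / np.sqrt(var + eps)
    # Step 3: scale and shift with the learnable parameters
    return gamma * x_hat + beta

batch = np.random.randn(32, 4) * 5 + 10    # mini-batch: 32 samples, 4 features
out = batch_norm(batch, gamma=np.ones(4), beta=np.zeros(4))
print(out.mean(axis=0).round(3))           # approximately 0 per feature
print(out.std(axis=0).round(3))            # approximately 1 per feature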

Benefits of Batch Normalization


Batch normalization offers several benefits to the training process of deep neural networks:

Improved Optimization: It allows the use of higher learning rates, speeding up the training process
by reducing the careful tuning of parameters.

Regularization: It adds a slight noise to the activations, similar to dropout. This can help to regularize
the model and reduce overfitting.

Reduced Sensitivity to Initialization: It makes the network less sensitive to the initial starting
weights.

Allows Deeper Networks: By reducing internal covariate shift, batch normalization allows for the
training of deeper networks.

Batch Normalization during Inference

While batch normalization is straightforward to apply during training, it requires special consideration
during inference. Since the mini-batch mean and variance are not available during inference, the
network uses the moving averages of these statistics that were computed during training. This ensures
that the normalization is consistent and the network's learned behavior is maintained.

Challenges and Considerations

Despite its benefits, batch normalization is not without challenges:

Dependency on Mini-Batch Size:

The effectiveness of batch normalization can depend on the size of the mini-batch. Very small batch
sizes can lead to inaccurate estimates of the mean and variance, which can destabilize the training
process.

Computational Overhead: Batch normalization introduces additional computations and parameters into the network, which can increase the complexity and computational cost.

Sequence Data: Applying batch normalization to recurrent neural networks and other architectures
that handle sequence data can be less straightforward and may require alternative approaches.

Regularization
Regularization refers to techniques that are used to calibrate machine learning models in order to minimize the adjusted loss function and prevent overfitting or underfitting. Using regularization, we can fit our machine learning model appropriately so that it generalizes to a given test set, reducing its errors.

Regularization Techniques
The two main regularization techniques are Ridge Regularization and Lasso Regularization; several related techniques are also described below.

1. Ridge Regularization:

A regression model that uses the L2 regularization technique is called Ridge regression. Ridge regression adds the "squared magnitude" of the coefficients as a penalty term to the loss function (L): it modifies over-fitted or under-fitted models by adding a penalty equivalent to the sum of the squares of the coefficient magnitudes. The mathematical function representing the machine learning model is then minimized with this penalty included, so Ridge regression performs regularization by shrinking the coefficients. The cost function of ridge regression is:

Cost = Σᵢ (yᵢ − ŷᵢ)² + λ Σⱼ βⱼ²

In the cost function, the penalty term is controlled by lambda (λ). By changing the value of λ, we control the strength of the penalty: the higher the penalty, the smaller the magnitude of the coefficients becomes; it shrinks the parameters. Ridge regression is therefore used to prevent multicollinearity, and it reduces model complexity through coefficient shrinkage.

2. Lasso Regression: A regression model which uses the L1 regularization technique is called LASSO (Least Absolute Shrinkage and Selection Operator) regression. Lasso regression adds the "absolute value of magnitude" of the coefficients as a penalty term to the loss function (L). Lasso regression also helps us achieve feature selection by penalizing weights down to approximately zero when a feature does not serve any purpose in the model. It modifies over-fitted or under-fitted models by adding a penalty equivalent to the sum of the absolute values of the coefficients. Lasso regression also performs coefficient minimization, but instead of squaring the magnitudes of the coefficients, it takes their absolute values. This means the coefficient sum can also be 0, because of the presence of negative coefficients. The cost function for Lasso regression is:

Cost = Σᵢ (yᵢ − ŷᵢ)² + λ Σⱼ |βⱼ|

A code sketch contrasting the L1 and L2 penalties appears after this list.

3. Elastic Net Regression: This model is a combination of L1 as well as L2 regularization. That implies that we add both the absolute norm of the weights and the squared magnitude of the weights to the loss, with an extra hyperparameter that controls the ratio of the L1 and L2 regularization.
4. Data Augmentation (weak): Data augmentation involves increasing the size of the available data set by augmenting it with more inputs created by random cropping, dilating, rotating, adding a small amount of noise, etc. The idea is to artificially create more data in the hope that the augmented dataset will be a better representation of the underlying hidden distribution. Since we are limited by the available dataset only, this method generally doesn't work very well as a regularizer.

5. Dropout: Dropout is used when the training model is a neural network. A neural network consists of multiple hidden layers, where the output of one layer is used as input to the subsequent layer. The subsequent layer modifies the input through learnable parameters (usually by multiplying it by a matrix and adding a bias, followed by an activation function). The input flows through the neural network layers until it reaches the final output layer, which is used for prediction. Dropout regularizes such a network by randomly dropping out (setting to zero) some of the activations during training, so the network cannot become too reliant on any one set of features.
6. Early stopping: It is an optimization technique used to reduce overfitting without
compromising on model accuracy. The main idea behind early stopping is to stop training
before a model starts to overfit.
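To make the Ridge (L2) and Lasso (L1) penalties from items 1 and 2 concrete, here is a minimal scikit-learn sketch; the synthetic data and the alpha values are arbitrary choices:

import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
# Only the first two features actually influence the target
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + 0.1 * rng.standard_normal(100)

ridge = Ridge(alpha=1.0).fit(X, y)   # L2 penalty: shrinks all coefficients
lasso = Lasso(alpha=0.1).fit(X, y)   # L1 penalty: can zero out coefficients

print("ridge coefficients:", ridge.coef_.round(3))
print("lasso coefficients:", lasso.coef_.round(3))

Note how the L1 penalty tends to drive the coefficients of irrelevant features to exactly zero, performing the feature selection described above, while the L2 penalty merely shrinks them.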

How regularization works

Regularization works by adding a penalty term to the model’s loss function, which constrains large
parameter values. This constraint on parameter values helps prevent overfitting by reducing the
model’s complexity and promoting better generalization to new data.

How does regularization affect overfitting

Regularization helps to prevent overfitting by adding constraints that keep the model from becoming too complicated and fitting the training data too closely. This makes the model better at making predictions on new data.

Gradient descent: Gradient descent is a simple optimizer that computes the gradients of the loss
function with respect to the weights of the network and updates the weights in the opposite direction
of the gradients. This process is repeated iteratively until the loss is minimized. However, gradient
descent can be slow to converge, especially for large datasets and complex models.

Adaptive Gradients or AdaGrad Algorithm


Adaptive Gradients, or AdaGrad, is an extension of the gradient descent optimization algorithm that
allows the step size in each dimension used by the optimization algorithm to be automatically adapted
based on the gradients (partial derivatives) seen for each variable over the course of the search.
Gradient descent is an optimization algorithm that uses the gradient of the objective function to
navigate the search space. Gradient descent can be updated to use an automatically adaptive step size
for each input variable in the objective function, called adaptive gradients or AdaGrad. AdaGrad's concept is to modify the learning rate for every parameter in a model depending on the parameter's previous gradients. Specifically, it accumulates the sum of the squares of the gradients over time, one sum per parameter, and divides the base learning rate by the square root of this sum. This reduces the learning rate for parameters with big gradients while keeping it comparatively high for parameters with modest gradients. The idea behind this
particular method is that it enables the learning rate to adapt to the geometry of the loss function,
allowing it to converge quicker in steep gradient directions while being more conservative in flatter
gradient directions. This may result in quicker convergence and improved generalization. However, the method has a significant downside: the cumulative gradient magnitudes may grow quite large over time, resulting in a meager effective learning rate that can inhibit further learning. Adam and RMSProp, two contemporary optimization algorithms, combine adaptive learning rates with other strategies to limit the growth of the accumulated gradient magnitudes over time.

Two-Dimensional Test Problem


We will use a simple two-dimensional function and define the range of valid inputs from -1.0 to 1.0. The test problem is defined by the function f(x), which takes a two-dimensional input x and returns the output x[0]**2 + 2*x[1]**2. This function has a global minimum at x = [0, 0]. We can create a three-dimensional plot of the function to get a feel for the curvature of the response surface.

Gradient Descent Optimization with AdaGrad


The optimization algorithm starts with some initial values of x, then iteratively improves these values by following the negative gradient of the cost function. Finding the optimal values of x using gradient descent involves repeatedly computing the gradient of the cost function with respect to x and updating x in the direction that reduces the cost function. In this case, the gradient of the cost function is computed using the grad_f(x) function, which returns the gradient [2*x[0], 4*x[1]]. The learning rate is a hyperparameter that determines the size of the step taken to update x. In this case, the learning rate is set to 0.1.
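Putting the pieces together, here is a minimal AdaGrad implementation for this test problem; f, grad_f, and the learning rate follow the text, while the starting point and iteration count are arbitrary:

import numpy as np

def f(x):
    return x[0] ** 2 + 2.0 * x[1] ** 2           # the test objective

def grad_f(x):
    return np.array([2.0 * x[0], 4.0 * x[1]])    # its gradient

x = np.array([-1.0, 1.0])      # start inside the valid range [-1, 1]
lr = 0.1                       # base learning rate from the text
sq_grad_sum = np.zeros(2)      # running sum of squared gradients
eps = 1e-8                     # avoids division by zero

for i in range(50):
    g = grad_f(x)
    sq_grad_sum += g ** 2                        # accumulate per parameter
    x -= lr * g / (np.sqrt(sq_grad_sum) + eps)   # adaptive step per dimension

print("x =", x, "f(x) =", f(x))                  # approaches the minimum [0, 0]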

Visualization of AdaGrad
The optimization process is visualized by plotting the cost function and the trajectory of the parameters during the optimization. The cost function is plotted by evaluating the function f(x) on a grid of points and using the contour function to plot the contours of the cost function. The trajectory of the parameters is plotted by storing the values of x at each iteration and using the plot function to draw the path of x. Finally, the plot is displayed using the show function.

Adadelta Algorithm
Adadelta (or “ADADELTA”) is an extension to the gradient descent optimization algorithm. Adadelta
is designed to accelerate the optimization process, e.g. decrease the number of function evaluations
required to reach the optima, or to improve the capability of the optimization algorithm, e.g. result in a
better final result. It is best understood as an extension of the AdaGrad and RMSProp algorithms.
AdaGrad is an extension of gradient descent that calculates a step size (learning rate) for each
parameter for the objective function each time an update is made. The step size is calculated by first
summing the partial derivatives for the parameter seen so far during the search, then dividing the
initial step size hyperparameter by the square root of the sum of the squared partial derivatives. The
idea behind Adadelta is that instead of summing up all the past squared gradients from time step 1 to t, we restrict the window size: for example, computing the average of the squared gradients over only the past 10 steps. This can be achieved using exponentially weighted averages over the gradients.

Adam optimizer
Adam optimizer is by far one of the most preferred optimizers. The idea behind the Adam optimizer is to utilize the momentum concept from "SGD with momentum" and the adaptive learning rate from Adadelta.

RMSprop: RMSprop is an optimization algorithm that uses adaptive learning rates for each weight
based on the average of the squares of the previous gradients. Tuning the RMSprop hyperparameters
involves finding values for the learning rate and decay rate that allow the optimizer to converge
quickly and reach a good solution.
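A compact sketch of the Adam update described above (pure NumPy; the hyperparameter values are the common defaults, not taken from the notes). It combines a momentum-style running mean of the gradients with an RMSprop-style running mean of the squared gradients:

import numpy as np

def adam_step(x, grad, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    # m and v are running means of the gradient and the squared gradient
    m = b1 * m + (1 - b1) * grad           # momentum term
    v = b2 * v + (1 - b2) * grad ** 2      # RMSprop-style term
    m_hat = m / (1 - b1 ** t)              # bias correction
    v_hat = v / (1 - b2 ** t)
    x = x - lr * m_hat / (np.sqrt(v_hat) + eps)
    return x, m, v

# Minimize the same test function as in the AdaGrad example above
x = np.array([-1.0, 1.0])
m = np.zeros(2)
v = np.zeros(2)
for t in range(1, 2001):
    grad = np.array([2.0 * x[0], 4.0 * x[1]])
    x, m, v = adam_step(x, grad, m, v, t)
print("x =", x)                            # approaches the minimum [0, 0]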

Data Segmentation: Data segmentation is the process of breaking down a dataset into discrete
groups according to specific standards or attributes. These subsets can be identified by several criteria,
including behavior, demographics, or certain dataset features. Enabling more focused analysis and
modeling to produce better results is the main goal of data segmentation.

Segmentation plays a critical role in machine learning by enhancing the quality of data analysis and
model performance. Here’s why segmentation is important in the context of machine learning:

Improved Model Accuracy: Segmentation allows machine learning models to focus on specific
subsets of data, which often leads to more accurate predictions or classifications. By training models
on segmented data, they can capture nuances and patterns specific to each segment, resulting in better
overall performance.

Improved Understanding: Segmentation makes it possible to comprehend the data's underlying structure on a deeper level. Analysts can find hidden patterns, correlations, and trends in data by grouping the data into meaningful categories that may not be visible when examining the data as a whole. Having a deeper understanding can help with strategy formulation and decision-making.

Customized Solutions: Segmentation makes it easier to create strategies and solutions that are
specific to certain dataset segments. Personalized techniques have been shown to considerably
improve outcomes in a variety of industries, including marketing, healthcare, and finance. Segmented
patient data, for instance, enables customized treatment programs and illness management techniques
in the healthcare industry.

Optimized Resource Allocation: By segmenting data, organizations can allocate resources more
efficiently. For instance, in marketing campaigns, targeting specific customer segments with tailored
messages or offers can maximize the return on investment by focusing resources where they are most
likely to yield results.

Effective Risk Management: Segmentation aids in identifying high-risk segments within a dataset,
enabling proactive risk assessment and mitigation strategies. This is particularly crucial in fields like
finance and insurance, where accurately assessing risk can prevent financial losses.

Applications of Segmentation in Machine Learning


Machine learning uses segmentation techniques in a variety of domains:

Customer Segmentation: Companies employ segmentation to put customers into groups according to
their preferences, buying habits, or demographics. This allows for more individualized advice,
focused marketing strategies, and happier customers.

Image Segmentation: A technique used in computer vision to divide images into objects or meaningful regions. This makes tasks like scene comprehension, object detection, and image classification possible.

Text Segmentation: Text segmentation in natural language processing is the process of breaking text
up into smaller chunks, like phrases, paragraphs, or subjects. This makes information retrieval,
sentiment analysis, and document summarization easier.

Healthcare Segmentation: To determine risk factors, forecast disease outcomes, and customize
treatment regimens, healthcare practitioners divide up patient data into smaller groups. Better patient
care and medical decision-making result from this.

Financial Segmentation: To provide specialized financial goods and services, banks and other
financial organizations divide up their clientele into groups according to credit risk, income levels,
and spending patterns. This aids in risk management and profitability maximization.
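As one concrete instance of the customer segmentation application above, here is a minimal scikit-learn sketch; the customer features are fabricated purely for illustration:

import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Fabricated customer data: [annual spend, visits per month]
customers = np.array([[200, 1], [250, 2], [1800, 12],
                      [2000, 15], [950, 6], [1100, 7]])

scaled = StandardScaler().fit_transform(customers)   # put features on one scale
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(scaled)

print("segment of each customer:", kmeans.labels_)

Each label identifies the group a customer belongs to, which downstream systems can then target with tailored recommendations or offers.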

Problem Statements
While machine learning is extensively used across industries to make data-driven decisions, its implementation faces many problems that must be addressed. Here is a list of the most common machine learning challenges organizations encounter when incorporating ML into their operations.

1. Inadequate Training Data


Data plays a critical role in the training and processing of machine learning algorithms. Many data
scientists attest that insufficient, inconsistent, and unclean data can considerably hamper the efficacy
of ML algorithms.

2. Underfitting of Training Data


This anomaly occurs when the data fails to link the input and output variables explicitly. In simpler terms, it is like trying to fit into an undersized t-shirt: the data is not coherent enough to forge a precise relationship.

3. Overfitting of Training Data


Overfitting denotes an ML model that fits its training data too closely, which negatively affects performance on new data. It is similar to trying on oversized jeans.

4. Delayed Implementation
ML models offer efficient results but consume a lot of time due to data overload, slow programs, and
excessive requirements. Additionally, they demand timely monitoring and maintenance to deliver the
best output.

Some problem statements that are often addressed by machine learning or data analytics:

Predictive Maintenance: How can machine learning algorithms be used to predict when equipment
or machinery will require maintenance, to minimize downtime and optimize operations?

Fraud Detection: How can data analytics be used to identify fraudulent transactions or behavior,
prevent financial losses, and protect customers?

Personalized Recommendations: How can machine learning be used to analyze customer data and
provide personalized recommendations, to improve customer experience and increase sales?

Image/Text/Speech Recognition: How can machine learning be used to recognize and classify
images, text, or speech, in order to enable applications such as autonomous vehicles, virtual assistants,
or medical diagnosis?

Forecasting: How can machine learning or data analytics be used to forecast future trends or events,
in order to enable better decision-making and planning?

Sentiment Analysis: How can machine learning be used to analyze customer sentiment, in order to
better understand customer needs and preferences, and improve brand reputation?

Data Engineering: Integrating ML techniques into data engineering processes enhances data processing and analysis. Data engineering is the complex task of making raw data usable to data scientists and groups within an organization, and it encompasses numerous specialties of data science. Incorporating machine learning within the realm of data engineering involves a seamless fusion of essential processes: data preprocessing and the construction of effective data pipelines.

Data Preprocessing for Machine Learning


Prior to engaging machine learning algorithms, data must undergo thorough preprocessing. This
entails a series of operations such as data cleaning, where inconsistencies and errors are rectified, and
data transformation, which involves converting data into a format suitable for analysis. Additionally,
feature engineering and selection are performed to identify the most relevant attributes that will
contribute to model performance. By meticulously preparing the data, the subsequent machine
learning processes can yield more accurate and meaningful insights.

Data Pipelines for Machine Learning Workflows


Data pipelines are the backbone of machine learning workflows, orchestrating the movement and
transformation of data from its raw form to a state ready for analysis. These pipelines typically
encompass stages such as data extraction, where data is collected from various sources; data
transformation, which involves reshaping and combining data; and data loading, where the processed
data is loaded into the target environment for analysis. In the context of machine learning, pipelines
ensure that data flows seamlessly through preprocessing, training, and evaluation stages, enabling
efficient model development and deployment.
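A minimal scikit-learn sketch of such a pipeline (the data, column layout, and model choice are placeholders): chaining preprocessing and the model guarantees the same transformations are applied at training and prediction time.

import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# Fabricated raw data containing a missing value
X = np.array([[1.0, 200.0], [2.0, np.nan], [3.0, 180.0], [4.0, 150.0]])
y = np.array([0, 0, 1, 1])

pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="mean")),   # data cleaning
    ("scale", StandardScaler()),                  # data transformation
    ("model", LogisticRegression()),              # downstream ML model
])

pipeline.fit(X, y)                     # the whole chain trains as one unit
print(pipeline.predict([[2.5, 190.0]]))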

Effectively integrating these components empowers data engineers to harness the power of machine
learning to derive insights and predictions from complex datasets, fostering a synergistic relationship
between the fields of data engineering and machine learning.

Model Selection
Model selection is an essential phase in the development of powerful and precise predictive models in the field of machine learning. Model selection is the process of deciding which algorithm and model architecture is best suited for a particular task or dataset. It entails contrasting various models, assessing their efficacy, and choosing the one that most effectively addresses the issue at hand. The choice of an appropriate machine learning model is crucial, since models differ in complexity, underlying assumptions, and capabilities. A model that performs effectively on a single dataset or problem may not generalize as well to new, untested data. Finding the right balance between model complexity and generalization is therefore key to model selection.

Choosing a model entails a number of steps. The first is to define a suitable evaluation metric that matches the objectives of the particular situation. Depending on the nature of the problem, this metric may be precision, recall, accuracy, F1-score, or any other relevant measure. Several candidate models are then selected in accordance with the problem at hand and the data that are accessible. These models might be as straightforward as decision trees or linear regression, or as sophisticated as deep neural networks, random forests, or support vector machines. During the selection process, it is important to take into account the assumptions, constraints, and hyperparameters that are unique to each model.

Using a suitable methodology, such as cross-validation, the candidate models are trained and evaluated after being selected. To do this, the available data is divided into training and validation sets, with each model fitted on the training set and then evaluated on the validation set. The models are compared using their performance metrics, and the model with the highest performance is chosen. Model selection is a continuous process, though: it frequently calls for an iterative cycle of testing several models and hyperparameters. This iterative process improves the models and helps to choose the ideal mix of algorithms and hyperparameters.
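The core of this procedure can be sketched in a few lines of scikit-learn (the dataset, the candidate models, and the accuracy metric are illustrative choices):

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Candidate models of varying complexity
candidates = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "decision tree": DecisionTreeClassifier(),
    "support vector machine": SVC(),
}

# 5-fold cross-validation with accuracy as the evaluation metric
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    print(name, "mean accuracy:", round(scores.mean(), 3))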

Model Engineering: Machine learning (ML) model engineering is a technical process that involves
various steps, including data collection and preprocessing, model selection, training, deployment and
monitoring. It aims to develop effective and efficient ML models to solve specific problems and meet
diverse business use cases. This process requires a range of technical skills, including expertise in
programming languages, data structures, algorithms and ML frameworks.

Model Outcome: During training, the machine learning algorithm is optimized to find certain
patterns or outputs from the dataset, depending on the task. The output of this process - often a
computer program with specific rules and data structures - is called a machine learning model.

Modal analysis: It is the process of determining the inherent dynamic characteristics of a system in
the forms of natural frequencies, damping factors, and mode shapes, and using them to formulate a
mathematical model for its dynamic behavior.

Optimization: The concept of optimization is integral to machine learning. Most machine learning
models use training data to learn the relationship between input and output data. The models can then
be used to make predictions about trends or classify new input data. This training is a process of
optimization, as each iteration aims to improve the model’s accuracy and lower the margin of error.
Optimization is a theme that runs through every step of machine learning. This includes a data
scientist optimizing and refining labeled training data or the iterative training and improvement of
models. At its core, the training of a machine learning model is an optimization problem, as the model
learns to perform a function most effectively. The most important part of machine learning
optimization is the tweaking and tuning of model configurations or hyperparameters.

Hyperparameters are the elements of the model set by the data scientist or developer. They include elements like the learning rate or the number of classification clusters, and they are a way of refining a model to fit a specific dataset. In contrast, parameters are elements developed by the machine learning model itself during training. Selecting the optimal hyperparameters is key to ensuring an accurate and efficient machine learning model.

Data visualization: It is the graphical representation of information and data in a pictorial or graphical format (visualizations of data can be charts, graphs, and maps). Data visualization tools provide an accessible way to see and understand trends, patterns in data, and outliers. Data visualization tools and technologies are essential to analyzing massive amounts of information and making data-driven decisions. The concept of using pictures to understand data has been used for centuries. General types of data visualization are charts, tables, graphs, maps, and dashboards.

User Interface (UI): It defines the way humans interact with the information systems. In Layman’s
terms, User Interface (UI) is a series of pages, screens, buttons, forms, and other visual elements that
are used to interact with the device. Every app and every website has a user interface. User Interface
(UI) Design is the creation of graphics, illustrations, and the use of photographic artwork and
typography to enhance the display and layout of a digital product within its various device views.
