
Unit II

Uploaded by

renuka.ai
Copyright
© All Rights Reserved

1. Shallow Neural Networks

A shallow neural network has only one (or just a few) hidden layers between the input and
output layers. The input layer receives the data, the hidden layer(s) process it, and the final
layer produces the output.

Shallow neural networks are simpler, more easily trained, and have greater computational
efficiency than deep neural networks, which may have thousands of hidden units in dozens
of layers. Shallow networks are typically used for simpler tasks such as linear regression,
binary classification, or low-dimensional feature extraction.

Logistic Regression

Logistic regression is a shallow supervised ML technique most commonly used to
solve classification problems, especially where the outcome is binary, such as (A or B), (yes
or no), and (malignant or benign).

At the heart of logistic regression lies the logistic function, f(x) = 1 / (1 + e^{-x}), which
has a sigmoidal shape and returns a value between 0 and 1 for all inputs x.
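
As a minimal sketch of the logistic function described above, the following Python code evaluates f(x) and turns it into a binary decision. The function names and the 0.5 threshold are assumptions made for illustration; they are not part of the original notes.

```python
import math

def logistic(x: float) -> float:
    """Logistic function f(x) = 1 / (1 + e^(-x)), written to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, use the equivalent form e^x / (1 + e^x)
    e = math.exp(x)
    return e / (1.0 + e)

def predict_label(x: float, threshold: float = 0.5) -> int:
    """Map the logistic output to a binary class (threshold is an assumption)."""
    return 1 if logistic(x) >= threshold else 0

print(logistic(0.0))        # midpoint of the S-shaped curve
print(predict_label(2.0))   # positive input, class 1
print(predict_label(-2.0))  # negative input, class 0
```

In practice the input x would be a weighted sum of the features, with weights learned from labelled data.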

Support Vector Machine (SVM)


In an SVM, data are separated into two classes, each represented by points in space,
with the classes kept as far apart as possible. A dividing boundary separates the classes. The
boundary chosen is the one that maximizes the distance (the "margin")
between the boundary and the closest point in each group.
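
To make the idea of the margin concrete, here is a small sketch that measures the margin of two candidate boundaries for a toy data set. It does not train an SVM; it only compares two hand-picked lines. The points, the boundaries, and the function names are all assumptions made for illustration.

```python
import math

def distance_to_boundary(w, b, point):
    """Perpendicular distance from a point to the boundary w . x + b = 0."""
    dot = sum(wi * xi for wi, xi in zip(w, point))
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(dot + b) / norm

def margin(w, b, points):
    """Margin = distance from the boundary to the closest point."""
    return min(distance_to_boundary(w, b, p) for p in points)

# Hypothetical, linearly separable toy data
class_a = [(1.0, 1.0), (2.0, 0.0)]
class_b = [(4.0, 4.0), (5.0, 3.0)]
points = class_a + class_b

# Two candidate boundaries that both separate the classes
margin_diagonal = margin((1.0, 1.0), -5.0, points)  # line x + y - 5 = 0
margin_vertical = margin((1.0, 0.0), -3.0, points)  # line x - 3 = 0

print(margin_diagonal, margin_vertical)
```

Both lines separate the two classes, but the diagonal one leaves more room around the closest points, so a maximum-margin method would prefer it.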

Random Forest

The Random Forest is a machine learning technique for classification and prediction
of data. The building block of the Random Forest is the Decision Tree.

Cluster Analysis

Unlike the three supervised machine learning techniques above, Cluster Analysis
is unsupervised. Its goal is to subdivide large data sets into clusters: groups of objects that
are more similar to one another in their properties or features than to objects in other groups.

Popular clustering methods used in imaging applications include:

o K-means Clustering
o Connectivity-Based Clustering
o Gaussian-Mixture Clustering
o Density-Based Clustering

2.Deep Neural Network

A Convolutional Neural Network (CNN) is a type of Deep Learning neural
network architecture commonly used in Computer Vision. Computer vision is a field of
Artificial Intelligence that enables a computer to understand and interpret images and other
visual data.
CNN Architecture using Convolutional layers

In a regular Neural Network there are three types of layers:

1. Input Layer: The layer through which we give input to our model. The number of
neurons in this layer is equal to the total number of features in our data (the number
of pixels in the case of an image).

2. Hidden Layer: The input from the input layer is fed into the hidden layer.
There can be many hidden layers, depending on the model and the size of the data. Each
hidden layer can have a different number of neurons, which is often (though not
necessarily) greater than the number of features. The output of each layer is computed by
multiplying the output of the previous layer by the learnable weight matrix of that
layer, adding the learnable biases, and then applying an activation function,
which makes the network nonlinear.

3. Output Layer: The output from the last hidden layer is fed into a logistic
function such as sigmoid or softmax, which converts the output for each class into a
probability score for that class.
CNN Simple architecture
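
The layer computation described in the list above (weights times inputs, plus bias, followed by an activation) can be sketched in plain Python as follows. The layer sizes, weight values, and function names are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
import math

def relu(z):
    return max(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense_layer(inputs, weights, biases, activation):
    """Compute activation(W . inputs + b); weights[j] holds the weights of neuron j."""
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        z = sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        outputs.append(activation(z))
    return outputs

# Hypothetical network: 2 input features -> 2 hidden neurons -> 1 output neuron
features = [1.0, 2.0]
hidden = dense_layer(features, [[0.5, -0.5], [1.0, 1.0]], [0.0, 0.0], relu)
output = dense_layer(hidden, [[1.0, 1.0]], [-3.0], sigmoid)

print(hidden)  # activations of the hidden layer
print(output)  # probability-like score from the output layer
```

A real network would learn the weights and biases during training rather than have them set by hand.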

4. Hyperparameter tuning is the process of selecting the optimal values for a machine
learning model's hyperparameters. Hyperparameters are settings that control the
learning process of the model, such as the learning rate, the number of neurons in a
neural network, or the kernel used in a support vector machine. The goal of
hyperparameter tuning is to find the values that lead to the best performance on a
given task.
5. Batch Normalization (BN) is a popular technique used in deep learning to improve the
training of neural networks by normalizing the inputs of each layer. This
stabilizes the learning process and accelerates convergence.
6. The XOR problem is a classic problem in artificial intelligence and machine
learning. XOR, which stands for exclusive OR, is a logical operation that takes
two binary inputs and returns true if exactly one of the inputs is true. The XOR
gate follows a specific truth table, where the output is true only when the inputs
differ. This problem is particularly interesting because a single-layer perceptron,
the simplest form of a neural network, cannot solve it.
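
As an illustration of why a hidden layer helps with XOR, the sketch below wires up a tiny two-layer network by hand. The weights and thresholds are assumptions picked for illustration (one hidden unit acts as OR, the other as AND); they are not learned and not taken from the original notes.

```python
def step(z):
    """Threshold activation: fires (1) when the weighted sum is positive."""
    return 1 if z > 0 else 0

def xor_network(a, b):
    # Hidden layer: two perceptrons with hand-picked weights
    h_or = step(a + b - 0.5)    # 1 if at least one input is 1
    h_and = step(a + b - 1.5)   # 1 only if both inputs are 1
    # Output layer: "OR but not AND" is exactly XOR
    return step(h_or - h_and - 0.5)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_network(a, b))
```

No single perceptron can produce this truth table, because the two "true" cases cannot be separated from the two "false" cases by one straight line.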

7. Backpropagation Process in Deep Neural Network

Backpropagation is one of the important concepts of a neural network. Our task is to
classify our data as well as possible. For this, we have to update the weights and biases, but how
can we do that in a deep neural network? In the linear regression model, we use gradient
descent to optimize the parameters. Similarly, here we also use the gradient descent algorithm,
with the gradients computed by Backpropagation.

For a single training example, the Backpropagation algorithm calculates the gradient of
the error function with respect to the weights of the network.
Backpropagation algorithms are a set of methods used to efficiently train artificial neural
networks following a gradient descent approach which exploits the chain rule.
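
One common way to write the chain rule used here, for a single weight w, is given below. The symbols are assumptions introduced for illustration: E is the error, net is the weighted input of the neuron that w feeds into, out is that neuron's output after the activation function, and \eta is the learning rate.

```latex
\[
\frac{\partial E}{\partial w}
  = \frac{\partial E}{\partial out}
    \cdot \frac{\partial out}{\partial net}
    \cdot \frac{\partial net}{\partial w},
\qquad
w \leftarrow w - \eta \, \frac{\partial E}{\partial w}
\]
```

The first factor measures how the error changes with the neuron's output, the second is the derivative of the activation function, and the third is simply the input carried by the weight.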

The main features of Backpropagation are the iterative, recursive and efficient method
through which it calculates the updated weights, improving the network until it is able to
perform the task for which it is being trained. Backpropagation requires the derivatives of
the activation functions to be known at network design time.

Now, how is the error function used in Backpropagation, and how does Backpropagation work? Let us
start with an example and work through it mathematically to understand exactly how the weights
are updated using Backpropagation.

Input values

x1=0.05
x2=0.10

Initial weights

w1=0.15 w5=0.40
w2=0.20 w6=0.45
w3=0.25 w7=0.50
w4=0.30 w8=0.55

Bias Values

b1=0.35 b2=0.60

Target Values

T1=0.01
T2=0.99

Now, we first calculate the values of H1 and H2 by a forward pass

Forward Pass

To find the value of H1, we first multiply the input values by the weights and add the bias:

H1=x1×w1+x2×w2+b1
H1=0.05×0.15+0.10×0.20+0.35
H1=0.3775

To calculate the final result of H1, we apply the sigmoid function:

H1final=1/(1+e^{-H1})
H1final=1/(1+e^{-0.3775})
H1final=0.593269992

We calculate the value of H2 in the same way as H1:

H2=x1×w3+x2×w4+b1
H2=0.05×0.25+0.10×0.30+0.35
H2=0.3925

To calculate the final result of H2, we apply the sigmoid function:

H2final=1/(1+e^{-H2})
H2final=1/(1+e^{-0.3925})
H2final=0.596884378

Now, we calculate the values of y1 and y2 in the same way as we
calculated H1 and H2.

To find the value of y1, we first multiply the input values, i.e., the sigmoid outputs
of H1 and H2, by the weights and add the bias:

y1=H1×w5+H2×w6+b2
y1=0.593269992×0.40+0.596884378×0.45+0.60
y1=1.10590597

To calculate the final result of y1, we apply the sigmoid function:

y1final=1/(1+e^{-y1})
y1final=1/(1+e^{-1.10590597})
y1final=0.75136507

We calculate the value of y2 in the same way as y1:

y2=H1×w7+H2×w8+b2
y2=0.593269992×0.50+0.596884378×0.55+0.60
y2=1.2249214

To calculate the final result of y2, we apply the sigmoid function:

y2final=1/(1+e^{-y2})
y2final=1/(1+e^{-1.2249214})
y2final=0.772928465

Our target values are 0.01 and 0.99. The final values of y1 and y2 (about 0.751 and 0.773) do not match
our target values T1 and T2, so the weights must be adjusted by propagating the error backwards.
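
The forward pass above can be reproduced with a short Python sketch. The inputs, weights, biases, and targets are the ones given in the example. The squared-error function and the single gradient computed at the end (for w5) are assumptions added to show how the backward step would begin; the notes themselves do not specify the error function.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Values taken from the worked example
x1, x2 = 0.05, 0.10
w1, w2, w3, w4 = 0.15, 0.20, 0.25, 0.30
w5, w6, w7, w8 = 0.40, 0.45, 0.50, 0.55
b1, b2 = 0.35, 0.60
t1, t2 = 0.01, 0.99

# Hidden layer: weighted sum, then sigmoid
h1_net = x1 * w1 + x2 * w2 + b1
h2_net = x1 * w3 + x2 * w4 + b1
h1_out = sigmoid(h1_net)
h2_out = sigmoid(h2_net)

# Output layer: weighted sum of hidden outputs, then sigmoid
y1_net = h1_out * w5 + h2_out * w6 + b2
y2_net = h1_out * w7 + h2_out * w8 + b2
y1_out = sigmoid(y1_net)
y2_out = sigmoid(y2_net)

# Assumption: squared error summed over both outputs
total_error = 0.5 * (t1 - y1_out) ** 2 + 0.5 * (t2 - y2_out) ** 2

# Chain rule for one weight: dE/dw5 = dE/dy1_out * dy1_out/dy1_net * dy1_net/dw5
grad_w5 = (y1_out - t1) * y1_out * (1.0 - y1_out) * h1_out

print(h1_out, h2_out)
print(y1_out, y2_out)
print(total_error, grad_w5)
```

A gradient descent update would then subtract a learning rate times grad_w5 from w5, and repeat the same calculation for every other weight.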

8.Activation Function

Activation functions add a nonlinear property to the neural network. This allows the
network to model more complex data. ReLU is generally a good default activation
function for the hidden layers. In the output layer, the choice must take into account the
expected value range of the predictions.

Neural Network Components

1. Input Layer
2. Hidden Layer
3. Output Layer

Activation Function

The two main categories of activation functions are:

o Linear Activation Function
o Non-linear Activation Functions

Linear Activation Function

Non-linear Activation Function

With a linear activation, the output is simply proportional to the input, so the network cannot
capture complex patterns in the data. Non-linear activations remove this limitation.

Activation Function

o Linear Function

Equation: A linear function's equation, y = x, is the equation of a straight line.

o Sigmoid Function

It is a function whose graph has an "S" shape.

Equation: A = 1 / (1 + e^{-x})

Nature: Non-linear. Observe that for x between -2 and 2 the curve is fairly steep.
To put it another way, small changes in x in this region cause significant shifts in the value of
y.

Value Range: 0 to 1

o Tanh Function

The hyperbolic tangent (tanh) function is an activation that often works better than the sigmoid
function. It is actually a sigmoid function that has been mathematically adjusted (scaled and
shifted), so each can be derived from the other.

Equation: tanh(x) = 2 / (1 + e^{-2x}) - 1 = 2 × sigmoid(2x) - 1

Range of values: -1 to +1

Nature: Non-linear

o ReLU (Rectified Linear Unit) Activation Function

Currently, ReLU is the most widely used activation function; practically all
convolutional neural networks and deep learning systems employ it.

Equation: A(x) = max(0, x). If x is positive, it outputs x; if not, it outputs 0.

Value Interval: [0, inf)

The derivative and the function are both monotonic.

o Softmax Function

The softmax function is a generalization of the sigmoid function that comes in handy when
dealing with multiclass classification problems.

It is used frequently when handling several classes, and is typically found in the output nodes of
image classification networks. The softmax function divides each exponentiated output by the sum of all
exponentiated outputs, squeezing the output for each category between 0 and 1 so that the outputs sum to 1.
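
The activation functions listed above can be written in a few lines of Python. This is a minimal sketch using only the standard library; the function names are chosen for illustration, and the softmax subtracts the maximum score before exponentiating, which is a common numerical-stability trick rather than something stated in the notes.

```python
import math

def linear(x):
    return x

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    return math.tanh(x)

def relu(x):
    return max(0.0, x)

def softmax(scores):
    """Turn a list of scores into probabilities that sum to 1."""
    highest = max(scores)  # subtracting the max does not change the result
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

print(sigmoid(0.0), tanh(0.0), relu(-2.0), relu(3.0))
print(softmax([1.0, 2.0, 3.0]))
```

The relationship between tanh and sigmoid can be checked numerically: tanh(x) and 2 * sigmoid(2 * x) - 1 agree for any x.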
Gradient Descent in Machine Learning
Gradient Descent is known as one of the most commonly used optimization algorithms to
train machine learning models by means of minimizing errors between actual and expected
results. Further, gradient descent is also used to train Neural Networks.

The local minimum or local maximum of a function can be found using the gradient
as follows:

o If we move in the direction of the negative gradient, i.e., away from the gradient of the function at
the current point, we approach a local minimum of that function (gradient descent).
o If we move in the direction of the positive gradient, i.e., towards the gradient of the
function at the current point, we approach a local maximum of that function (gradient ascent).
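
The two cases above can be sketched with a one-dimensional example. The functions being optimized, the starting point, the learning rate, and the number of steps are all assumptions chosen for illustration.

```python
def gradient_descent(gradient, start, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient to approach a local minimum."""
    x = start
    for _ in range(steps):
        x -= learning_rate * gradient(x)
    return x

def gradient_ascent(gradient, start, learning_rate=0.1, steps=100):
    """Repeatedly step along the gradient to approach a local maximum."""
    x = start
    for _ in range(steps):
        x += learning_rate * gradient(x)
    return x

# f(x) = (x - 3)^2 has its minimum at x = 3; its gradient is 2 * (x - 3)
x_min = gradient_descent(lambda x: 2.0 * (x - 3.0), start=0.0)

# g(x) = -(x + 1)^2 has its maximum at x = -1; its gradient is -2 * (x + 1)
x_max = gradient_ascent(lambda x: -2.0 * (x + 1.0), start=0.0)

print(x_min, x_max)
```

If the learning rate is too large, the steps can overshoot and diverge, which is one reason it is treated as a hyperparameter to tune.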

Types of Gradient Descent

Based on how much of the training data is used to compute the error for each update, the Gradient
Descent learning algorithm can be divided into Batch gradient descent, stochastic gradient descent,
and mini-batch gradient descent. Let's understand these different types of gradient descent:

1. Batch Gradient Descent:

Batch gradient descent (BGD) finds the error for each point in the training set and
updates the model only after evaluating all training examples. One such pass over the training set is
known as a training epoch. In simple words, it is a greedy approach where we have to sum over all
examples for each update.

2. Stochastic gradient descent

Stochastic gradient descent (SGD) is a type of gradient descent that processes one training
example per iteration, updating the parameters after each example.

3. Mini Batch Gradient Descent:

Mini Batch gradient descent is a combination of batch gradient descent and stochastic
gradient descent. It divides the training dataset into small batches and then performs an
update on each of those batches separately.
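
The three variants differ only in how many examples feed each update, which the sketch below shows with a single batch_size parameter. The toy data set (y = 2x), the one-weight model y = w * x, the mean-squared-error loss, and the learning rate are assumptions made for illustration. Examples are visited in a fixed order to keep the sketch deterministic, whereas practical implementations usually shuffle the data each epoch.

```python
def train(data, batch_size, learning_rate=0.01, epochs=200):
    """Fit y = w * x by gradient descent; returns (w, number of updates)."""
    w = 0.0
    updates = 0
    for _ in range(epochs):
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            # Gradient of the mean squared error over the current batch
            grad = sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)
            w -= learning_rate * grad
            updates += 1
    return w, updates

# Hypothetical toy data generated from y = 2x
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]

w_batch, n_batch = train(data, batch_size=len(data))  # batch gradient descent
w_sgd, n_sgd = train(data, batch_size=1)              # stochastic gradient descent
w_mini, n_mini = train(data, batch_size=2)            # mini-batch gradient descent

print(w_batch, n_batch)
print(w_sgd, n_sgd)
print(w_mini, n_mini)
```

All three reach roughly the same weight here, but batch gradient descent makes one update per epoch, SGD makes one per example, and mini-batch sits in between.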
