
BAI701

Module-2

Chapter 1: Basics of Supervised Deep Learning

2.1 Introduction
The use of supervised and unsupervised deep learning models has grown at a fast rate due to their
success in learning complex problems. High-performance computing resources, the availability
of huge amounts of data (labeled and unlabeled), and state-of-the-art open-source libraries are
making deep learning more and more feasible for various applications.

2.2 Convolutional Neural Network (ConvNet/CNN)


Convolutional Neural Network, also known as ConvNet or CNN, is a deep learning technique that
consists of multiple layers. ConvNets are inspired by the biological visual cortex.
ConvNets have shown excellent performance on several applications such as image classification,
object detection, speech recognition, natural language processing, and medical image analysis.
Convolutional neural networks are the powering core of computer vision, which has many applications
including self-driving cars, robotics, and treatments for the visually impaired. The main
concept of ConvNets is to obtain local features from the input (usually an image) at the lower layers and
combine them into more complex features at the higher layers. However, due to its multilayered
architecture, it is computationally expensive, and training such networks on a large dataset takes
several days. Therefore, such deep networks are usually trained on GPUs. Convolutional neural
networks are so powerful on visual tasks that they outperform almost all the conventional methods.

2.3 Evolution of Convolutional Neural Network Models

LeNet

 The first practical convolutional neural network (CNN), designed to classify handwritten
digits (MNIST).
 Used backpropagation for training and was adopted for reading handwritten checks.
 Did not scale well to larger problems due to:
o Small labeled datasets

Pooja R Rao, Asst. Professor, Dept of CSE(DS), RNSIT. Page 1


o Slow computers
o Use of unsuitable activation functions (like sigmoid/tanh) leading to vanishing
gradients, which make training deep networks difficult.

AlexNet

 Achieved the first major breakthrough in 2012 by winning the ImageNet Large-Scale
Visual Recognition Challenge (ILSVRC).
 Reduced classification error rate from 26% to 15%.
 Improvements over LeNet include:
o Large labeled image database (ImageNet), which contained around 15 million
labeled images from a total of over 22,000 categories, was used.
o The model was trained on high-speed GTX 580 GPUs for 5 to 6 days.
o Use of ReLU activation function (f(x) = max(x, 0)), which is faster and avoids
vanishing gradient problems.
 Architecture: 5 convolutional layers, 3 pooling layers, 3 fully connected layers, and a
1000-way softmax classifier.

ZFNet (2013):

o An improved version of the AlexNet architecture, obtained by reducing the first-layer
filter size from 11×11 to 7×7 and the stride from 4 to 2.
o This led to better feature extraction and fewer dead features.
o ZFNet won the ILSVRC 2013 competition.

VGGNet (2014):

o The depth of the network was increased to 19 layers by adding more convolutional
layers with 3 × 3 filters (stride 1, padding 1), along with 2 × 2 max-pooling layers
with stride 2.
o The deeper, simpler architecture improved accuracy significantly.
o VGGNet achieved a 7.32% error rate and was the runner-up in ILSVRC 2014.

GoogLeNet (2015):

Google developed a ConvNet model called GoogLeNet in 2015. It uses an inception module which
helps in reducing the number of parameters in the network. The inception module is a
concatenated layer of convolution (3 × 3 and 5 × 5) and pooling sub-layers at
different scales, with their output filter banks concatenated into a single output vector that forms the
input for the succeeding stage. These sub-layers are not stacked sequentially; instead, they are
connected in parallel, as shown in Fig. 2.1.

In order to compensate for the additional computational complexity due to the extra convolution
operations, 1 × 1 convolutions are used to reduce computation before the expensive 3 × 3
and 5 × 5 convolutions are performed. The GoogLeNet model has two convolutional layers, four max-
pooling layers, nine inception layers, and a softmax layer. This special inception
architecture gives GoogLeNet 12 times fewer parameters than AlexNet.

Increasing network depth can improve accuracy by learning more features, but has limits:

1. Vanishing gradients: gradients shrink as they propagate backward through many layers, so the early layers learn very slowly.
2. Optimization difficulty: too many parameters make training harder.

To address this, network depth should be increased carefully.

GoogLeNet won ILSVRC 2014 with a 6.7% error rate.

Later versions include Inception V3 (2016) and Inception-ResNet (2017).

ResNet:

Microsoft Research Asia proposed a CNN architecture in 2015 which is 152 layers deep and is
called ResNet. ResNet introduced residual connections, in which the output of a conv-ReLU-conv
series is added to the original input and then passed through a Rectified Linear Unit (ReLU), as
shown in Fig. 2.2. In this way, information is carried from the previous layer to the next layer,
and during backpropagation the gradient flows easily because the addition operation
distributes the gradient. ResNet proved that a complex architecture like Inception is not required
to achieve the best results; a simple and deep architecture can be tweaked to get better results.
ResNet performed exceptionally well, winning ILSVRC 2015 with a 3.6% error rate, surpassing human-
level accuracy. Despite its depth, ResNet had fewer parameters than VGGNet.

Inception-ResNet (2017):

 Combined the Inception module with residual connections to form a hybrid model.
 This design significantly increased training speed.
 It slightly outperformed ResNet in terms of accuracy.

Xception:

A convolutional neural network architecture based on depthwise separable convolution layers
is called Xception. The architecture is inspired by the Inception model, which is why it
is called Xception (Extreme Inception). The Xception architecture is a stack of depthwise separable
convolution layers with residual connections. Xception has 36 convolutional layers organized
into 14 modules, all having linear residual connections around them, except for the first and
last modules. Xception has been claimed to perform slightly better than Inception V3 on
ImageNet. Table 2.1 and Fig. 2.3 show the classification performance of VGG-16, ResNet-152,
Inception V3, and Xception on ImageNet.

SqueezeNet: Researchers developed SqueezeNet to reduce the size and complexity of
convolutional neural networks without sacrificing accuracy. The approach included pruning
small-weight parameters to create sparse models and retraining them. Additionally, SqueezeNet
adopted three main strategies to minimize parameters and computation:

 (a) Replacing 3 × 3 filters with 1 × 1 filters.
 (b) Reducing the number of input channels to 3 × 3 filters.
 (c) Delaying subsampling to later layers to preserve larger activation maps.

With these methods, SqueezeNet achieved AlexNet-level accuracy on ImageNet using 50 times
fewer parameters.

ShuffleNet: Another ConvNet architecture called ShuffleNet was introduced in 2017 for devices
with limited computational power, like mobile devices, without compromising accuracy.
ShuffleNet used two ideas, pointwise group convolution and channel shuffle, to considerably
decrease the computational cost while maintaining accuracy.

2.4 Convolution Operation

Convolution is a mathematical operation performed on two functions and is written as (f * g),
where f and g are the two functions. The output of the convolution operation for domain n is defined
as

F(n) = (f * g)(n) = Σₘ f(m) g(n − m)

For time-domain functions, n is replaced by t. The convolution operation is commutative in
nature, so it can also be written as

F(n) = (g * f)(n) = Σₘ f(n − m) g(m)

Convolution is one of the important operations used in digital signal processing and is
used in many areas, including statistics, probability, natural language processing, computer
vision, and image processing. It can be applied to a two-dimensional function by sliding one
function on top of the other, multiplying and adding. The convolution operation can be applied to
images to perform various transformations; here, images are treated as two-dimensional functions.
An example of a two-dimensional filter, a two-dimensional input, and a two-dimensional feature
map is shown in Fig. 2.4. Let the 2D input (i.e., the 2D image) be denoted by A, the 2D filter of size
m × n be denoted by K, and the 2D feature map be denoted by F. Here, the image A is convolved
with the filter K to produce the feature map F. This convolution operation is denoted by A * K
and is mathematically given as

F(i, j) = (A * K)(i, j) = Σₘ Σₙ A(i − m, j − n) K(m, n)
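The 2D operation described above can be sketched directly in NumPy. This is a plain, unoptimized loop version for illustration only; the function name and the toy input are illustrative, and real frameworks use heavily optimized routines instead:

```python
import numpy as np

def conv2d(A, K):
    """'Valid' 2D convolution of image A with filter K (no padding, stride 1)."""
    m, n = K.shape
    H, W = A.shape
    Kf = np.flip(K)  # true convolution flips the kernel (cross-correlation does not)
    F = np.empty((H - m + 1, W - n + 1))
    for i in range(F.shape[0]):
        for j in range(F.shape[1]):
            # slide the flipped filter over the local region, multiply and add
            F[i, j] = np.sum(A[i:i + m, j:j + n] * Kf)
    return F

A = np.arange(16, dtype=float).reshape(4, 4)  # a toy 4 x 4 "image"
K = np.ones((2, 2))                           # a symmetric 2 x 2 filter
F = conv2d(A, K)                              # feature map of size 3 x 3
```

Note that the output shrinks from 4 × 4 to 3 × 3, which matches the "valid" output-size rule discussed later in this module.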

2.5 Architecture of CNN

Traditional Neural Network Limitations

 Fully connected layers connect every neuron in one layer to every neuron in the previous
layer.
 This dense connectivity does not scale well to large images.

Need for CNN

 CNNs are better for large images and data with grid-like structure (e.g., 1D time-series, 2D
images, 3D volumes, 4D videos).
 Designed to process structured data efficiently.

Key Features of CNNs

 (i) Local Receptive Field:
o Each neuron connects only to a small region of the input.
o Helps extract local features like edges and corners.
 (ii) Weight Sharing:
o The same filter (set of weights) is applied across all positions in the input.
o Reduces the number of parameters and enables feature detection anywhere in the input.
 (iii) Subsampling (Pooling):
o Reduces spatial size and network parameters.
o The most common method is max-pooling.

A typical convolutional neural network consists of the following layers:

• Convolutional layer
• Activation function layer (ReLU)
• Pooling layer
• Fully connected layer
• Dropout layer

These layers are stacked up to make a full ConvNet architecture. Convolutional and
activation function layers are usually stacked together, followed by an optional
pooling layer. The fully connected layer makes up the last layer of the network, and the
output of the last fully connected layer produces the class scores of the input image.
In addition to these main layers, a ConvNet may include optional layers like a batch
normalization layer to improve the training time and a dropout layer to address the
overfitting issue.
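A quick back-of-the-envelope comparison makes the weight-sharing point concrete. The numbers here are hypothetical (a 128 × 128 grayscale input and 100 hidden neurons), chosen only to show the scaling difference:

```python
# Fully connected: every input pixel connects to every neuron,
# so the parameter count grows with the image size.
image_pixels = 128 * 128
fc_weights = image_pixels * 100      # 1,638,400 weights for just one layer

# Convolutional with weight sharing: one 3 x 3 filter is reused at
# every position, so its weight count is independent of image size.
conv_weights_per_filter = 3 * 3      # 9 weights, wherever the feature appears
```

This is why dense connectivity does not scale to large images while convolutional layers do.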

2.5.1 Convolution Layer

 The convolution layer is the main building block of a convolutional neural network (CNN).
 It uses the convolution operation (denoted by *) instead of general matrix multiplication.
 It has a set of learnable filters or kernels as its parameters.
 Its main task is to detect features in local regions of the input image that are common across
the dataset.
 A feature map is created for each filter by convolving it over subregions of the image.
 The process includes performing the convolution, adding a bias term, and applying an
activation function.
 The local receptive field is the region of the input the filter is applied to, and its size matches
the filter size.
 Figure 2.5 illustrates how a T-shaped filter is convolved with the input to get the feature
map.
 After adding the bias, a nonlinear activation function is applied to introduce nonlinearity
into the model.


Filters/Kernels
 The weights in each convolutional layer define the convolution filters (kernels)
 There can be multiple filters in a single convolutional layer.
 Each filter is designed to capture specific features like edges or corners.
 During the forward pass, each filter slides over the input’s width and height to produce its
feature map.
Hyperparameters
 Convolutional neural networks have hyperparameters that control model behavior,
output size, runtime, and memory.
 Four important hyperparameters in the convolution layer are:

 Filter Size: Typically between 3×3 and 11×11. Size is independent of input size.
 Number of Filters: Can vary. For example, AlexNet used 96 filters of size 11×11 in its
first layer, whereas VGGNet used smaller filters of size 3×3.
 Stride: Number of pixels the filter moves at each step. Small stride = more overlap and
larger output size; large stride = less overlap and smaller output size.
 Zero Padding: Number of pixels added as zeros around the input to control the output’s
spatial size.

Each filter in the convolution layer produces a feature map of size ([A − K + 2P]/S) + 1, where A is
the input volume size, K is the size of the filter, P is the amount of padding applied, and S is the
stride. Suppose the input image has size 128 × 128, and 5 filters of size 5 × 5 are applied with
single stride and zero padding, i.e., A = 128, K = 5, P = 0, and S = 1. The number of feature maps produced will
be equal to the number of filters applied, i.e., 5, and the size of each feature map will be ([128 − 5
+ 0]/1) + 1 = 124. Therefore, the output volume will be 124 × 124 × 5.
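The worked example can be checked with a few lines of Python, a direct transcription of the formula above (the function name is illustrative):

```python
def conv_output_size(A, K, P, S):
    """Feature-map side length: ([A - K + 2P] / S) + 1."""
    return (A - K + 2 * P) // S + 1

# 128 x 128 input, 5 x 5 filters, zero padding, single stride
side = conv_output_size(128, 5, 0, 1)
# with 5 filters, the output volume is side x side x 5
```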

2.5.2 Activation Function (ReLU)

 The output of each convolutional layer is passed through an activation function layer.
 The activation function transforms the feature map into an activation map.
 It determines the output signal of a neuron for a given input.
 Activation functions typically squash inputs to a specific range (e.g., 0–1 or −1 to 1).
 They perform a mathematical operation on the input to produce the neuron's activation
level.
 A good activation function is usually continuous and differentiable everywhere.
 Differentiability is important for gradient-based training methods used in ConvNets.
 If non-gradient-based methods are used, differentiability is not required.
 Many activation functions are used in ANNs and some of the commonly used activation
functions are as follows:
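As a sketch, three of the most commonly used activation functions (sigmoid, tanh, and ReLU, all mentioned earlier in this module) can be written as:

```python
import numpy as np

def sigmoid(x):
    # squashes input to the range (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # squashes input to the range (-1, 1)
    return np.tanh(x)

def relu(x):
    # f(x) = max(x, 0): fast to compute and helps avoid vanishing gradients
    return np.maximum(x, 0.0)
```

Sigmoid and tanh saturate for large inputs, which is the source of the vanishing-gradient problem noted in the discussion of LeNet; ReLU does not saturate for positive inputs, which is why AlexNet adopted it.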

2.5.3 Pooling Layer
 Pooling layers follow the convolution and activation layers in ConvNets to reduce the
spatial size of feature maps.
 This reduction lowers the number of parameters and computational cost in the network.
 A pooling layer down-samples the input feature maps by summarizing regions of neurons
to select representative values.
 Max-pooling is the most common technique, dividing the input into small regions (e.g.,
2 × 2) and selecting the maximum value from each region.
 For a 2 × 2 region, max-pooling outputs the single highest value among the four values.
 Other pooling types include average pooling (computes the mean of the region) and L2-
norm pooling (calculates the square root of the sum of squares of the values).
 Pooling layers discard less important details while preserving essential features in a smaller,
more manageable form.
 The idea behind pooling is that detecting a feature is more important than knowing its exact
location.
 This strategy works well for simple tasks but can have limitations for more complex
problems.
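Max-pooling over 2 × 2 regions can be sketched as follows (a minimal loop version, assuming an input whose sides are divisible by the pool size; the names are illustrative):

```python
import numpy as np

def max_pool(F, size=2):
    """Down-sample feature map F by taking the max of each size x size region."""
    H, W = F.shape
    out = np.empty((H // size, W // size))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            region = F[i * size:(i + 1) * size, j * size:(j + 1) * size]
            out[i, j] = region.max()   # keep only the strongest activation
    return out

F = np.arange(16, dtype=float).reshape(4, 4)
P = max_pool(F)    # 4 x 4 feature map reduced to 2 x 2
```

Average pooling or L2-norm pooling would replace `region.max()` with the mean or the root of the sum of squares, respectively.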

2.5.4 Fully Connected Layer


• Convolutional Neural Networks (CNNs) consist of two main stages: feature extraction and
classification.
• The feature extraction stage includes convolution and pooling layers that detect features
from input data.

• Once enough features are extracted, the classification stage begins.
• The classification stage consists of one or more fully connected layers followed by a
classifier.
• Fully connected layers take input from all neurons of the previous layer, enabling every
value to contribute to the prediction.
• These layers transform the spatial feature data into class scores or probabilities.
• Multiple fully connected layers can be used to learn complex feature relationships.
• The output from the last fully connected layer is sent to a classifier.
• Common classifiers used are Softmax and Support Vector Machines (SVMs).
• The Softmax classifier outputs class probabilities that sum to 1.
• The SVM classifier outputs class scores, and the class with the highest score is selected.
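The Softmax classifier mentioned above can be sketched in a few lines. Subtracting the maximum score first is a standard numerical-stability trick and does not change the result:

```python
import numpy as np

def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    e = np.exp(scores - np.max(scores))  # shift for numerical stability
    return e / e.sum()

probs = softmax(np.array([2.0, 1.0, 0.1]))
# the highest score gets the highest probability, and the outputs sum to 1
```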
2.5.5 Dropout
 Deep neural networks have multiple hidden layers that help learn complex features.
 These are followed by fully connected layers used for decision-making.
 Fully connected layers are prone to overfitting due to their dense connections.
 Overfitting occurs when the model performs well on training data but poorly on new,
unseen data.
 To address overfitting, a dropout layer is used during training.
 Dropout randomly removes some neurons and their connections from the network during
each training iteration.
 The remaining reduced network is trained on the data at that stage.
 Dropped-out neurons are reinserted later with their original weights.
 This technique reduces overfitting and enhances the model's ability to generalize.
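One common way to sketch this is "inverted" dropout, a widely used variant in which the surviving activations are rescaled during training so that no change is needed at test time. Here p is the drop probability, and the names are illustrative:

```python
import numpy as np

def dropout(activations, p=0.5, rng=None):
    """Randomly zero each unit with probability p, rescaling the survivors."""
    rng = rng if rng is not None else np.random.default_rng(0)
    mask = rng.random(activations.shape) >= p   # True = neuron is kept
    return activations * mask / (1.0 - p)       # keep expected activation unchanged

a = np.ones((4, 4))
out = dropout(a, p=0.5)   # roughly half the entries become 0, the rest 2.0
```

A fresh random mask is drawn on every training iteration, which matches the description above of neurons being removed and later reinserted.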

2.6 Challenges and Future Research Direction:

 Strong Performance: Convolutional Neural Networks (ConvNets) have shown excellent
results in tasks like object classification and detection, sometimes matching human-level
accuracy.
 Vulnerabilities Exist: Despite their success, ConvNets are vulnerable to small,
imperceptible changes in input images, which can lead to incorrect classifications.
 Cause of Vulnerability: One key reason for this vulnerability is the pooling operation,
which reduces the feature space but also discards important spatial information.
 Loss of Spatial Relationships: ConvNets detect if a feature is present in a region but fail
to capture the exact spatial relationships between features, making it harder to recognize
complex objects.
 Reliability Concern: These limitations raise concerns about the generalization and
reliability of ConvNets in real-world applications.
 Capsule Networks as a Solution: Capsule Networks have been proposed to overcome
some of these issues. They use capsules (groups of neurons) to represent objects and their
parts more precisely.
 Dynamic Routing: Instead of max pooling, Capsule Networks use dynamic routing to
preserve spatial relationships between features across layers.
 Ongoing Research: Capsule Networks are still in the early stages of research, and their
effectiveness across various visual tasks remains under investigation.
