CNN Popular Architectures
and Transfer Learning
Palacode Narayana Iyer Anantharaman
16th Oct 2018
Motivation: Why study these?
Understanding popular architectures helps us in many ways:
• We develop a better understanding of the subject by studying high-performing architectures
• This helps us carry out our own research on newer models
• Learning their design philosophy helps us design our own models more effectively
• We can use them as backbones for transfer learning, selecting the right architecture for the task
References
https://leonardoaraujosantos.gitbooks.io/artificial-inteligence/
Current trend: Deeper models work better
• CNNs consistently outperform other approaches for the core tasks of computer vision
• Deeper models work better
• Increasing the number of parameters in the layers of a CNN without increasing its depth is not effective at improving test set performance
• Shallow models overfit at around 20 million parameters, while deep ones can benefit from having over 60 million
• Key insight: a model performs better when it is architected as a composition of simpler functions rather than a single complex function. This can also be explained by viewing the computation as a chain of dependencies
VGG Net
ResNet
Recent Results (Credits: CS231n Stanford)
ResNet Motivation
ResNet Approach
• Generally, the deeper the better (see the figure)
• Issue: it is hard to train deeper networks effectively; validation error goes up not just due to overfitting but due to an increase in training error
• Solution: use skip connections to propagate activations forward and reduce this effect (see the sketch after this list)
• Rationale: consider how hard it is for a plain deep stack of layers to learn even an accurate identity mapping
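A minimal sketch of a residual (skip-connection) block in Keras, following the idea above; the filter count and kernel size are illustrative assumptions, not values from the slides:

```python
# Minimal residual block sketch (assuming tensorflow.keras is available).
# Filter count and kernel size are illustrative assumptions.
from tensorflow.keras import layers

def residual_block(x, filters=64):
    shortcut = x                                    # skip connection; assumes x already has `filters` channels
    y = layers.Conv2D(filters, 3, padding="same")(x)
    y = layers.BatchNormalization()(y)
    y = layers.Activation("relu")(y)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = layers.BatchNormalization()(y)
    y = layers.Add()([shortcut, y])                 # output = F(x) + x
    return layers.Activation("relu")(y)
```

Because the block only needs to learn the residual F(x), driving F(x) towards zero recovers the identity mapping, which is exactly what a deeper-but-no-worse network requires.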
ResNet Hypothesis: Rationale
• Deeper models should perform at least as well as shallower models
• More parameters mean more degrees of freedom to drive the training error down
• More depth means more ability to model abstractions
• A solution by construction: copy the learned layers of a shallower model and set the additional layers to identity mappings
ResNet
• Very deep network: the original uses up to 152 layers
• Shallower versions (e.g. ResNet50) are available
• The “go to” backbone network for many applications such as Faster R-CNN
• Pre-trained weights are available for Keras and TensorFlow (see the sketch below)
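A minimal sketch (assuming tensorflow.keras) of loading ResNet50 with ImageNet weights and using it as a frozen backbone for transfer learning; the 10-class head is a hypothetical example:

```python
# Transfer learning sketch: ResNet50 as a frozen backbone (assuming tensorflow.keras).
from tensorflow.keras import layers, models
from tensorflow.keras.applications import ResNet50

backbone = ResNet50(weights="imagenet", include_top=False,
                    input_shape=(224, 224, 3))
backbone.trainable = False                       # freeze the pre-trained convolutional base

model = models.Sequential([
    backbone,
    layers.GlobalAveragePooling2D(),
    layers.Dense(10, activation="softmax"),      # hypothetical 10-class task head
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
```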
ResNet Details
• Default input size: (224, 224, 3)
• Each stage reduces the width and height dimensions by a factor of 2
• This property is leveraged in later implementations such as pyramidal networks (see the check below)
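A quick sanity check of the halving property (a sketch, assuming tensorflow.keras): over the stem plus four residual stages the spatial size shrinks 224 → 112 → 56 → 28 → 14 → 7.

```python
# Spatial size check for ResNet50 (assuming tensorflow.keras).
from tensorflow.keras.applications import ResNet50

m = ResNet50(weights=None, include_top=False, input_shape=(224, 224, 3))
print(m.output_shape)   # (None, 7, 7, 2048): 224 / 2**5 = 7
```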
ImageNet Results summary table
GoogleNet
• Design choice of filter sizes: (3, 3), (5, 5) and so on – which one should we choose for each convolution layer?
• Why not try all of them and choose the best?
• Trying each possible value on every layer and experimenting manually is not a practical solution
• The Inception layer applies multiple filter sizes in parallel and learns their contributions automatically through its parameters (see the sketch after this list)
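A minimal sketch of a naive Inception-style module in Keras: parallel 1 x 1, 3 x 3 and 5 x 5 convolutions plus max pooling, concatenated along the channel axis. The branch filter counts are illustrative assumptions.

```python
# Naive Inception-style module sketch (assuming tensorflow.keras).
# Branch filter counts are illustrative assumptions.
from tensorflow.keras import layers

def naive_inception(x):
    b1 = layers.Conv2D(64, 1, padding="same", activation="relu")(x)
    b2 = layers.Conv2D(128, 3, padding="same", activation="relu")(x)
    b3 = layers.Conv2D(32, 5, padding="same", activation="relu")(x)
    b4 = layers.MaxPooling2D(3, strides=1, padding="same")(x)
    return layers.Concatenate()([b1, b2, b3, b4])   # stack branches along the channel axis
```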
Inception Layer Naïve Architecture (Fig: Udacity Deep Learning)
Inception Architecture with bottleneck layer
Bottleneck Layer
Example
• Consider an input volume (28, 28, 192) and an output volume (28, 28, 32)
• How many computations are needed if we use a 5 x 5 filter?
• Each filter will be 5 x 5 x 192; we move it over a 28 x 28 surface and we have 32 such filters
• 28 x 28 x 32 x 5 x 5 x 192 ≈ 120M multiplications
• If we need multiple such filter sizes, we need to add up the corresponding computations for each of them
• On a very deep network this many computations are prohibitively expensive even on powerful hardware
• By reducing the dimensionality of the input before the final convolutions, we get a manageable number of computations (see the check below)
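A quick arithmetic check of the 120M figure:

```python
# Multiply count for the direct 5 x 5 convolution:
# output positions (28 * 28) x output channels (32) x kernel volume (5 * 5 * 192)
naive = 28 * 28 * 32 * (5 * 5 * 192)
print(f"{naive:,}")   # 120,422,400 -> roughly 120M
```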
Example with bottleneck layer
• In our example, we can transform the 28 x 28 x 192 volume into one with the same spatial size but much reduced depth (say 16) using 1 x 1 convolutions
• The bottleneck layer has the shape (28 x 28 x 16)
• Perform the required convolutions (e.g. 5 x 5) on the bottleneck layer to generate the final output volume
• Computations: #computations from input to bottleneck layer + #computations from bottleneck to output; in our example this is about 12M (see the check after this list)
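The corresponding arithmetic check:

```python
# Cost with a 1 x 1 bottleneck down to 16 channels, followed by the 5 x 5 convolution:
to_bottleneck     = 28 * 28 * 16 * (1 * 1 * 192)   # ~2.4M
bottleneck_to_out = 28 * 28 * 32 * (5 * 5 * 16)    # ~10.0M
print(f"{to_bottleneck + bottleneck_to_out:,}")    # 12,443,648 -> roughly 12M, versus ~120M without the bottleneck
```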
GoogleNet architecture with Inception layer
State of the art: SENet
• Winner of ImageNet 2017 in multiple categories
• A novel technique to weight the contributions of the channels of a convolutional layer
SENet: Squeeze-and-Excitation Network
• SENet is the winning architecture of ImageNet 2017 in multiple categories
• Error rate on image classification: 2.251%
• Key idea:
• In the authors’ words: “Improve the representational power of the network by explicitly modelling interdependencies between channels of its convolutional features”
• Simple explanation: add parameters to each channel of a convolutional block so that the network can adaptively adjust the weighting of each feature map
SeNet Rationale
• Deep CNNs learn increasing levels of abstraction from lower to higher layers. Lower layers have higher resolution and extract basic elements of information
• Higher layers can detect faces, text and other abstract information
• All of this works by fusing the spatial and channel information of an image
• A standard network weights each of its channels equally when creating the output feature maps
• SENets change this by adding a content-aware mechanism that weights each channel adaptively. In its most basic form this could mean adding a single parameter per channel: a linear scalar expressing how relevant that channel is
SENet Architecture
• Get a global understanding of each channel by squeezing its feature map to a single numeric value. This results in a vector of size n, where n is the number of convolutional channels
• This vector is then fed through a two-layer neural network, which outputs a vector of the same size. These n values are used as weights on the original feature maps, scaling each channel based on its importance (see the sketch below)
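A minimal sketch of the squeeze-and-excitation block in Keras, following the description above; the reduction ratio r = 16 is a commonly used default and an assumption here:

```python
# Squeeze-and-Excitation block sketch (assuming tensorflow.keras).
# The reduction ratio r=16 is a commonly used default, assumed here.
from tensorflow.keras import layers

def se_block(x, r=16):
    n = x.shape[-1]                                  # number of channels
    s = layers.GlobalAveragePooling2D()(x)           # squeeze: one value per channel
    s = layers.Dense(n // r, activation="relu")(s)   # excitation: two-layer bottleneck MLP
    s = layers.Dense(n, activation="sigmoid")(s)     # per-channel weights in (0, 1)
    s = layers.Reshape((1, 1, n))(s)
    return layers.Multiply()([x, s])                 # scale each feature map by its weight
```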
Code Illustration of the key idea
Ref: https://towardsdatascience.com/squeeze-and-excitation-networks-9ef5e71eacd7
Credits: Paul-Louis Pröve
High level steps
Ref: https://towardsdatascience.com/squeeze-and-excitation-networks-9ef5e71eacd7
Credits: Paul-Louis Pröve
Adding the squeeze-and-excitation technique to ResNet
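A hedged sketch of how the pieces fit together, reusing the hypothetical se_block above: the SE re-weighting is applied to the residual branch before it is added back to the shortcut.

```python
# SE-ResNet block sketch (assuming tensorflow.keras and the se_block defined earlier).
# Filter count is an illustrative assumption.
from tensorflow.keras import layers

def se_residual_block(x, filters=64):
    shortcut = x                                     # assumes x already has `filters` channels
    y = layers.Conv2D(filters, 3, padding="same")(x)
    y = layers.BatchNormalization()(y)
    y = layers.Activation("relu")(y)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = layers.BatchNormalization()(y)
    y = se_block(y)                                  # re-weight channels before the addition
    y = layers.Add()([shortcut, y])
    return layers.Activation("relu")(y)
```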
