An Introduction to Neural Networks
Instituto Tecgraf, PUC-Rio
Name: Fernanda Duarte
Advisor: Marcelo Gattass
What is Machine Learning?
A machine learning algorithm is an algorithm that is able to learn from data.
“A computer program is said to learn from experience E with respect to some class of tasks T
and performance measure P, if its performance at tasks in T, as measured by P, improves
with experience E.” (Mitchell, 1997)
Applications
- Digit recognition
- Face recognition
- Recommendation engines
- Virtual assistants (Cortana, Siri, etc.)
- Self-driving vehicles
- Surveillance systems
Tasks...
Formal tasks: Playing board games, solving puzzles, mathematical and logic
problems → Easier to code!
Expert tasks: Medical diagnosis, engineering, scheduling.
Mundane tasks: Everyday speech, written language, perception, walking, object
recognition and manipulation.
Artificial Neural Networks
Neuron: biological inspiration for computation
[Figure: biological neuron vs. artificial neuron/unit]
Perceptron (Frank Rosenblatt, 1957)
- Algorithm for learning a binary classifier.
- Only capable of learning linearly separable patterns.
output = 0 if w·x + b ≤ 0, or
output = 1 if w·x + b > 0
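A minimal sketch of the perceptron decision rule and the classic perceptron learning rule in Python (the toy AND dataset and the hyperparameters are illustrative assumptions, not from the slides):

```python
import numpy as np

def perceptron_predict(w, b, x):
    # Binary decision: 1 if w.x + b > 0, else 0.
    return 1 if np.dot(w, x) + b > 0 else 0

def perceptron_train(X, y, epochs=10, lr=0.1):
    # Perceptron learning rule: nudge w and b by the prediction error.
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = yi - perceptron_predict(w, b, xi)
            w += lr * err * xi
            b += lr * err
    return w, b

# Linearly separable toy data: the AND function.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 0, 0, 1])
w, b = perceptron_train(X, y)
print([perceptron_predict(w, b, xi) for xi in X])  # [0, 0, 0, 1]
```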
Feedforward Neural Networks
(or Deep Feedforward Networks or Multilayer
Perceptrons (MLP)) (see “Deep Learning” book, Ian Goodfellow et al.)
- Multilayer structure → “sophisticated decision making” (a unit in the second
layer can make a decision at a more complex and more abstract level than a unit
in the first layer) → Learn features directly from the data.
- Nonlinearity (extends the kinds of functions that we can represent with our
neural network, e.g. the XOR (“exclusive or”) function; see the sketch below)
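A minimal sketch of this point: one hidden layer of two threshold units suffices to compute XOR, which a single perceptron cannot represent. The weights here are hand-picked for illustration, not learned:

```python
import numpy as np

def step(z):
    # Threshold activation, as in the perceptron.
    return (z > 0).astype(int)

def xor_net(x1, x2):
    # Hidden layer: h[0] computes OR, h[1] computes NAND (hand-picked weights).
    x = np.array([x1, x2])
    h = step(np.array([[1, 1], [-1, -1]]) @ x + np.array([-0.5, 1.5]))
    # Output unit: AND of the two hidden units yields XOR.
    return step(np.array([1, 1]) @ h + np.array([-1.5]))[0]

print([xor_net(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]])  # [0, 1, 1, 0]
```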
Feedforward Neural Networks
Goal: Approximate some function f*
Example → A classifier y = f*(x)
input x, category (label) y
A feedforward network defines a mapping y = f(x; θ) and
learns the values of the parameters θ that result in the best
function approximation.
Feedforward: information flows in one direction (input layer → output layer)
Why “network”? - Math Intuition:
- Typically represented by composing together many different functions.
Example: functions f^(1), f^(2), f^(3) connected in a chain, to form
f(x) = f^(3)(f^(2)(f^(1)(x)))
In this case, f^(1) is called the first hidden layer of the network, f^(2) is
called the second hidden layer, and the final layer f^(3) is called the output
layer.
- Why hidden layer? Behavior not directly specified → learning algorithm must
decide how to use those layers to form f(x) that best approximates f*.
- Length of the chain gives the depth of the model → deep learning (you can
“stack” multiple layers)
Graph representation of the network
- The feedforward network model is associated with a directed acyclic graph
(DAG) describing how the functions are composed together.
Artificial neuron
Fully-connected layers
- Neurons between two adjacent layers are fully pairwise connected, but neurons
within a single layer share no connections.
Feedforward computation
- The abstraction of a layer has the nice property that it allows us to use efficient
vectorized code (e.g. matrix multiplies).
- Think of each hidden layer as a vector, where each value represents a
neuron/unit.
- Repeated matrix multiplications interwoven with a nonlinear activation function.
Example of activation function: Sigmoid, σ(z) = 1 / (1 + e^(−z))
Obs.: The output layer neurons most commonly have a different activation function →
e.g. softmax for class scores (classification), linear functions for real-valued
targets (regression), etc.
The weights W and biases b are the learnable parameters
of the network!
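To make the vectorized feedforward computation concrete, here is a minimal NumPy sketch of the forward pass through two sigmoid hidden layers and a softmax output layer; the layer sizes are illustrative assumptions, not from the slides:

```python
import numpy as np

def sigmoid(z):
    # Sigmoid activation: squashes each entry of z into (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # Softmax for class scores; subtracting the max improves numerical stability.
    e = np.exp(z - np.max(z))
    return e / e.sum()

rng = np.random.default_rng(0)

# Illustrative sizes: 4 inputs, hidden layers of 5 and 3 units, 2 classes.
W1, b1 = rng.standard_normal((5, 4)), np.zeros(5)
W2, b2 = rng.standard_normal((3, 5)), np.zeros(3)
W3, b3 = rng.standard_normal((2, 3)), np.zeros(2)

x = rng.standard_normal(4)      # one input vector
h1 = sigmoid(W1 @ x + b1)       # first hidden layer
h2 = sigmoid(W2 @ h1 + b2)      # second hidden layer
y = softmax(W3 @ h2 + b3)       # output layer: class probabilities
print(y)                        # probabilities summing to 1
```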
What about learning?
Optimization: find the parameters that minimize the cost function (or loss function).
A loss function C is a measure of how wrong the model is in terms of its ability to
estimate the relationship between x and y, i.e., y = f*(x), with the chosen
parameters. (e.g. Mean Squared Error (MSE))
Gradient Descent is a very common optimization algorithm.
Obs.: Training → Training + Validation sets
Inference → Test set
Gradient Descent
The gradient of a function gives the direction of steepest ascent
[Figure: weight connecting neuron i in layer l-1 to neuron j in layer l]
The direction of steepest descent is the negative gradient.
Parameter update during the learning process:
w ← w − η ∂C/∂w,   b ← b − η ∂C/∂b
where η is the learning rate (hyperparameter).
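A minimal numeric sketch of this update rule, minimizing the toy cost C(w) = (w − 3)², an illustrative choice not from the slides:

```python
# Gradient descent on C(w) = (w - 3)**2, whose gradient is dC/dw = 2*(w - 3).
eta = 0.1        # learning rate (hyperparameter)
w = 0.0          # initial parameter value
for _ in range(50):
    grad = 2 * (w - 3)   # gradient of the cost at the current w
    w -= eta * grad      # step in the direction of steepest descent
print(w)  # approaches the minimizer w = 3
```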
Backpropagation
How to compute the gradient of the cost function with respect to the weights w and biases b
(efficiently)?
The backprop algorithm gives us detailed insights into how changing the weights and biases
changes the overall behavior of the network.
Propagate the error backwards.
Main idea: use chain rule to compute the gradients.
Example: consider a weight w feeding a neuron with pre-activation z = w·a_prev + b and
activation a = σ(z). By the chain rule, the gradient of C_i (the cost function of learning
example i) with respect to w is
∂C_i/∂w = (∂C_i/∂a) · (∂a/∂z) · (∂z/∂w)
Applying the same factorization layer by layer, from the output backwards, yields the
gradient with respect to every weight and bias in the network.
Backpropagation
See: http://neuralnetworksanddeeplearning.com/chap2.html#the_backpropagation_algorithm
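A compact sketch of these chain-rule computations for a one-hidden-layer network with sigmoid activations and squared-error cost (the sizes, data, and learning rate are illustrative assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((3, 2)), np.zeros(3)   # hidden layer: 2 -> 3
W2, b2 = rng.standard_normal((1, 3)), np.zeros(1)   # output layer: 3 -> 1

x, y = np.array([0.5, -1.0]), np.array([1.0])       # one training example

# Feedforward computation, keeping intermediate values for backprop.
z1 = W1 @ x + b1; a1 = sigmoid(z1)
z2 = W2 @ a1 + b2; a2 = sigmoid(z2)
C = 0.5 * np.sum((a2 - y) ** 2)                     # squared-error cost

# Backpropagation: chain rule, from the output layer backwards.
delta2 = (a2 - y) * a2 * (1 - a2)                   # dC/dz2
dW2, db2 = np.outer(delta2, a1), delta2             # dC/dW2, dC/db2
delta1 = (W2.T @ delta2) * a1 * (1 - a1)            # dC/dz1
dW1, db1 = np.outer(delta1, x), delta1              # dC/dW1, dC/db1

# Weight update: one gradient descent step with learning rate eta.
eta = 0.5
W2 -= eta * dW2; b2 -= eta * db2
W1 -= eta * dW1; b1 -= eta * db1
```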
Learning process (summary)
For each learning example i in the training set:
1 - Feedforward computation;
2 - Backpropagation;
3 - Weight update.
- Learning rate (η)
- Regularization (prevents overfitting)
- Epoch
- Activation function alternatives (ReLU, tanh, etc)
- Stochastic Gradient Descent (SGD) → Minibatch (see the sketch below)
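Putting the three steps together with minibatch SGD, the training loop has roughly this structure. This is a sketch under stated assumptions: the `grad` callback (which would run feedforward + backprop, as above) and the toy linear model are illustrative, not from the slides:

```python
import numpy as np

def sgd_train(params, grad, X, Y, epochs=10, batch_size=32, eta=0.1):
    # Minibatch SGD skeleton: grad(params, Xb, Yb) must return the cost
    # gradient averaged over the minibatch (computed by backpropagation).
    n = len(X)
    for _ in range(epochs):
        order = np.random.permutation(n)          # reshuffle every epoch
        for start in range(0, n, batch_size):
            b = order[start:start + batch_size]   # indices of one minibatch
            g = grad(params, X[b], Y[b])          # 1 - feedforward, 2 - backprop
            params = params - eta * g             # 3 - weight update
    return params

# Toy usage: fit a 1-D linear model y = w*x by least squares.
X = np.linspace(0, 1, 100); Y = 2.0 * X
grad = lambda w, xb, yb: np.mean(2 * (w * xb - yb) * xb)
print(sgd_train(0.0, grad, X, Y, epochs=100))  # approaches 2.0
```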
Convolutional Neural Networks (CNN)
[Figure: the AlexNet architecture]
Convolutional Neural Networks (CNN or ConvNets)
- Useful when the proximity between two data points indicates how related they are (e.g.: pixels in
images!) → CNNs preserve spatial structure.
- The neurons in a layer will only be connected to a small region of the layer before it, instead of all
of the neurons in a fully-connected manner → fewer parameters!
- Convolutional Neural Networks take advantage of the fact that the input consists of images and
they constrain the architecture in a more sensible way. In particular, unlike a regular Neural
Network, the layers of a ConvNet have neurons arranged in 3 dimensions: width, height, depth.
Convolutional Neural Networks (CNN or ConvNets)
Every layer of a ConvNet transforms one volume of activations to another through a differentiable
function.
We use three main types of layers to build a basic CNN architecture: Convolutional Layer, Pooling Layer,
and Fully-Connected Layer.
[Figure: Convolution → Pooling → Convolution → Pooling → Fully Connected → Fully Connected → Output predictions]
Convolutional Layer
The CONV layer’s parameters consist of a set of learnable filters → Every filter is small spatially (along
width and height), but extends through the full depth of the input volume (e.g. 5x5x3 for images with 3
color channels).
Convolutional Layer
Forward pass: We slide (or convolve) each filter across the input volume and compute dot products
between the entries of the filter and the input, followed by an elementwise nonlinear activation
function.
Every filter produces a 2-dimensional activation map. (e.g. applying 12 filters of dimensions 5x5x3 to a
32x32x3 input, the conv layer can produce an output volume of 32x32x12, i.e., 12 activation maps with
dimensions 32x32)
Intuitively, the network will learn filters that activate when they see some type of visual feature, such as
an edge of some orientation, for example.
[Figure: convolving a filter over a zero-padded input]
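A naive sketch of this forward pass for a single filter, using explicit loops rather than optimized vectorized code; the filter size, padding, and input size are illustrative, chosen to match the 32x32 example above:

```python
import numpy as np

def conv_forward(x, f, b, pad=2):
    # Slide one filter f (h x w x depth) over input volume x (H x W x depth),
    # producing one 2-D activation map.
    x = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))   # zero-padding
    fh, fw, _ = f.shape
    oh, ow = x.shape[0] - fh + 1, x.shape[1] - fw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # Dot product between the filter and the input patch under it.
            out[i, j] = np.sum(x[i:i+fh, j:j+fw, :] * f) + b
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((32, 32, 3))              # e.g. a 32x32 RGB image
f = rng.standard_normal((5, 5, 3))                # one 5x5x3 filter
amap = np.maximum(conv_forward(x, f, b=0.0), 0)   # plus ReLU nonlinearity
print(amap.shape)   # (32, 32): one activation map; 12 filters would give 32x32x12
```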
Pooling Layer
Its function is to progressively reduce the spatial size of the representation to reduce the number of
parameters and computation in the network, and hence to also control overfitting.
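A corresponding sketch of 2x2 max pooling with stride 2 (a common choice), which halves each spatial dimension of an activation map:

```python
import numpy as np

def max_pool(a, k=2):
    # Non-overlapping k x k max pooling: keep the largest activation per window.
    h, w = a.shape[0] // k, a.shape[1] // k
    return a[:h*k, :w*k].reshape(h, k, w, k).max(axis=(1, 3))

a = np.arange(16.0).reshape(4, 4)
print(max_pool(a))   # [[ 5.  7.]
                     #  [13. 15.]]
```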
Fully-Connected Layer
Same as before.
Used to output the final classification scores.
https://cs.stanford.edu/people/karpathy/convnetjs/demo/cifar10.html
CNN for Real-Time Object Detection using
YOLO (“You Only Look Once”)
Assignment
Build a feedforward neural network with (at least) two fully-connected hidden layers to
perform automatic recognition of handwritten digits, using the MNIST database.
http://yann.lecun.com/exdb/mnist/
(The assignment description will be sent by e-mail)
References
https://www.kdnuggets.com/2018/02/8-neural-network-architectures-machine-learning-researchers-need-learn.html
https://www.youtube.com/watch?v=d14TUNcbn1k
http://neuralnetworksanddeeplearning.com/chap1.html
http://www.deeplearningbook.org/
https://www.youtube.com/watch?v=1L0TKZQcUtA
https://towardsdatascience.com/machine-learning-fundamentals-via-linear-regression-41a5d11f5220
References
Backpropagation:
https://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/
https://ayearofai.com/rohan-lenny-1-neural-networks-the-backpropagation-algorithm-explained-abf4609d4f9d
http://neuralnetworksanddeeplearning.com/chap2.html
https://www.youtube.com/watch?v=tIeHLnjs5U8
References
CNNs:
https://hashrocket.com/blog/posts/a-friendly-introduction-to-convolutional-neural-networks
https://ujjwalkarn.me/2016/08/11/intuitive-explanation-convnets/
http://web.stanford.edu/class/cs231a/lectures/intro_cnn.pdf
http://cs231n.stanford.edu/