Introduction to neural networks
Logistics
• Check Canvas (announcements)
• Lecture slides in Canvas Files.
• Lab 1 is out. Check it early if you are worried about prerequisites.
• Office hours info TBA.
Overview of today’s lecture
• Perceptron.
• Neural networks.
• Training perceptrons.
• Gradient descent.
• Backpropagation.
• Stochastic gradient descent.
Perceptron
1950s Age of the Perceptron
1957 The Perceptron (Rosenblatt)
1969 Perceptrons (Minsky, Papert)
1980s Age of the Neural Network
1986 Backpropagation (Rumelhart, Hinton, Williams)
1990s Age of the Graphical Model
2000s Age of the Support Vector Machine
2010s Age of the Deep Network
deep learning = known algorithms + computing power + big data
The Perceptron
[Diagram: inputs, weights, weighted sum, sign function (Heaviside step function), output y]
The Perceptron (with bias)
[Diagram: same, with a bias term b attached to a constant input 1]
The Perceptron
[Diagram: same, with the bias term suppressed (less clutter)]
Aside: Inspiration from Biology
Neural nets/perceptrons are loosely inspired by
biology.
But they certainly are not a model of how the brain
works, or even how neurons work.
The Perceptron
[Diagram: inputs, weights, bias, weighted sum, activation function (e.g., step, sigmoid, tanh, ReLU), output]
Another way to draw it…
[Diagram: (1) combine the sum and activation function into a single node (e.g., sigmoid of the weighted sum); (2) suppress the bias term (less clutter)]
Programming the 'forward pass'
Activation function (sigmoid, logistic function)
float f(float a)
{
    return 1.0f / (1.0f + exp(-a));
}

Weighted sum of inputs (helper used below)
float dot(vector<float> x, vector<float> w)
{
    float a = 0.0f;
    for (size_t i = 0; i < x.size(); i++)
        a += x[i] * w[i];
    return a;
}

Perceptron function (logistic regression)
float perceptron(vector<float> x, vector<float> w)
{
    float a = dot(x, w);  // weighted sum (bias suppressed)
    return f(a);          // activation gives the output
}
Connect a bunch of perceptrons together …
Neural Network: a collection of connected perceptrons
How many perceptrons in this neural network?
[Diagram builds up from ‘one perceptron’ to ‘six perceptrons’]
Some terminology…
• ‘input’ layer
• ‘hidden’ layer
• ‘output’ layer
…also called a Multi-layer Perceptron (MLP)
The hidden and output layers here are ‘fully connected layers’: all pairwise neurons between consecutive layers are connected.
How many neurons (perceptrons)? 4 + 2 = 6
How many weights (edges)? (3 × 4) + (4 × 2) = 20
How many learnable parameters total? 20 + 4 + 2 = 26 (weights plus bias terms)
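This counting generalizes to any fully connected stack. A minimal sketch, assuming the network is described as a list of layer sizes (the helper name is illustrative, not from the slides):

```cpp
#include <vector>
#include <cstddef>

// Learnable parameters of a fully connected (MLP) network:
// each layer of size n_out fed by a layer of size n_in contributes
// n_in * n_out weights plus n_out bias terms.
int count_parameters(const std::vector<int>& layer_sizes)
{
    int total = 0;
    for (std::size_t i = 1; i < layer_sizes.size(); ++i)
        total += layer_sizes[i - 1] * layer_sizes[i]  // weights (edges)
               + layer_sizes[i];                      // bias terms
    return total;
}
```

For the network above, count_parameters({3, 4, 2}) gives (3·4 + 4) + (4·2 + 2) = 26, matching the slide.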
performance usually tops out at 2-3 layers,
deeper networks don’t really improve performance...
...with the exception of convolutional networks for images
Training perceptrons
world’s smallest perceptron!
y = wx
(a.k.a. line equation, linear regression)
Learning a Perceptron
Given a set of samples and a Perceptron
Estimate the parameters of the Perceptron
Given training data:
x      y
10     10.1
2      1.9
3.5    3.4
1      1.1
What do you think the weight parameter is? (here y ≈ x, so w ≈ 1)
This is not so obvious as the network gets more complicated, so we use …
An Incremental Learning Strategy
(gradient descent)
Given several examples and a perceptron:
Modify the weight w (the perceptron parameter) so that the perceptron output ŷ gets ‘closer’ to the true label y.
But what does ‘closer’ mean?
Before diving into gradient descent, we need to understand …
Loss Function
defines what it means to be close to the true solution.
YOU get to choose the loss function!
(some are better than others depending on what you want to do)
L1 Loss • L2 Loss • Zero-One Loss • Hinge Loss
[Plots of the four loss functions]
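The slide names the losses without formulas; written out with true label $y$ and prediction $\hat{y}$, the standard definitions (assumed here, since the slide formulas did not survive extraction) are:

```latex
\begin{aligned}
\text{L1 loss:} \quad & L(y, \hat{y}) = |y - \hat{y}| \\
\text{L2 loss:} \quad & L(y, \hat{y}) = (y - \hat{y})^2 \\
\text{Zero-one loss:} \quad & L(y, \hat{y}) = \mathbb{1}[\, y \neq \hat{y} \,] \\
\text{Hinge loss:} \quad & L(y, \hat{y}) = \max(0,\, 1 - y\hat{y}), \quad y \in \{-1, +1\}
\end{aligned}
```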
Learning Strategy
(gradient descent)
Given several examples and a perceptron, modify the weight w so that the perceptron output ŷ gets ‘closer’ to the true label y.
Gradient descent
Two ways to think about the gradient:
1. Slope of a function: the gradient describes the slope of the function around a point.
2. Knobs on a machine (input → output): the gradient describes how each ‘knob’ (parameter) affects the output; a small change in a parameter changes the output in proportion to the corresponding partial derivative.
Gradient descent:
Given a fixed point on a function, move in the direction opposite of the gradient.
update rule: w ← w − η ∂L/∂w  (η is the step size)
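The update rule can be sketched on a toy problem; here a one-dimensional example (the objective L(w) = (w − 3)² and the step size are illustrative assumptions, not from the slides):

```cpp
// One-dimensional gradient descent: starting from w0, repeatedly
// apply the update rule w <- w - eta * dL/dw.
// Illustrative objective: L(w) = (w - 3)^2, so dL/dw = 2 * (w - 3).
float gradient_descent(float w0, float eta, int steps)
{
    float w = w0;
    for (int i = 0; i < steps; ++i) {
        float grad = 2.0f * (w - 3.0f); // gradient of (w - 3)^2
        w -= eta * grad;                // move opposite the gradient
    }
    return w;
}
```

Starting at w = 0 with η = 0.1, the iterates approach the minimizer w = 3.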
Backpropagation
Training the world’s smallest perceptron, ŷ = wx.
This is just gradient descent; that means the update term should be the gradient of the loss function:
L = ½(y − ŷ)²   (the ½ is just shorthand: it cancels the 2 when differentiating)
Now where does this come from? Compute the derivative:
∂L/∂w = −(y − ŷ) ∂ŷ/∂w = −(y − ŷ)x
That means the weight update for gradient descent is:
w ← w + η(y − ŷ)x   (move in the direction of the negative gradient)
Gradient Descent (world’s smallest perceptron)
For each sample
1. Predict
a. Forward pass
b. Compute Loss
2. Update
a. Back Propagation
b. Gradient update
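The loop above can be sketched for the world’s smallest perceptron ŷ = wx on the training table from earlier; the L2 loss, learning rate, and epoch count are assumptions for illustration:

```cpp
#include <vector>
#include <utility>

// Train y_hat = w * x by gradient descent with L2 loss L = 0.5*(y - y_hat)^2.
// Since dL/dw = -(y - y_hat) * x, the update is w <- w + eta * (y - y_hat) * x.
float train_smallest_perceptron(const std::vector<std::pair<float, float>>& data,
                                float eta, int epochs)
{
    float w = 0.0f; // arbitrary starting weight
    for (int e = 0; e < epochs; ++e) {
        for (const auto& [x, y] : data) {
            float y_hat = w * x;             // 1a. forward pass
            float grad  = -(y - y_hat) * x;  // 2a. backprop (one derivative)
            w -= eta * grad;                 // 2b. gradient update
        }
    }
    return w;
}
```

On the table {(10, 10.1), (2, 1.9), (3.5, 3.4), (1, 1.1)} this converges near w ≈ 1, the answer we eyeballed earlier.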
world’s (second) smallest perceptron!
ŷ = w₁x₁ + w₂x₂
a function of two parameters!
Gradient Descent
For each sample:
1. Predict
   a. Forward pass
   b. Compute loss
2. Update
   a. Back propagation (we just need to compute partial derivatives for this network)
   b. Gradient update
Derivative computation
ŷ = w₁x₁ + w₂x₂
∂L/∂w₁ = −(y − ŷ)x₁    ∂L/∂w₂ = −(y − ŷ)x₂
Gradient update:
w₁ ← w₁ + η(y − ŷ)x₁    w₂ ← w₂ + η(y − ŷ)x₂
Gradient Descent
For each sample:
1. Predict
   a. Forward pass
   b. Compute loss (side computation to track loss; not needed for backprop)
2. Update
   a. Back propagation (two lines now, one per parameter)
   b. Gradient update (η is an adjustable step size)
We haven’t seen a lot of ‘propagation’ yet
because our perceptrons only had one layer…
multi-layer perceptron
a function of FOUR parameters and FOUR layers!
[Diagram: input layer 1 → hidden layer 2 → hidden layer 3 → output layer 4; each connection applies a weight, and each unit computes a sum followed by an activation]
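A minimal sketch of this forward pass, assuming the network is a chain of single-neuron layers (one weight per connection, sigmoid activations); the exact wiring of the slide’s diagram is an assumption here:

```cpp
#include <cmath>
#include <vector>

// Sigmoid activation, as used in the lecture.
float sigmoid(float a)
{
    return 1.0f / (1.0f + std::exp(-a));
}

// Forward pass through a chain of single-neuron layers:
// each layer multiplies by its weight (the sum), then applies
// the activation. One weight per layer transition is assumed.
float forward_chain(float x, const std::vector<float>& weights)
{
    float h = x;
    for (float w : weights)
        h = sigmoid(w * h); // sum, then activation
    return h;
}
```

With no weights the input passes through unchanged; each added weight nests one more activation, which is exactly why the whole network can be written as one long equation.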
The entire network can be written out as one long equation: the output ŷ is a nested composition of weights and activation functions applied to the input.
We need to train the network: what is known? What is unknown?
Known: the inputs and true labels (the training data), and the activation functions.
Unknown: the weights.
Learning an MLP
Given a set of samples and an MLP,
estimate the parameters of the MLP.
Gradient Descent
For each random sample:
1. Predict
   a. Forward pass
   b. Compute loss
2. Update
   a. Back propagation → a vector of parameter partial derivatives
   b. Gradient update → a vector of parameter update equations
So we need to compute the partial derivatives ∂L/∂wᵢ.
Remember: a partial derivative describes how the loss (at the loss layer) changes as one weight changes.
So, how do you compute it?
The Chain Rule
According to the chain rule:
∂L/∂w = (∂L/∂ŷ)(∂ŷ/∂a)(∂a/∂w)
Intuitively, the effect of weight w on the loss L passes through the rest of the network: L depends on the output ŷ, ŷ depends on the sum a, and a depends on w. Chain Rule!
∂L/∂ŷ = −(y − ŷ)  — just the partial derivative of the L2 loss.
Let’s use a Sigmoid function: then ∂ŷ/∂a = f(a)(1 − f(a)).
The factor ∂L/∂ŷ was already computed — re-use (propagate)!
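These chain-rule factors can be sanity-checked numerically; a sketch for a single sigmoid unit with L2 loss, compared against a finite-difference gradient (the test values are arbitrary):

```cpp
#include <cmath>

float sigmoid(float a) { return 1.0f / (1.0f + std::exp(-a)); }

// Analytic gradient of L = 0.5*(y - f(w*x))^2 with respect to w,
// built from the three chain-rule factors:
// dL/dw = (dL/dy_hat) * (dy_hat/da) * (da/dw)
//       = -(y - y_hat) * f(a)*(1 - f(a)) * x
float analytic_grad(float w, float x, float y)
{
    float a = w * x;
    float y_hat = sigmoid(a);
    return -(y - y_hat) * y_hat * (1.0f - y_hat) * x;
}

// Finite-difference approximation of the same gradient.
float numeric_grad(float w, float x, float y)
{
    const float eps = 1e-3f;
    auto loss = [&](float ww) {
        float d = y - sigmoid(ww * x);
        return 0.5f * d * d;
    };
    return (loss(w + eps) - loss(w - eps)) / (2.0f * eps);
}
```

If the two disagree, one of the chain-rule factors was derived wrong; this check is a standard way to debug a backprop implementation.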
The Chain Rule
a.k.a. backpropagation
The chain rule says: the loss depends on the output, the output depends on each layer’s activation, each activation depends on that layer’s sum and weight, and so on back through the network. Working backwards layer by layer, each partial derivative re-uses factors already computed for the layers after it — re-use (propagate)!
Gradient Descent
For each example sample:
1. Predict
   a. Forward pass
   b. Compute loss
2. Update
   a. Back propagation → a vector of parameter partial derivatives
   b. Gradient update → a vector of parameter update equations (step in the negative gradient direction)
Stochastic gradient descent
What we are truly minimizing:
$\min_\theta \sum_{i=1}^{N} L(y_i, f_{\mathrm{MLP}}(x_i))$
The gradient is:
$\sum_{i=1}^{N} \frac{\partial L(y_i, f_{\mathrm{MLP}}(x_i))}{\partial \theta}$
What we use for the gradient update is:
$\frac{\partial L(y_i, f_{\mathrm{MLP}}(x_i))}{\partial \theta}$ for some $i$
Stochastic Gradient Descent
For each example sample:
1. Predict
   a. Forward pass
   b. Compute loss
2. Update
   a. Back propagation → a vector of parameter partial derivatives
   b. Gradient update → a vector of parameter update equations
How do we select which sample?
• Select randomly!
Do we need to use only one sample?
• You can use a minibatch of size B < N.
Why not do gradient descent with all samples?
• Bad convergence.
Do I lose anything by using stochastic GD?
• Same convergence guarantees and complexity!
• Better generalization.
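The random-minibatch idea can be sketched as an index-sampling step; the function name and the shuffle-based approach are illustrative assumptions:

```cpp
#include <vector>
#include <random>
#include <numeric>
#include <algorithm>

// Pick a random minibatch of B indices out of N samples (without
// replacement). Each SGD update then uses only these B samples'
// gradients instead of summing over all N.
std::vector<int> sample_minibatch(int N, int B, std::mt19937& rng)
{
    std::vector<int> idx(N);
    std::iota(idx.begin(), idx.end(), 0);   // 0, 1, ..., N-1
    std::shuffle(idx.begin(), idx.end(), rng);
    idx.resize(B);                          // keep the first B
    return idx;
}
```

With B = 1 this is plain stochastic gradient descent; with B = N it degenerates to full-batch gradient descent.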
References
Basic reading: No standard textbooks yet! Some good resources:
• https://sites.google.com/site/deeplearningsummerschool/
• http://www.deeplearningbook.org/
• http://www.cs.toronto.edu/~hinton/absps/NatureDeepReview.pdf