Intro4 ANN Deep CNN PDF

Deep learning is a family of techniques for learning compositional vector representations of complex data using neural networks. Deep neural networks learn hierarchical representations of data by building higher-level features from lower-level ones. Convolutional neural networks apply this idea to visual data by incorporating spatial structure through local connectivity and parameter sharing. Modern deep learning techniques like residual networks enable very deep networks to be trained effectively.

Uploaded by pranab sarker
Copyright © All Rights Reserved
What is deep learning?

A family of techniques for learning compositional vector representations of complex data.

CS221 / Spring 2020 / Finn & Anari 9


Review: linear predictors
[figure: inputs $x_1, x_2, x_3$ connected through weights $w$ to the output $f_\theta(x)$]

Output:

$f_\theta(x) = w \cdot x$

Parameters: $\theta = w$



Review: neural networks
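[figure: inputs $x_1, x_2, x_3$ feed hidden units $h_1, h_2$ through weights $V$; the hidden units feed the output $f_\theta(x)$ through weights $w$]
Intermediate hidden units:
$h_j(x) = \sigma(v_j \cdot x), \quad \sigma(z) = (1 + e^{-z})^{-1}$
Output:
$f_\theta(x) = w \cdot h(x)$
Parameters: $\theta = (V, w)$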
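The two formulas above can be traced in a few lines of plain Python. This is a minimal sketch: the helper names (`sigmoid`, `dot`, `hidden`, `predict`) and the toy values of `V`, `w`, and `x` are hypothetical, chosen only to show how each hidden unit $h_j$ uses its own row $v_j$ of $V$ before the output layer combines them with $w$.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic function sigma(z) = 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + math.exp(-z))

def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(ai * bi for ai, bi in zip(a, b))

def hidden(V, x):
    """Hidden units h_j(x) = sigma(v_j . x), one per row v_j of V."""
    return [sigmoid(dot(v_j, x)) for v_j in V]

def predict(V, w, x):
    """Output f_theta(x) = w . h(x), with parameters theta = (V, w)."""
    return dot(w, hidden(V, x))

# Hypothetical toy parameters: 3 inputs, 2 hidden units.
V = [[1.0, -1.0, 0.5],
     [0.0, 2.0, -1.0]]
w = [1.0, -2.0]
x = [1.0, 0.0, 2.0]
print(predict(V, w, x))
```

Here $v_1 \cdot x = 2$ and $v_2 \cdot x = -2$, so the output equals $\sigma(2) - 2\,\sigma(-2)$.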



Deep neural networks
1-layer neural network: $\text{score} = w^\top x$

2-layer neural network: $\text{score} = w^\top \sigma(V x)$

3-layer neural network: $\text{score} = w^\top \sigma(V \sigma(U x))$

...
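The pattern continues by wrapping one more $\sigma(\cdot)$ around the previous layer for each extra layer. The sketch below assumes the logistic $\sigma$ from the previous slide and uses a hypothetical `deep_score` helper that takes the hidden-layer matrices in order (for example `[U, V]` for the 3-layer network); an empty list recovers the 1-layer linear score $w^\top x$.

```python
import math

def sigmoid(z):
    """Logistic non-linearity, applied elementwise below."""
    return 1.0 / (1.0 + math.exp(-z))

def matvec(M, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def deep_score(layers, w, x):
    """score = w^T sigma(W_k ... sigma(W_1 x)).

    `layers` is [W_1, ..., W_k]; a k-element list gives a (k+1)-layer network.
    """
    h = x
    for W in layers:
        h = [sigmoid(z) for z in matvec(W, h)]  # next hidden representation
    return sum(wi * hi for wi, hi in zip(w, h))

# Hypothetical 3-layer network: score = w^T sigma(V sigma(U x)).
U = [[0.2, -0.4, 0.1], [0.5, 0.3, -0.2]]
V = [[1.0, -1.0], [0.5, 0.5]]
w = [2.0, -1.0]
print(deep_score([U, V], w, [1.0, 2.0, 3.0]))
```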
Depth
[figure: $x \to h \to h' \to h'' \to h''' \to f_\theta(x)$]

Intuitions:
• Hierarchical feature representations
• Can simulate a bounded-computation logic circuit (the original motivation from McCulloch and Pitts, 1943)
• Can learn this computation (and potentially more, because networks are real-valued)
• Formal theory and understanding are still incomplete
• Some hypotheses are emerging: double descent, the lottery ticket hypothesis



[figure from Honglak Lee]

What’s learned?



Review: optimization
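Regression:
$\text{Loss}(x, y, \theta) = (f_\theta(x) - y)^2$
Key idea: minimize training loss
$\text{TrainLoss}(\theta) = \frac{1}{|\mathcal{D}_{\text{train}}|} \sum_{(x, y) \in \mathcal{D}_{\text{train}}} \text{Loss}(x, y, \theta)$

$\min_{\theta \in \mathbb{R}^d} \text{TrainLoss}(\theta)$

Algorithm: stochastic gradient descent

For $t = 1, \ldots, T$:
    For $(x, y) \in \mathcal{D}_{\text{train}}$:
        $\theta \leftarrow \theta - \eta_t \nabla_\theta \text{Loss}(x, y, \theta)$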
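As a concrete instance of the SGD loop above, the sketch below assumes the linear predictor $f_\theta(x) = w \cdot x$, for which $\nabla_w (w \cdot x - y)^2 = 2(w \cdot x - y)\,x$. The constant step size (`eta` standing in for $\eta_t$), the per-epoch shuffling, the function names, and the toy data generated by $y = 2x_1 - x_2$ are all assumptions made for illustration.

```python
import random

def sgd(train, d, T=100, eta=0.05, seed=0):
    """SGD on the squared loss (w . x - y)^2 for a linear predictor, theta = w."""
    rng = random.Random(seed)
    w = [0.0] * d
    data = list(train)
    for t in range(1, T + 1):
        rng.shuffle(data)  # assumption: visit examples in a random order each epoch
        for x, y in data:
            residual = sum(wi * xi for wi, xi in zip(w, x)) - y
            # Gradient of (w . x - y)^2 with respect to w is 2 (w . x - y) x.
            w = [wi - eta * 2 * residual * xi for wi, xi in zip(w, x)]
    return w

def train_loss(w, train):
    """Average squared loss over the training set."""
    return sum((sum(wi * xi for wi, xi in zip(w, x)) - y) ** 2
               for x, y in train) / len(train)

# Hypothetical noise-free data generated by y = 2*x1 - 1*x2.
train = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0), ([1.0, 1.0], 1.0), ([2.0, 1.0], 3.0)]
w = sgd(train, d=2)
print(w, train_loss(w, train))
```

Because the toy data is exactly linear, the learned `w` should approach `[2.0, -1.0]` and the training loss should approach zero.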



Training

• Non-convex optimization

• No theoretical guarantees that it works

• Before the 2000s, empirically very difficult to get working



What’s different today
• Computation (time/memory)
• Information (data)



How to make it work

• More hidden units (over-parameterization)


• Adaptive step sizes (AdaGrad, Adam)
• Dropout to guard against overfitting
• Careful initialization (pre-training)
• Batch normalization

Model and optimization are tightly coupled
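The adaptive step sizes listed above can be illustrated with AdaGrad. This is a minimal sketch under stated assumptions: the function name, the small `eps` stabilizer, and the ill-scaled quadratic objective are illustrative choices. Each coordinate gets its own step size, $\eta$ divided by the square root of that coordinate's accumulated squared gradients, so a coordinate whose gradients are 100 times larger is not forced to share one small global step.

```python
import math

def adagrad(grad, theta0, eta=1.0, steps=500, eps=1e-8):
    """AdaGrad: per-coordinate step eta / sqrt(sum of squared past gradients)."""
    theta = list(theta0)
    G = [0.0] * len(theta)  # running sum of squared gradients, per coordinate
    for _ in range(steps):
        g = grad(theta)
        for i, gi in enumerate(g):
            G[i] += gi * gi
            theta[i] -= eta * gi / (math.sqrt(G[i]) + eps)
    return theta

# Hypothetical objective: f(theta) = (theta_0 - 3)^2 + 100 * (theta_1 + 1)^2,
# whose two coordinates have gradients on very different scales.
def grad_f(theta):
    return [2 * (theta[0] - 3), 200 * (theta[1] + 1)]

theta = adagrad(grad_f, [0.0, 0.0])
print(theta)
```

Plain SGD with a single step size would have to pick it small enough for the steep second coordinate, which slows progress on the first; the per-coordinate normalization removes that tension here.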


Summary
• Deep networks learn hierarchical representations of data

• Train via SGD, use backpropagation to compute gradients

• Non-convex optimization, but works empirically given enough compute and data
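Backpropagation is the chain rule applied from the output back toward the inputs, reusing values cached in the forward pass. The sketch below assumes the 2-layer network $f_\theta(x) = w \cdot \sigma(Vx)$ with the logistic $\sigma$ and the squared loss from the optimization review; the function name is hypothetical. It relies on $\sigma'(z) = \sigma(z)(1 - \sigma(z))$, giving $\partial \text{Loss}/\partial w_j = 2(f - y)\,h_j$ and $\partial \text{Loss}/\partial V_{jk} = 2(f - y)\,w_j\,h_j(1 - h_j)\,x_k$.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss_and_grads(V, w, x, y):
    """Squared loss (w . sigma(V x) - y)^2 and its gradients via backpropagation."""
    # Forward pass: compute and keep the hidden activations h_j = sigma(v_j . x).
    h = [sigmoid(sum(vjk * xk for vjk, xk in zip(vj, x))) for vj in V]
    f = sum(wj * hj for wj, hj in zip(w, h))
    loss = (f - y) ** 2
    # Backward pass: derivative of the loss with respect to the output f.
    dloss_df = 2 * (f - y)
    # Output weights: f is linear in w, so d f / d w_j = h_j.
    grad_w = [dloss_df * hj for hj in h]
    # Hidden weights: chain through w_j, then sigma'(z_j) = h_j (1 - h_j), then x_k.
    grad_V = [[dloss_df * wj * hj * (1 - hj) * xk for xk in x]
              for wj, hj in zip(w, h)]
    return loss, grad_V, grad_w

# Hypothetical values; compare against finite differences to check the gradients.
V = [[0.5, -0.3], [0.1, 0.8]]
w = [1.5, -0.7]
loss, grad_V, grad_w = loss_and_grads(V, w, x=[1.0, 2.0], y=0.3)
print(loss, grad_V, grad_w)
```

A central finite-difference check, nudging one parameter by a small amount in each direction, is a common way to confirm that hand-derived gradients like these are correct.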



Motivation
[figure: input image $x$ and weight matrix $W$]

• Observation: images are not arbitrary vectors

• Goal: leverage the spatial structure of images (translation equivariance)



Idea: Convolutions



[figure from Andrej Karpathy]

Prior knowledge

• Local connectivity: each hidden unit operates on a local image patch (3 instead of 7 connections per hidden unit)

• Parameter sharing: each image patch is processed the same way (3 parameters instead of 3 · 5)

• Intuition: try to match a pattern in the image
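The counts above (7 inputs, 5 hidden units, 3 connections and 3 shared parameters) correspond to a 1D convolution with a filter of width 3. The sketch below is illustrative: the name `conv1d`, the input, and the filter values are hypothetical, and, like most deep learning libraries, it computes a cross-correlation (the filter is not flipped). Each output is the dot product of the same 3 weights with one local patch, so the output peaks where the patch best matches the filter's pattern.

```python
def conv1d(x, filt):
    """'Valid' 1D convolution (cross-correlation): output i = filt . x[i : i + k]."""
    k = len(filt)
    # Local connectivity: output i only sees k neighboring inputs.
    # Parameter sharing: every output reuses the same k weights in `filt`.
    return [sum(f * xi for f, xi in zip(filt, x[i:i + k]))
            for i in range(len(x) - k + 1)]

x = [0, 0, 1, 2, 1, 0, 0]  # 7 inputs containing a small "bump"
filt = [1, 2, 1]           # 3 shared parameters
print(conv1d(x, filt))     # [1, 4, 6, 4, 1]
```

A fully connected layer with the same 7 inputs and 5 outputs would need 7 · 5 = 35 weights.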



Convolutional layers

• Instead of mapping vectors to vectors, we map volumes to volumes (width × height × depth, where depth is the number of channels)


[Andrej Karpathy’s demo]
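To make "volume to volume" concrete, the sketch below maps an input volume of shape (C_in, H, W) to an output volume of shape (C_out, H − k + 1, W − k + 1). It assumes stride 1, no padding, square k × k filters, and one bias per filter; the function name and the channel-first nested-list layout are illustrative choices. Each filter spans all input channels and produces one output channel.

```python
def conv2d(volume, filters, biases):
    """Convolutional layer on nested lists (stride 1, no padding assumed).

    volume:  shape (C_in, H, W)
    filters: shape (C_out, C_in, k, k), one filter per output channel
    biases:  length C_out
    returns: shape (C_out, H - k + 1, W - k + 1)
    """
    c_in, H, W = len(volume), len(volume[0]), len(volume[0][0])
    k = len(filters[0][0])
    out = []
    for filt, b in zip(filters, biases):  # one output channel per filter
        channel = []
        for i in range(H - k + 1):
            row = []
            for j in range(W - k + 1):
                s = b
                # Sum over all input channels within the k x k window at (i, j).
                for c in range(c_in):
                    for di in range(k):
                        for dj in range(k):
                            s += filt[c][di][dj] * volume[c][i + di][j + dj]
                row.append(s)
            channel.append(row)
        out.append(channel)
    return out

# Hypothetical shapes: a 3 x 5 x 5 input and two 3 x 3 x 3 filters.
vol = [[[1.0] * 5 for _ in range(5)] for _ in range(3)]
fs = [[[[0.1] * 3 for _ in range(3)] for _ in range(3)] for _ in range(2)]
res = conv2d(vol, fs, [0.0, 0.0])
print(len(res), len(res[0]), len(res[0][0]))  # 2 3 3
```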



[figure from Andrej Karpathy]

Max-pooling

• Intuition: test whether a pattern exists anywhere in a neighborhood

• Reduce computation and help prevent overfitting
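A common configuration is 2 × 2 windows with stride 2, which keeps only the strongest response in each neighborhood and shrinks each spatial dimension by half. The sketch below assumes that configuration for a single channel; the function name and input values are hypothetical.

```python
def max_pool2d(channel, size=2, stride=2):
    """Max-pooling over size x size windows of one 2D channel."""
    H, W = len(channel), len(channel[0])
    return [[max(channel[i + di][j + dj]
                 for di in range(size) for dj in range(size))
             for j in range(0, W - size + 1, stride)]
            for i in range(0, H - size + 1, stride)]

x = [[1, 3, 2, 0],
     [4, 2, 1, 1],
     [0, 1, 5, 6],
     [2, 2, 7, 8]]
print(max_pool2d(x))  # [[4, 2], [2, 8]]
```

The 16 input values shrink to 4, and each output only records that a strong response occurred somewhere in its window, not exactly where.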



Example of function evaluation

[Andrej Karpathy’s demo]



[Krizhevsky et al., 2012]

AlexNet

• Non-linearity: use ReLU ($\max(z, 0)$) instead of the logistic function
• Data augmentation (translations, horizontal reflections, varying intensity) and dropout to guard against overfitting
• Computation: parallelize across two GPUs (6 days)
• Results on ImageNet: 16.4% error (next best was 25.8%)



[He et al., 2015]

Residual networks
$x \mapsto \sigma(W x) + x$

• Key idea: make it easy to learn the identity (a good inductive bias)
• Enables training 152-layer networks
• Results on ImageNet: 3.6% error
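The "easy to learn the identity" point can be checked directly. The sketch below assumes $\sigma$ is ReLU (the slide leaves $\sigma$ generic) and uses a hypothetical `residual_block` helper for the map $x \mapsto \sigma(Wx) + x$. With $W = 0$ the block returns its input unchanged, so even a deep stack of such blocks starts out as the identity. A plain layer $\sigma(Wx)$ has no such default: with ReLU and $W = I$ it still zeroes out negative inputs.

```python
def relu(z):
    """ReLU non-linearity max(z, 0); assumed here as the choice of sigma."""
    return max(z, 0.0)

def residual_block(W, x):
    """Residual map x -> sigma(W x) + x, with the skip connection adding x back."""
    Wx = [sum(wij * xj for wij, xj in zip(row, x)) for row in W]
    return [relu(z) + xi for z, xi in zip(Wx, x)]

x = [1.5, -2.0, 0.25]
W_zero = [[0.0] * 3 for _ in range(3)]

# With all-zero weights, every block is the identity, however many are stacked.
h = x
for _ in range(152):
    h = residual_block(W_zero, h)
print(h)  # [1.5, -2.0, 0.25]
```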



Summary
• Key idea 1: locality of connections, capturing spatial structure

• Key idea 2: filters share parameters, capturing translation equivariance

• Depth matters

• Applications to images, text, Go, drug design, etc.
