Deep Learning
Why do you want to know about deep learning?
Motivation
• It works!
• State of the art in machine learning
• Google, Facebook, Twitter, Microsoft are all using it.
• It is fun!
• Need to know what you are doing to do it well.
Google Trends
https://www.google.com/trends/explore#q=deep%20learning%2C%20%2Fm%2F0hc2f&cmpt=q&tz=Etc%2FGMT%2B5
Scene recognition
http://places.csail.mit.edu/demo.html
Google Brain - 2012
16 000 Cores
What it learned
http://www.nytimes.com/2012/06/26/technology/in-a-big-network-of-computers-evidence-of-machine-learning.html?pagewanted=all
Google DeepMind
https://www.youtube.com/watch?v=V1eYniJ0Rnk
What is different?
• We have seen ML methods:
– SVM, decision trees, boosting, random forests
• We needed to hand-design the input features
• The ML algorithm learns the decision boundary
Feature Design
[Figure: hand-designed features computed from an input image patch feed a yes/no classification]
Learned Feature Hierarchy
[Figure: a hierarchy of low-, medium-, and high-level features learned from an input image patch feeds a yes/no classification; Honglak Lee]
Scaling with Data Size
[Plot: performance vs. amount of data; deep learning keeps improving with more data, while most ML algorithms plateau (Andrew Ng)]
Deep Learning Techniques
• Artificial neural network
– Introduced in the 60s
• Convolutional neural network
– Introduced in the 80s
• Recurrent neural network
– Introduced in the 80s
What Changed since the 80s?
• MLPs and CNNs have been around since the 80s and earlier
• Why did people even bother with SVMs, boosting and co.?
• And why do we still care about those methods?
Brain or Rocket
https://www.youtube.com/watch?v=EczYSl-ei9g
What Changed – Computational Power
What Changed – Data Size
I Don’t Have a Cluster at Home
http://www.geforce.com/whats-new/articles/introducing-nvidia-geforce-gtx-titan-z
Deep Learning
What is deep learning?
Perceptron
[Diagram: perceptron with inputs $x_1, x_2, x_3$, weights $w_1, w_2, w_3$, a bias $b$, and a constant $-1$ input]
Output: $s(b + w^T x)$
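As a concrete illustration of the formula above, here is a minimal NumPy sketch of a perceptron forward pass; the input values, weights, and the sigmoid choice of activation are placeholders, not something fixed by the slides.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def perceptron(x, w, b):
    # s(b + w^T x): weighted sum of the inputs plus bias, squashed by the activation
    return sigmoid(b + np.dot(w, x))

# Illustration with made-up numbers: three inputs x1..x3, weights w1..w3, bias b
x = np.array([0.5, -1.0, 2.0])
w = np.array([0.3, 0.8, -0.5])
b = 0.1
print(perceptron(x, w, b))
```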
Separating Hyperplane
• x: data point
• y: label
• w: weight vector
• b: bias
[Figure: separating hyperplane with normal vector w and bias b]
Side Note: Step vs Sigmoid Activation
$s(x) = \dfrac{1}{1 + e^{-cx}}$
The slope $c$ controls how closely the sigmoid approximates the step function while staying differentiable.
The XOR Problem
[Figure: the XOR data plotted against $x_1$ and $x_2$; no single line separates the two classes]
The XOR Problem
[Figure: the XOR data plotted against $x_1$, $x_2$, and a third coordinate $x_3$]
Perceptron
[Figure: a single perceptron defines one decision boundary $wx = 0$ in the $(x_1, x_2)$ plane]
Multi-Perceptron
$W = \begin{pmatrix} w_{11} & w_{12} \\ w_{21} & w_{22} \\ w_{31} & w_{32} \end{pmatrix}$
[Figure: the three separating lines $w_1' x = 0$, $w_2' x = 0$, $w_3' x = 0$ in the $(x_1, x_2)$ plane]
$Wx = ?$
XOR Problem
[Figure: the lines $w_1' x = 0$ and $w_2' x = 0$ split the $(x_1, x_2)$ plane into regions a, b, c, d]
Sign patterns of the regions: $Wa = [+, -]$, $Wb = [+, +]$, $Wc = [-, -]$, $Wd = [+, -]$
XOR Problem
[Figure: the same regions plotted in the hidden space $(h_1, h_2)$: b at $[+,+]$, c at $[-,-]$, and a, d together at $[+,-]$, so a single line now separates the XOR classes]
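To make this construction concrete, here is a minimal sketch of a two-layer network that computes XOR with hand-picked (not learned) weights and step activations; the particular thresholds are just one valid choice.

```python
import numpy as np

def step(z):
    return (z > 0).astype(float)

# Hidden layer: h1 fires for "x1 OR x2", h2 fires for "x1 AND x2"
W1 = np.array([[1.0, 1.0],
               [1.0, 1.0]])
b1 = np.array([-0.5, -1.5])

# Output layer: fires when h1 is on but h2 is off, i.e. XOR
W2 = np.array([1.0, -1.0])
b2 = -0.5

def xor_net(x):
    h = step(W1 @ x + b1)
    return step(W2 @ h + b2)

for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, xor_net(np.array(x, dtype=float)))   # prints 0, 1, 1, 0
```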
Multi-Layer Perceptron
[Figure: input $x$ mapped through weights $W^{(1)}$ to a hidden layer]
Hidden layer: $s(b^{(1)} + W^{(1)} x)$
http://deeplearning.net/tutorial/mlp.html
Multi-Layer Perceptron
$f(x) = G(b^{(2)} + W^{(2)} \, s(b^{(1)} + W^{(1)} x))$
$G$: logistic function, softmax for multiclass
http://deeplearning.net/tutorial/mlp.html
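A minimal NumPy sketch of the forward pass $f(x)$ above, assuming a sigmoid hidden layer and a softmax output $G$; the layer sizes and random weights are placeholders.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max())          # subtract max for numerical stability
    return e / e.sum()

np.random.seed(0)
n_in, n_hidden, n_out = 4, 5, 3      # placeholder layer sizes
W1, b1 = 0.1 * np.random.randn(n_hidden, n_in), np.zeros(n_hidden)
W2, b2 = 0.1 * np.random.randn(n_out, n_hidden), np.zeros(n_out)

def mlp(x):
    h = sigmoid(b1 + W1 @ x)         # hidden layer: s(b1 + W1 x)
    return softmax(b2 + W2 @ h)      # output layer: G(b2 + W2 h)

print(mlp(np.random.randn(n_in)))    # class probabilities summing to 1
```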
Yes/No Classification
[Figure: low-, medium-, and high-level features feeding a yes/no classification]
Autoencoder
• This is what Google used for their Google Brain experiment
• Basically just an MLP
• Output size is equal to input size
• Popular for pre-training a network on unlabeled data
Autoencoder
[Figure: encoder with weights W maps the input to a code; decoder with tied weights W^T reconstructs the input]
Deep Autoencoder
• Reconstruct the image from a learned low-dimensional code
• Weights are tied
• Learned features are often useful for classification
• Can add noise to the input image to prevent overfitting
Salakhutdinov & Hinton, NIPS 2007
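A rough sketch of the tied-weight idea (encoder $W$, decoder $W^T$), including the denoising variant from the last bullet; the layer sizes, noise level, and squared-error loss are illustrative assumptions rather than details from the slides.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

np.random.seed(0)
n_in, n_code = 64, 16                      # e.g. an 8x8 patch compressed to 16 numbers
W = 0.1 * np.random.randn(n_code, n_in)    # single weight matrix, shared by both halves
b_enc, b_dec = np.zeros(n_code), np.zeros(n_in)

def autoencode(x, noise=0.1):
    x_noisy = x + noise * np.random.randn(*x.shape)   # denoising: corrupt the input
    code = sigmoid(W @ x_noisy + b_enc)               # encoder: W
    recon = sigmoid(W.T @ code + b_dec)               # decoder: tied weights W^T
    return recon, np.mean((recon - x) ** 2)           # reconstruct the *clean* input

x = np.random.rand(n_in)
recon, loss = autoencode(x)
print(loss)
```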
From MLP to CNN
• So far no notion of neighborhood
• Invariant to permutation of input
• A lot of data is structured:
– Images
– Speech
– …
• Convolutional neural networks preserve neighborhood
http://www.amolgmahurkar.com/classifySTLusingCNN.html
Convolution
Convolutional Network
http://parse.ele.tue.nl/cluster/2/CNNArchitecture.jpg
CNN Advantages
• neighborhood preserved
• translation invariant
• tied weights
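The sketch below illustrates the tied-weight and neighborhood-preserving points under simplifying assumptions (single channel, stride 1, no padding): one small kernel is reused at every image position.

```python
import numpy as np

def conv2d(image, kernel):
    # Valid 2D convolution (cross-correlation), stride 1: the same kernel
    # weights are reused at every spatial position (tied weights).
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

np.random.seed(0)
image = np.random.rand(8, 8)               # toy single-channel image
edge_kernel = np.array([[1.0, -1.0]])      # toy horizontal-gradient filter
print(conv2d(image, edge_kernel).shape)    # (8, 7) feature map
```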
DNNs are hard to train
• backpropagation – gradient descent
• many local minima
• prone to overfitting
• many parameters to tune
• SLOW
Stochastic Gradient Descent
https://www.youtube.com/watch?v=HvLJUsEc6dw
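A minimal sketch of stochastic gradient descent on a toy least-squares problem; the data and learning rate are stand-ins, but the update rule w ← w − η·∇L is the same one used when training networks with backpropagation.

```python
import numpy as np

np.random.seed(0)
X = np.random.randn(1000, 5)
true_w = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ true_w + 0.1 * np.random.randn(1000)

w = np.zeros(5)
lr = 0.01                                           # learning rate (eta)
for epoch in range(20):
    for i in np.random.permutation(len(X)):         # one random sample at a time
        err = X[i] @ w - y[i]                       # residual of this sample
        grad = err * X[i]                           # gradient of 0.5*err^2 w.r.t. w
        w -= lr * grad                              # SGD step
print(w)                                            # close to true_w
```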
Development
• Computers got faster!
• Data got bigger.
• Initialization got better.
2006 Breakthrough
• Ability to train deep architectures by using layer-wise unsupervised learning, whereas previous purely supervised attempts had failed
• Unsupervised feature learners:
– RBMs
– Auto-encoder variants
– Sparse coding variants
Unsupervised Pretraining
http://jmlr.org/papers/volume11/erhan10a/erhan10a.pdf
Dropout
[Figure: a network with weights W and input x in which randomly chosen units are crossed out (dropped) during training]
• Helps with overfitting
• Typically used with random initialization
• Training is slower than without dropout
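A sketch of the mechanics, assuming the common "inverted dropout" formulation: each unit is zeroed with probability p during training, and activations are rescaled so nothing changes at test time.

```python
import numpy as np

def dropout(h, p=0.5, train=True):
    # h: hidden activations; p: probability of dropping a unit
    if not train:
        return h                      # no dropout at test time
    mask = (np.random.rand(*h.shape) > p).astype(h.dtype)
    return h * mask / (1.0 - p)       # rescale so the expected activation is unchanged

np.random.seed(0)
h = np.random.rand(10)
print(dropout(h))                     # roughly half the units are zeroed
```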
Deep Learning for Sequences
• MLPs and CNNs have fixed input size
• How would you handle sequences?
• Example: Complete a sentence
–…
– are …
– How are …
Slide from G. Hinton
Meaning of Life
https://www.youtube.com/watch?v=vShMxxqtDDs
Slide from G. Hinton
Recurrent Neural Network
http://blog.josephwilk.net/ruby/recurrent-neural-networks-in-ruby.html
Recurrent Neural Network
http://karpathy.github.io/2015/05/21/rnn-effectiveness/
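A minimal sketch of one step of a vanilla recurrent network in the spirit of the post linked above: the same weights are applied at every time step, and the hidden state carries context along the sequence. Sizes and weights are placeholders.

```python
import numpy as np

np.random.seed(0)
n_in, n_hidden = 8, 16                       # placeholder sizes
W_xh = 0.1 * np.random.randn(n_hidden, n_in)
W_hh = 0.1 * np.random.randn(n_hidden, n_hidden)
b_h = np.zeros(n_hidden)

def rnn_step(x_t, h_prev):
    # The new hidden state depends on the current input and the previous state
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

h = np.zeros(n_hidden)
sequence = [np.random.randn(n_in) for _ in range(5)]   # toy input sequence
for x_t in sequence:
    h = rnn_step(x_t, h)       # same weights reused at every time step
print(h.shape)
```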
Intriguing Properties of Neural Networks
Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, Rob Fergus
International Conference on Learning Representations (2014)
http://www.datascienceassn.org/sites/default/files/Intriguing%20Properties%20of%20Neural%20Networks_0.pdf
Libraries
• Theano
• Torch
• Caffe
• TensorFlow
• …
Theano
• Full disclosure: My favorite
• Python
• Transparent GPU integration
• Symbolic graphs
• Auto-gradient
• Low level – in a good way!
• If you want high-level on top:
– Pylearn2
– Keras, Lasagne, Blocks
–…
Torch
• Lua (and no Python interface)
• Very fast convolutions
• Used by Google DeepMind, Facebook AI, IBM
• Layer instead of graph based
https://en.wikipedia.org/wiki/Torch_(machine_learning)
Caffe
• C++ based
• Higher abstraction than Theano or Torch
• Good for training standard models
• Model zoo for pre-trained models
TensorFlow
• Symbolic graph and auto-gradient
• Python interface
• Visualization tools
• Some performance issues regarding speed and memory
https://github.com/soumith/convnet-benchmarks/issues/66
Tips and Tricks
Number of Layers / Size of Layers
• If data is unlimited, larger and deeper should be better
• Larger networks can overfit more easily
• Take computational cost into account
Learning Rate
• One of the most important parameters
• If the network diverges, the learning rate is most probably too large
• Smaller usually works better
• Can slowly decay over time
• Can have one learning rate per layer
Other tips for SGD:
http://leon.bottou.org/publications/pdf/tricks-2012.pdf
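One common way to "slowly decay over time" is a 1/t schedule, sketched below; the constants are an arbitrary but typical choice (see the Bottou notes above for alternatives).

```python
# Hypothetical decay schedule: eta_t = eta_0 / (1 + decay * t)
eta0, decay = 0.1, 1e-3
for t in range(0, 50001, 10000):
    lr = eta0 / (1.0 + decay * t)
    print(t, lr)
# w -= lr * grad   # the current lr is used in each SGD update
```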
Momentum
• Helps to escape local minima
• Crucial to achieve high performance
More about Momentum:
http://www.jmlr.org/proceedings/papers/v28/sutskever13.pdf
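A sketch of the classical momentum update: a velocity vector accumulates past gradients, which smooths the steps and can carry the parameters through shallow local minima. The coefficient 0.9 is a typical but arbitrary choice, and the toy quadratic only serves as an example objective.

```python
import numpy as np

def momentum_step(w, grad, velocity, lr=0.01, mu=0.9):
    # velocity remembers the direction of previous steps
    velocity = mu * velocity - lr * grad
    return w + velocity, velocity

w = np.zeros(5)
velocity = np.zeros(5)
for _ in range(100):
    grad = 2 * w - 1.0                  # gradient of a toy quadratic (minimum at 0.5)
    w, velocity = momentum_step(w, grad, velocity)
print(w)                                # approaches 0.5
```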
Convergence
• Monitor validation error
• Stop when it doesn’t improve within n iterations
• If the learning rate decays, you might want to adjust the number of iterations
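A sketch of the early-stopping loop described above; train_one_epoch and validation_error are hypothetical callables assumed to be provided by the rest of the training code.

```python
# Hypothetical helpers assumed to exist elsewhere in the training script:
#   train_one_epoch()    -> runs one pass of SGD over the training set
#   validation_error()   -> returns the current error on the held-out validation set

def train_with_early_stopping(train_one_epoch, validation_error,
                              patience=10, max_epochs=1000):
    best_error, epochs_without_improvement = float("inf"), 0
    for epoch in range(max_epochs):
        train_one_epoch()
        err = validation_error()
        if err < best_error:
            best_error, epochs_without_improvement = err, 0
        else:
            epochs_without_improvement += 1
        if epochs_without_improvement >= patience:   # no improvement within n iterations
            break
    return best_error
```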
Initialization of W
• Need randomization to break symmetry
• Bad initializations are untrainable
• Most heuristics depend on the number of input (and output) units
• Sometimes W is rescaled during training
– Weight decay (L2 regularization)
– Normalization
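One widely used fan-in/fan-out heuristic is the Glorot ("Xavier") uniform initialization, sketched below; the slides do not prescribe this particular formula.

```python
import numpy as np

def init_weights(n_in, n_out, rng=np.random):
    # Small random values break symmetry; the scale shrinks as the layer gets wider,
    # following the Glorot/"Xavier" heuristic sqrt(6 / (n_in + n_out)).
    limit = np.sqrt(6.0 / (n_in + n_out))
    return rng.uniform(-limit, limit, size=(n_out, n_in))

W1 = init_weights(784, 500)      # e.g. MNIST input to a 500-unit hidden layer
print(W1.shape, W1.std())
```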
Data Augmentation
• Exploit invariances of the data
• Rotation, translation
• Nonlinear transformation
• Adding Noise
http://en.wikipedia.org/wiki/MNIST_database
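A sketch of two simple augmentations for an image array, translation by a few pixels and additive noise; np.roll is used as a crude shift and the noise level is arbitrary.

```python
import numpy as np

def augment(image, max_shift=2, noise_std=0.05, rng=np.random):
    # Random translation: shift by a few pixels in each direction
    dx, dy = rng.randint(-max_shift, max_shift + 1, size=2)
    shifted = np.roll(np.roll(image, dy, axis=0), dx, axis=1)
    # Additive noise
    return shifted + noise_std * rng.randn(*image.shape)

np.random.seed(0)
digit = np.random.rand(28, 28)                     # stand-in for an MNIST digit
augmented = [augment(digit) for _ in range(10)]    # 10 new training examples from one
print(len(augmented), augmented[0].shape)
```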
Data Normalization
• We have seen std and mean normalization
• Whitening
– Neighboring pixels are often redundant
– Remove correlation between features
More about preprocessing:
http://deeplearning.stanford.edu/wiki/index.php/Data_Preprocessing
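A sketch of per-feature mean/std normalization plus PCA whitening as one way to decorrelate features; the small epsilon guarding near-zero eigenvalues is a common but arbitrary choice.

```python
import numpy as np

np.random.seed(0)
X = np.random.rand(1000, 64)                     # rows = examples, columns = features

# Mean / std normalization (per feature)
X_norm = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-8)

# PCA whitening: rotate into the eigenbasis of the covariance and rescale,
# so that features become uncorrelated with unit variance
cov = np.cov(X_norm, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
X_white = (X_norm @ eigvecs) / np.sqrt(eigvals + 1e-5)

print(np.allclose(np.cov(X_white, rowvar=False), np.eye(64), atol=1e-1))
```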
Non-Linear Activation Function
• Sigmoid
– Traditional choice
• Tanh
– Symmetric around the origin
– Better gradient propagation than Sigmoid
• Rectified Linear
– max(x,0)
– State of the art
– Good gradient propagation
– Can “die”
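For reference, a small sketch of the three activations; the "dying" remark refers to ReLU units whose input stays negative, so their gradient is zero and they stop updating.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))    # saturates to 0/1, gradients vanish for large |x|

def tanh(x):
    return np.tanh(x)                  # symmetric around the origin, saturates to -1/1

def relu(x):
    return np.maximum(x, 0.0)          # max(x, 0): no saturation for x > 0,
                                       # but zero gradient (a "dead" unit) for x < 0

x = np.linspace(-3, 3, 7)
print(sigmoid(x), tanh(x), relu(x), sep="\n")
```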
L1 and L2 Regularization
• Most pictures of nice filters involve some regularization
• L2 regularization corresponds to weight decay
• L2 and early stopping have similar effects
• L1 leads to sparsity
• Might not be needed anymore (more data, dropout)
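A sketch of how the two penalties enter the objective; the lambda values are arbitrary, and the comment shows why the L2 term acts as weight decay in the SGD update.

```python
import numpy as np

def regularized_loss(data_loss, W, l1=0.0, l2=1e-4):
    # Total objective = data loss + L1 penalty (sparsity) + L2 penalty (weight decay)
    return data_loss + l1 * np.abs(W).sum() + l2 * np.sum(W ** 2)

# In the SGD step the L2 term contributes 2 * l2 * W to the gradient,
# i.e. each update shrinks the weights a little ("weight decay"):
#   W -= lr * (data_grad + 2 * l2 * W)

np.random.seed(0)
W = np.random.randn(100)
print(regularized_loss(data_loss=0.7, W=W, l2=1e-4))
```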
Monitoring Training
• Monitor training and validation performance
• Can monitor hidden units
• Good: Uncorrelated and high variance
Further Resources
• More about theory:
– Yoshua Bengio’s book: http://www.iro.umontreal.ca/~bengioy/dlbook/
– Deep learning reading list:
http://deeplearning.net/reading-list/
• More about Theano:
– http://deeplearning.net/software/theano/
– http://deeplearning.net/tutorial/