Hands-on Deep Learning in Python
Imry Kissos
Deep Learning Meetup, TLV, August 2015
Outline
● Problem Definition
● Training a DNN
● Improving the DNN
● Open Source Packages
● Summary
Problem Definition
Deep Convolutional Network
http://danielnouri.org/notes/2014/12/17/using-convolutional-neural-nets-to-detect-facial-keypoints-tutorial/
Tutorial
● Goal: detect facial landmarks on (normal) face images
● Data set provided by Dr. Yoshua Bengio
● Tutorial code available: https://github.com/dnouri/kfkd-tutorial/blob/master/kfkd.py
Flow
[Diagram: train a general model plus specialist models ("Nose Tip", "Mouth Corners"), then predict points on the test set]
Flow
[Diagram: train images and train points are fed to fit(), producing a trained net]
Flow
[Diagram: test images go through predict(), producing predicted points]
Python Deep Learning Framework
● nolearn - wrapper around Lasagne (high level)
● Lasagne - Theano extension for Deep Learning
● Theano - define, optimize, and evaluate mathematical expressions; generates efficient CUDA GPU code for DNNs (low level)
Hardware support: GPU & CPU; OS: Linux, OS X, Windows
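As a taste of the lowest layer of the stack, a minimal Theano sketch of the define/optimize/evaluate flow (the expression here is illustrative, not from the tutorial):

    import theano
    import theano.tensor as T

    # define a symbolic expression over a matrix input
    x = T.dmatrix('x')
    y = T.nnet.sigmoid(T.dot(x, x.T))

    # Theano optimizes the graph and compiles it (to CUDA code on a GPU)
    f = theano.function([x], y)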
Training a Deep Neural Network
1. Data Analysis
2. Architecture Engineering
3. Optimization
4. Training the DNN
Training a Deep Neural Network
1. Data Analysis
a. Exploration + Validation
b. Pre-Processing
c. Batch and Split
2. Architecture Engineering
3. Optimization
4. Training the DNN
Data Exploration + Validation
Data:
● 7K gray-scale images of detected faces
● 96x96 pixels per image
● 15 landmarks per image (nominally)
Data validation:
● Some landmarks are missing
Pre-Processing
● Data normalization
● Shuffle train data
Batch
[Diagram: one epoch's data split into train, validation, and test batches]
The train/valid/test splits are constant across epochs.
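A sketch of this pre-processing, following the tutorial's load() function (the file name and column layout are those of the Kaggle facial-keypoints CSV):

    import numpy as np
    import pandas as pd
    from sklearn.utils import shuffle

    df = pd.read_csv('training.csv')
    # each image is stored as a space-separated pixel string
    df['Image'] = df['Image'].apply(lambda im: np.fromstring(im, sep=' '))
    df = df.dropna()  # keep only rows with all landmarks present

    X = (np.vstack(df['Image'].values) / 255.).astype(np.float32)  # pixels to [0, 1]
    y = df[df.columns[:-1]].values
    y = ((y - 48) / 48).astype(np.float32)                         # coords to [-1, 1]

    X, y = shuffle(X, y, random_state=42)  # shuffle the train data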
Train / Validation Split
For classification, the train/validation split should preserve the class proportions (stratification).
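For a classification task this could be done as below (a sketch; labels is a hypothetical class-label array - the tutorial itself is regression, where nolearn's default random split is used):

    from sklearn.cross_validation import StratifiedShuffleSplit

    # one stratified 80/20 split that preserves the class proportions
    sss = StratifiedShuffleSplit(labels, n_iter=1, test_size=0.2, random_state=42)
    train_idx, valid_idx = next(iter(sss))
    X_train, X_valid = X[train_idx], X[valid_idx]
    y_train, y_valid = labels[train_idx], labels[valid_idx]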
Training a Deep Neural Network
1. Data Analysis
2. Architecture Engineering
a. Layers Definition
b. Layers Implementation
3. Optimization
4. Training
Architecture
[Diagram: X → Conv → Pool → Dense → Output → Y]
Layers Definition
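In nolearn the layers are a list of named layer classes plus per-layer keyword arguments. A condensed sketch of the tutorial's convolutional net (the full net in kfkd.py has three conv/pool blocks):

    from lasagne import layers
    from lasagne.updates import nesterov_momentum
    from nolearn.lasagne import NeuralNet

    net = NeuralNet(
        layers=[
            ('input',  layers.InputLayer),
            ('conv1',  layers.Conv2DLayer),
            ('pool1',  layers.MaxPool2DLayer),
            ('hidden', layers.DenseLayer),
            ('output', layers.DenseLayer),
        ],
        input_shape=(None, 1, 96, 96),  # batches of 96x96 gray-scale images
        conv1_num_filters=32, conv1_filter_size=(3, 3),
        pool1_pool_size=(2, 2),
        hidden_num_units=100,
        output_num_units=30,            # 15 landmarks, (x, y) each
        output_nonlinearity=None,       # linear output: regression
        update=nesterov_momentum,
        update_learning_rate=0.01,
        update_momentum=0.9,
        regression=True,
        max_epochs=400,
        verbose=1,
    )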
Activation Function
ReLU
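ReLU is simply f(x) = max(0, x); Lasagne ships it as lasagne.nonlinearities.rectify and uses it as the default nonlinearity for dense and conv layers:

    import numpy as np

    def relu(x):
        # Rectified Linear Unit: zero for negative inputs, identity otherwise
        return np.maximum(0, x)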
Dense Layer
Dropout
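In nolearn, dropout is just another named layer. A sketch in the spirit of the tutorial's dropout net, which interleaves DropoutLayers with increasing p:

    from lasagne import layers
    from nolearn.lasagne import NeuralNet

    net = NeuralNet(
        layers=[
            ('input',    layers.InputLayer),
            ('conv1',    layers.Conv2DLayer),
            ('dropout1', layers.DropoutLayer),
            ('hidden',   layers.DenseLayer),
            ('dropout2', layers.DropoutLayer),
            ('output',   layers.DenseLayer),
        ],
        input_shape=(None, 1, 96, 96),
        conv1_num_filters=32, conv1_filter_size=(3, 3),
        dropout1_p=0.1,  # drop 10% of the conv activations during training
        hidden_num_units=300,
        dropout2_p=0.5,  # heavier dropout before the output layer
        output_num_units=30, output_nonlinearity=None,
        regression=True, max_epochs=400,
    )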
Training a Deep Neural Network
1. Data Analysis
2. Architecture Engineering
3. Optimization
a. Back Propagation
b. Objective
c. SGD
d. Updates
e. Convergence Tuning
4. Training the DNN
Back Propagation
Forward path: [Diagram: X → Conv → Dense → output points Y]
Forward path: [Diagram: the output points are compared against the training points]
Backward path: [Diagram: the error flows back through the Dense and Conv layers]
Update: [Diagram: for all layers, the weights are updated from their gradients]
Objective
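With regression=True, nolearn minimizes the mean squared error over the predicted coordinates; the learning curves later plot its square root (RMSE). In plain numpy:

    import numpy as np

    def mse(y_pred, y_true):
        # the objective minimized during training
        return np.mean((y_pred - y_true) ** 2)

    def rmse(y_pred, y_true):
        # what the learning-curve plots show
        return np.sqrt(mse(y_pred, y_true))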
S.G.D.
● Updates the network after each mini-batch
● Karpathy - "Babysitting": the weights/updates magnitude ratio should be ~1e3 (i.e., updates ~1e-3 of the weights)
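A minimal sketch of one SGD step per mini-batch (Lasagne's nesterov_momentum, used in the tutorial, adds a velocity term on top of this):

    def sgd_step(params, grads, learning_rate=0.01):
        # one update per mini-batch: w <- w - lr * dL/dw
        return [p - learning_rate * g for p, g in zip(params, grads)]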
Optimization - Updates
[Animations: Alec Radford's visualizations of optimizer behavior]
Adjusting Learning Rate & Momentum
Both annealed linearly in the epoch number
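The tutorial does this with an on_epoch_finished hook; a sketch (it assumes update_learning_rate and update_momentum are Theano shared variables, so set_value works):

    import numpy as np

    class AdjustVariable(object):
        # linearly anneal a NeuralNet parameter from `start` to `stop`
        def __init__(self, name, start=0.03, stop=0.0001):
            self.name = name
            self.start, self.stop = start, stop
            self.ls = None

        def __call__(self, nn, train_history):
            if self.ls is None:
                self.ls = np.linspace(self.start, self.stop, nn.max_epochs)
            epoch = train_history[-1]['epoch']
            getattr(nn, self.name).set_value(np.float32(self.ls[epoch - 1]))

    # usage sketch:
    # net = NeuralNet(...,
    #     update_learning_rate=theano.shared(np.float32(0.03)),
    #     update_momentum=theano.shared(np.float32(0.9)),
    #     on_epoch_finished=[
    #         AdjustVariable('update_learning_rate', start=0.03, stop=0.0001),
    #         AdjustVariable('update_momentum', start=0.9, stop=0.999),
    #     ])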
Convergence Tuning
● Stops early according to the validation loss
● Returns the best weights found
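Also an on_epoch_finished hook in the tutorial; a sketch:

    import numpy as np

    class EarlyStopping(object):
        # stop when the validation loss has not improved for `patience` epochs
        def __init__(self, patience=100):
            self.patience = patience
            self.best_valid = np.inf
            self.best_valid_epoch = 0
            self.best_weights = None

        def __call__(self, nn, train_history):
            current_valid = train_history[-1]['valid_loss']
            current_epoch = train_history[-1]['epoch']
            if current_valid < self.best_valid:
                self.best_valid = current_valid
                self.best_valid_epoch = current_epoch
                self.best_weights = nn.get_all_params_values()
            elif self.best_valid_epoch + self.patience < current_epoch:
                # restore the best weights found and end training
                nn.load_params_from(self.best_weights)
                raise StopIteration()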
Training a Deep Neural Network
1. Data Analysis
2. Architecture Engineering
3. Optimization
4. Training the DNN
a. Fit
b. Fine Tune Pre-Trained
c. Learning Curves
Fit
For each epoch:
● Loop over train batches: forward + backprop
● Loop over validation batches: forward only
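Both loops are triggered by a single call (load2d is the tutorial's loader that reshapes the images to (N, 1, 96, 96); X_test is hypothetical here):

    X, y = load2d()  # images + landmark targets
    net.fit(X, y)    # runs the train/validation loops above for max_epochs

    # prediction is then a single forward pass over the test batches:
    y_pred = net.predict(X_test)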
Fine Tune Pre-Trained
● Change the output layer
● Load the pre-trained weights
● Fine tune the specialist
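A sketch following the tutorial's specialist training: clone the general net, shrink the output layer to the specialist's landmarks, and warm-start from the saved weights (file and data names are illustrative):

    import cPickle as pickle
    from sklearn.base import clone

    with open('net.pickle', 'rb') as f:  # the trained general net
        net_pretrain = pickle.load(f)

    specialist = clone(net)
    specialist.output_num_units = 2            # e.g. "nose tip": one landmark
    specialist.max_epochs = 300
    specialist.load_params_from(net_pretrain)  # copies all matching weights
    specialist.fit(X_spec, y_spec)             # fine tune on the subset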
Learning Curves
Loop over 6 nets:
[Plot: RMSE vs. epochs for each net]
Learning Curves Analysis
[Plots: RMSE vs. epochs for Net 1 and Net 2, illustrating convergence, overfitting, and jittering]
Part 1 Summary
Training a DNN: [Diagram: the four-step pipeline - data analysis, architecture engineering, optimization, training]
Part 1 End
Break
Part 2
Beyond Training
Outline
● Problem Definition
● Motivation
● Training a DNN
● Improving the DNN
● Open Source Packages
● Summary
Beyond Training
1. Improving the DNN
a. Analysis Capabilities
b. Augmentation
c. Forward - Backward Path
d. Monitor Layers’ Training
2. Open Source Packages
3. Summary
Improving the DNN
Very tempting:
● >1M images
● >1M parameters
● Large gap: theory ↔ practice
⇒ Brute-force experiments?!
Analysis Capabilities
1. Theoretical explanation
a. E.g. dropout and augmentation decrease overfitting
2. Empirical claims about a phenomenon
a. E.g. normalization improves convergence
3. Numerical understanding
a. E.g. exploding / vanishing updates
Reduce Overfitting
Solution: data augmentation
[Plot: RMSE vs. epochs for Net 1 and Net 2, showing the overfitting gap]
Data Augmentation
Horizontal flip; perturbation
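The tutorial flips on the fly in a custom BatchIterator; a condensed sketch (flip_indices lists the (left, right) landmark pairs to swap - only two illustrative pairs are shown):

    import numpy as np
    from nolearn.lasagne import BatchIterator

    class FlipBatchIterator(BatchIterator):
        flip_indices = [(0, 2), (1, 3)]  # illustrative subset of pairs

        def transform(self, Xb, yb):
            Xb, yb = super(FlipBatchIterator, self).transform(Xb, yb)
            bs = Xb.shape[0]
            idx = np.random.choice(bs, bs // 2, replace=False)
            Xb[idx] = Xb[idx, :, :, ::-1]         # mirror half of the images
            if yb is not None:
                yb[idx, ::2] = yb[idx, ::2] * -1  # mirror the x coordinates
                for a, b in self.flip_indices:    # swap left <-> right points
                    yb[idx, a], yb[idx, b] = yb[idx, b], yb[idx, a]
            return Xb, yb

    # usage: net.batch_iterator_train = FlipBatchIterator(batch_size=128)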
Advanced Augmentation
http://benanne.github.io/2015/03/17/plankton.html
Convergence Challenges
Need to monitor the forward + backward paths
[Plots: RMSE vs. epochs, one showing a normalization problem, one a data error]
Forward - Backward Path
● Forward path: activations
● Backward path: gradients w.r.t. the parameters
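Theano makes the backward path essentially free: the gradient of the loss w.r.t. any parameter is itself a symbolic expression. A minimal sketch (the linear model and shapes are illustrative):

    import numpy as np
    import theano
    import theano.tensor as T

    X = T.matrix('X')
    Y = T.matrix('Y')
    w = theano.shared(np.zeros((9216, 30), dtype=theano.config.floatX))

    loss = T.mean((T.dot(X, w) - Y) ** 2)  # forward path: prediction + loss
    grad_w = T.grad(loss, w)               # backward path: dloss/dw
    f = theano.function([X, Y], [loss, grad_w])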
Monitor Layers’ Training
nolearn - visualize.py
X. Glorot, Y. Bengio, "Understanding the difficulty of training deep feedforward neural networks": "Monitoring activation and gradients across layers and training iterations is a powerful investigation tool"
Easy to monitor in the Theano framework
Weight Initialization matters (1)
Layer 1: gradients are close to zero - vanishing gradients
Weight Initialization matters (2)
The network returns values close to zero for all inputs
Monitoring Activation
Plateaus are sometimes seen when training neural networks:
for most epochs the network returns close-to-zero output for all inputs.
Objective plateaus can sometimes be explained by saturation.
Monitoring weights/update ratio
[Plots: max of Conv1 weights (scale ~1e-1) and max of Conv1 updates (scale ~1e-3) vs. epoch]
http://cs231n.github.io/neural-networks-3/#baby
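The cs231n-style check behind these plots (dummy arrays here; in practice take W and its gradient from conv1):

    import numpy as np

    W = np.random.randn(32, 1, 3, 3) * 1e-1  # stand-in for the conv1 weights
    dW = np.random.randn(*W.shape)           # stand-in for their gradient
    learning_rate = 1e-3

    param_scale = np.linalg.norm(W.ravel())
    update_scale = np.linalg.norm((learning_rate * dW).ravel())
    print(update_scale / param_scale)        # healthy training: ratio ~1e-3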
Beyond Training
1. Improving the DNN
2. Open Source Packages
a. Hardware and OS
b. Python Framework
c. Deep Learning Open Source Packages
d. Effort Estimation
3. Summary
Hardware and OS
● Amazon cloud GPU: AWS Lasagne GPU Setup; spot ~ $0.0031 per GPU instance hour
● IBM cloud GPU: http://www-03.ibm.com/systems/platformcomputing/products/symphony/gpuharvesting.html
● Your Linux machine GPU: pip install -r https://raw.githubusercontent.com/dnouri/kfkd-tutorial/master/requirements.txt
● Windows install: http://deeplearning.net/software/theano/install_windows.html#install-windows
Starting Tips
● Sanity checks:
○ DNN architecture: "Overfit a tiny subset of data" (Karpathy)
○ Check that increasing regularization increases the loss
● Use a pre-trained VGG as a baseline
● Start with ~3 conv layers of ~16 filters each - iterate quickly
Python
● Rich ecosystem
● State-of-the-art
● Easy to port from prototype to production
Podcast: http://www.reversim.com/2015/10/277-scientific-python.html
Python Deep Learning Framework
Keras, pylearn2, OpenDeep, Lasagne - a common Theano base
Tips from Deep Learning Packages
● Torch: code organization
● Caffe: separation of configuration ↔ code
● NeuralNet → YAML text format defining the experiment's configuration
Deep Learning Open Source Packages
● Caffe for applications (black box)
● Torch and Theano for research on Deep Learning itself (white box)
http://fastml.com/torch-vs-theano/
Open source progresses rapidly → impossible to predict the industry standard
Disruptive Effort Estimation
[Chart: effort over time, feature engineering vs. deep learning]
Still requires algorithmic expertise
Summary
● Dove into Training a DNN
● Presented Analysis Capabilities
● Reviewed Open Source Packages
References
Hinton's Coursera Neural Networks course
https://www.coursera.org/course/neuralnets
Technion Deep Learning course
http://moodle.technion.ac.il/course/view.php?id=4128
Oxford Deep Learning course
https://www.youtube.com/playlist?list=PLE6Wd9FR--EfW8dtjAuPoTuPcqmOV53Fu
CS231n CNN for Visual Recognition
http://cs231n.github.io/
Deep Learning Book
http://www.iro.umontreal.ca/~bengioy/dlbook/
Montreal DL summer school
http://videolectures.net/deeplearning2015_montreal/
Questions?
Deep Convolutional Regression Network
