Hands-on Deep Learning in Python
Imry Kissos
Deep Learning Meetup, TLV, August 2015
Outline
● Problem Definition
● Training a DNN
● Improving the DNN
● Open Source Packages
● Summary
Problem Definition
Deep Convolutional Network
http://danielnouri.org/notes/2014/12/17/using-convolutional-neural-nets-to-detect-facial-keypoints-tutorial/
Tutorial
● Goal: detect facial landmarks on (normal) face images
● Data set provided by Dr. Yoshua Bengio
● Tutorial code available: https://github.com/dnouri/kfkd-tutorial/blob/master/kfkd.py
Flow
[Diagram: train a general model plus specialist models ("Nose Tip", "Mouth Corners"), then predict points on the test set]
Flow
[Diagram: train images and train points are fed to fit(), producing a trained net]
Flow
[Diagram: test images go through predict(), producing predicted points]
Python Deep Learning Framework
● nolearn - wrapper around Lasagne (high level)
● Lasagne - Theano extension for Deep Learning
● Theano - define, optimize, and evaluate mathematical expressions; generates efficient CUDA GPU code for DNNs (low level)
Hardware support: GPU & CPU; OS: Linux, OS X, Windows
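As a taste of the lowest layer of the stack, a minimal Theano sketch of the define/optimize/evaluate flow (the expression here is illustrative, not from the tutorial):

    import theano
    import theano.tensor as T

    # define a symbolic expression over a matrix input
    x = T.dmatrix('x')
    y = T.nnet.sigmoid(T.dot(x, x.T))

    # Theano optimizes the graph and compiles it (to CUDA code on a GPU)
    f = theano.function([x], y)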
Training a Deep Neural Network
1. Data Analysis
2. Architecture Engineering
3. Optimization
4. Training the DNN
Training a Deep Neural Network
1. Data Analysis
a. Exploration + Validation
b. Pre-Processing
c. Batch and Split
2. Architecture Engineering
3. Optimization
4. Training the DNN
Data Exploration + Validation
Data:
● 7K gray-scale images of detected faces
● 96x96 pixels per image
● 15 landmarks per image (nominally)
Data validation:
● Some landmarks are missing
Pre-Processing
● Data normalization
● Shuffle train data
Batch
[Diagram: one epoch's data split into train, validation, and test batches]
The train/valid/test splits are constant across epochs.
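A sketch of this pre-processing, following the tutorial's load() function (the file name and column layout are those of the Kaggle facial-keypoints CSV):

    import numpy as np
    import pandas as pd
    from sklearn.utils import shuffle

    df = pd.read_csv('training.csv')
    # each image is stored as a space-separated pixel string
    df['Image'] = df['Image'].apply(lambda im: np.fromstring(im, sep=' '))
    df = df.dropna()  # keep only rows with all landmarks present

    X = (np.vstack(df['Image'].values) / 255.).astype(np.float32)  # pixels to [0, 1]
    y = df[df.columns[:-1]].values
    y = ((y - 48) / 48).astype(np.float32)                         # coords to [-1, 1]

    X, y = shuffle(X, y, random_state=42)  # shuffle the train data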
Train / Validation Split
For classification, the train/validation split should preserve the class proportions (stratification).
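For a classification task this could be done as below (a sketch; labels is a hypothetical class-label array - the tutorial itself is regression, where nolearn's default random split is used):

    from sklearn.cross_validation import StratifiedShuffleSplit

    # one stratified 80/20 split that preserves the class proportions
    sss = StratifiedShuffleSplit(labels, n_iter=1, test_size=0.2, random_state=42)
    train_idx, valid_idx = next(iter(sss))
    X_train, X_valid = X[train_idx], X[valid_idx]
    y_train, y_valid = labels[train_idx], labels[valid_idx]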
Training a Deep Neural Network
1. Data Analysis
2. Architecture Engineering
a. Layers Definition
b. Layers Implementation
3. Optimization
4. Training
Architecture
[Diagram: X → Conv → Pool → Dense → Output → Y]
Layers Definition
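In nolearn the layers are a list of named layer classes plus per-layer keyword arguments. A condensed sketch of the tutorial's convolutional net (the full net in kfkd.py has three conv/pool blocks):

    from lasagne import layers
    from lasagne.updates import nesterov_momentum
    from nolearn.lasagne import NeuralNet

    net = NeuralNet(
        layers=[
            ('input',  layers.InputLayer),
            ('conv1',  layers.Conv2DLayer),
            ('pool1',  layers.MaxPool2DLayer),
            ('hidden', layers.DenseLayer),
            ('output', layers.DenseLayer),
        ],
        input_shape=(None, 1, 96, 96),  # batches of 96x96 gray-scale images
        conv1_num_filters=32, conv1_filter_size=(3, 3),
        pool1_pool_size=(2, 2),
        hidden_num_units=100,
        output_num_units=30,            # 15 landmarks, (x, y) each
        output_nonlinearity=None,       # linear output: regression
        update=nesterov_momentum,
        update_learning_rate=0.01,
        update_momentum=0.9,
        regression=True,
        max_epochs=400,
        verbose=1,
    )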
Activation Function
ReLU
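ReLU is simply f(x) = max(0, x); Lasagne ships it as lasagne.nonlinearities.rectify and uses it as the default nonlinearity for dense and conv layers:

    import numpy as np

    def relu(x):
        # Rectified Linear Unit: zero for negative inputs, identity otherwise
        return np.maximum(0, x)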
Dense Layer
Dropout
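In nolearn, dropout is just another named layer. A sketch in the spirit of the tutorial's dropout net, which interleaves DropoutLayers with increasing p:

    from lasagne import layers
    from nolearn.lasagne import NeuralNet

    net = NeuralNet(
        layers=[
            ('input',    layers.InputLayer),
            ('conv1',    layers.Conv2DLayer),
            ('dropout1', layers.DropoutLayer),
            ('hidden',   layers.DenseLayer),
            ('dropout2', layers.DropoutLayer),
            ('output',   layers.DenseLayer),
        ],
        input_shape=(None, 1, 96, 96),
        conv1_num_filters=32, conv1_filter_size=(3, 3),
        dropout1_p=0.1,  # drop 10% of the conv activations during training
        hidden_num_units=300,
        dropout2_p=0.5,  # heavier dropout before the output layer
        output_num_units=30, output_nonlinearity=None,
        regression=True, max_epochs=400,
    )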
Training a Deep Neural Network
1. Data Analysis
2. Architecture Engineering
3. Optimization
a. Back Propagation
b. Objective
c. SGD
d. Updates
e. Convergence Tuning
4. Training the DNN
Back Propagation
Forward path: [Diagram: X → Conv → Dense → output points Y]
Forward path: [Diagram: the output points are compared against the training points]
Backward path: [Diagram: the error flows back through the Dense and Conv layers]
Update: [Diagram: for all layers, the weights are updated from their gradients]
Objective
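With regression=True, nolearn minimizes the mean squared error over the predicted coordinates; the learning curves later plot its square root (RMSE). In plain numpy:

    import numpy as np

    def mse(y_pred, y_true):
        # the objective minimized during training
        return np.mean((y_pred - y_true) ** 2)

    def rmse(y_pred, y_true):
        # what the learning-curve plots show
        return np.sqrt(mse(y_pred, y_true))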
S.G.D.
● Updates the network after each mini-batch
● Karpathy - "Babysitting": the weights/updates magnitude ratio should be ~1e3 (i.e., updates ~1e-3 of the weights)
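A minimal sketch of one SGD step per mini-batch (Lasagne's nesterov_momentum, used in the tutorial, adds a velocity term on top of this):

    def sgd_step(params, grads, learning_rate=0.01):
        # one update per mini-batch: w <- w - lr * dL/dw
        return [p - learning_rate * g for p, g in zip(params, grads)]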
Optimization - Updates
[Animations: Alec Radford's visualizations of optimizer behavior]
Adjusting Learning Rate & Momentum
Both annealed linearly in the epoch number
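The tutorial does this with an on_epoch_finished hook; a sketch (it assumes update_learning_rate and update_momentum are Theano shared variables, so set_value works):

    import numpy as np

    class AdjustVariable(object):
        # linearly anneal a NeuralNet parameter from `start` to `stop`
        def __init__(self, name, start=0.03, stop=0.0001):
            self.name = name
            self.start, self.stop = start, stop
            self.ls = None

        def __call__(self, nn, train_history):
            if self.ls is None:
                self.ls = np.linspace(self.start, self.stop, nn.max_epochs)
            epoch = train_history[-1]['epoch']
            getattr(nn, self.name).set_value(np.float32(self.ls[epoch - 1]))

    # usage sketch:
    # net = NeuralNet(...,
    #     update_learning_rate=theano.shared(np.float32(0.03)),
    #     update_momentum=theano.shared(np.float32(0.9)),
    #     on_epoch_finished=[
    #         AdjustVariable('update_learning_rate', start=0.03, stop=0.0001),
    #         AdjustVariable('update_momentum', start=0.9, stop=0.999),
    #     ])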
Convergence Tuning
● Stops early according to the validation loss
● Returns the best weights found
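Also an on_epoch_finished hook in the tutorial; a sketch:

    import numpy as np

    class EarlyStopping(object):
        # stop when the validation loss has not improved for `patience` epochs
        def __init__(self, patience=100):
            self.patience = patience
            self.best_valid = np.inf
            self.best_valid_epoch = 0
            self.best_weights = None

        def __call__(self, nn, train_history):
            current_valid = train_history[-1]['valid_loss']
            current_epoch = train_history[-1]['epoch']
            if current_valid < self.best_valid:
                self.best_valid = current_valid
                self.best_valid_epoch = current_epoch
                self.best_weights = nn.get_all_params_values()
            elif self.best_valid_epoch + self.patience < current_epoch:
                # restore the best weights found and end training
                nn.load_params_from(self.best_weights)
                raise StopIteration()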
Training a Deep Neural Network
1. Data Analysis
2. Architecture Engineering
3. Optimization
4. Training the DNN
a. Fit
b. Fine Tune Pre-Trained
c. Learning Curves
Fit
For each epoch:
● Loop over train batches: forward + backprop
● Loop over validation batches: forward only
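Both loops are triggered by a single call (load2d is the tutorial's loader that reshapes the images to (N, 1, 96, 96); X_test is hypothetical here):

    X, y = load2d()  # images + landmark targets
    net.fit(X, y)    # runs the train/validation loops above for max_epochs

    # prediction is then a single forward pass over the test batches:
    y_pred = net.predict(X_test)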
Fine Tune Pre-Trained
● Change the output layer
● Load the pre-trained weights
● Fine tune the specialist
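A sketch following the tutorial's specialist training: clone the general net, shrink the output layer to the specialist's landmarks, and warm-start from the saved weights (file and data names are illustrative):

    import cPickle as pickle
    from sklearn.base import clone

    with open('net.pickle', 'rb') as f:  # the trained general net
        net_pretrain = pickle.load(f)

    specialist = clone(net)
    specialist.output_num_units = 2            # e.g. "nose tip": one landmark
    specialist.max_epochs = 300
    specialist.load_params_from(net_pretrain)  # copies all matching weights
    specialist.fit(X_spec, y_spec)             # fine tune on the subset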
Learning Curves
Loop over 6 nets:
[Plot: RMSE vs. epochs for each net]
Learning Curves Analysis
[Plots: RMSE vs. epochs for Net 1 and Net 2, illustrating convergence, overfitting, and jittering]
Part 1 Summary
Training a DNN: [Diagram: the four-step pipeline - data analysis, architecture engineering, optimization, training]
Part 1 End
Break
Part 2
Beyond Training
Outline
● Problem Definition
● Motivation
● Training a DNN
● Improving the DNN
● Open Source Packages
● Summary
Beyond Training
1. Improving the DNN
a. Analysis Capabilities
b. Augmentation
c. Forward - Backward Path
d. Monitor Layers’ Training
2. Open Source Packages
3. Summary
Improving the DNN
Very tempting:
● >1M images
● >1M parameters
● Large gap: theory ↔ practice
⇒ Brute-force experiments?!
Analysis Capabilities
1. Theoretical explanation
a. E.g. dropout and augmentation decrease overfitting
2. Empirical claims about a phenomenon
a. E.g. normalization improves convergence
3. Numerical understanding
a. E.g. exploding / vanishing updates
Reduce Overfitting
Solution: data augmentation
[Plot: RMSE vs. epochs for Net 1 and Net 2, showing the overfitting gap]
Data Augmentation
Horizontal flip; perturbation
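The tutorial flips on the fly in a custom BatchIterator; a condensed sketch (flip_indices lists the (left, right) landmark pairs to swap - only two illustrative pairs are shown):

    import numpy as np
    from nolearn.lasagne import BatchIterator

    class FlipBatchIterator(BatchIterator):
        flip_indices = [(0, 2), (1, 3)]  # illustrative subset of pairs

        def transform(self, Xb, yb):
            Xb, yb = super(FlipBatchIterator, self).transform(Xb, yb)
            bs = Xb.shape[0]
            idx = np.random.choice(bs, bs // 2, replace=False)
            Xb[idx] = Xb[idx, :, :, ::-1]         # mirror half of the images
            if yb is not None:
                yb[idx, ::2] = yb[idx, ::2] * -1  # mirror the x coordinates
                for a, b in self.flip_indices:    # swap left <-> right points
                    yb[idx, a], yb[idx, b] = yb[idx, b], yb[idx, a]
            return Xb, yb

    # usage: net.batch_iterator_train = FlipBatchIterator(batch_size=128)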
Advanced Augmentation
http://benanne.github.io/2015/03/17/plankton.html
Convergence Challenges
Need to monitor the forward + backward paths
[Plots: RMSE vs. epochs, one showing a normalization problem, one a data error]
Forward - Backward Path
● Forward path: activations
● Backward path: gradients w.r.t. the parameters
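Theano makes the backward path essentially free: the gradient of the loss w.r.t. any parameter is itself a symbolic expression. A minimal sketch (the linear model and shapes are illustrative):

    import numpy as np
    import theano
    import theano.tensor as T

    X = T.matrix('X')
    Y = T.matrix('Y')
    w = theano.shared(np.zeros((9216, 30), dtype=theano.config.floatX))

    loss = T.mean((T.dot(X, w) - Y) ** 2)  # forward path: prediction + loss
    grad_w = T.grad(loss, w)               # backward path: dloss/dw
    f = theano.function([X, Y], [loss, grad_w])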
Monitor Layers’ Training
nolearn - visualize.py
X. Glorot, Y. Bengio, "Understanding the difficulty of training deep feedforward neural networks": "Monitoring activation and gradients across layers and training iterations is a powerful investigation tool"
Easy to monitor in the Theano framework
Weight Initialization matters (1)
Layer 1: gradients are close to zero - vanishing gradients
Weight Initialization matters (2)
The network returns values close to zero for all inputs
Monitoring Activation
Plateaus are sometimes seen when training neural networks:
for most epochs the network returns close-to-zero output for all inputs.
Objective plateaus can sometimes be explained by saturation.
Monitoring weights/update ratio
[Plots: max of Conv1 weights (scale ~1e-1) and max of Conv1 updates (scale ~1e-3) vs. epoch]
http://cs231n.github.io/neural-networks-3/#baby
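The cs231n-style check behind these plots (dummy arrays here; in practice take W and its gradient from conv1):

    import numpy as np

    W = np.random.randn(32, 1, 3, 3) * 1e-1  # stand-in for the conv1 weights
    dW = np.random.randn(*W.shape)           # stand-in for their gradient
    learning_rate = 1e-3

    param_scale = np.linalg.norm(W.ravel())
    update_scale = np.linalg.norm((learning_rate * dW).ravel())
    print(update_scale / param_scale)        # healthy training: ratio ~1e-3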
Beyond Training
1. Improving the DNN
2. Open Source Packages
a. Hardware and OS
b. Python Framework
c. Deep Learning Open Source Packages
d. Effort Estimation
3. Summary
Hardware and OS
● Amazon cloud GPU: AWS Lasagne GPU Setup; spot ~ $0.0031 per GPU instance hour
● IBM cloud GPU: http://www-03.ibm.com/systems/platformcomputing/products/symphony/gpuharvesting.html
● Your Linux machine GPU: pip install -r https://raw.githubusercontent.com/dnouri/kfkd-tutorial/master/requirements.txt
● Windows install: http://deeplearning.net/software/theano/install_windows.html#install-windows
Starting Tips
● Sanity checks:
○ DNN architecture: "Overfit a tiny subset of data" (Karpathy)
○ Check that increasing regularization increases the loss
● Use a pre-trained VGG as a baseline
● Start with ~3 conv layers of ~16 filters each - iterate quickly
Python
● Rich ecosystem
● State-of-the-art
● Easy to port from prototype to production
Podcast: http://www.reversim.com/2015/10/277-scientific-python.html
Python Deep Learning Framework
Keras, pylearn2, OpenDeep, Lasagne - a common Theano base
Tips from Deep Learning Packages
● Torch: code organization
● Caffe: separation of configuration ↔ code
● NeuralNet → YAML text format defining the experiment's configuration
Deep Learning Open Source Packages
● Caffe for applications (black box)
● Torch and Theano for research on Deep Learning itself (white box)
http://fastml.com/torch-vs-theano/
Open source progresses rapidly → impossible to predict the industry standard
Disruptive Effort Estimation
[Chart: effort over time, feature engineering vs. deep learning]
Still requires algorithmic expertise
Summary
● Dove into Training a DNN
● Presented Analysis Capabilities
● Reviewed Open Source Packages
References
Hinton's Coursera Neural Networks course
https://www.coursera.org/course/neuralnets
Technion Deep Learning course
http://moodle.technion.ac.il/course/view.php?id=4128
Oxford Deep Learning course
https://www.youtube.com/playlist?list=PLE6Wd9FR--EfW8dtjAuPoTuPcqmOV53Fu
CS231n CNN for Visual Recognition
http://cs231n.github.io/
Deep Learning Book
http://www.iro.umontreal.ca/~bengioy/dlbook/
Montreal DL summer school
http://videolectures.net/deeplearning2015_montreal/
Questions?
Deep Convolutional Regression Network
