Deep learning with C++
An introduction to tiny-dnn
by Taiga Nomi
Embedded software engineer, Osaka, Japan
deep learning
Icons made by Freepik from www.flaticon.com are licensed under CC BY 3.0
Facial recognition
Image understanding
Finance
Game playing
Translation
Robotics
Drug discovery
Text recognition
Video processing
Text generation
Deep learning
- Learning a complicated function from large amounts of data
- Composed of trainable, simple mathematical functions
Input (text, audio, image, video, ...) -> Trainable Building Blocks -> Output (text, audio, image, video, ...)
deep learning framework
A modern deep learning framework for C++ programmers
1400 stars
500 forks
35 contributors
100 clones/day
“A Modern Deep Learning module” by Edgar Riba
“Deep Learning with Quantization for Semantic Saliency Detection” by Yida Wang
https://summerofcode.withgoogle.com/archive/
1.Easy to introduce
2.Simple syntax
3.Extensible backends
1.Easy to introduce
- Just put the following line into your .cpp file
tiny-dnn is header only - No installation
tiny-dnn is dependency-free - No prerequisites
#include <tiny_dnn/tiny_dnn.h>
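Below is a minimal sketch of what "no installation" means in practice: one .cpp file that includes the header and builds a small network. The layer names follow the shorthand used on these slides, and the compile command in the comment is only illustrative.
// minimal_example.cpp - hedged sketch; compile with e.g. g++ -std=c++11 -O2 minimal_example.cpp
#include <tiny_dnn/tiny_dnn.h>
using namespace tiny_dnn;
using namespace tiny_dnn::activation;
using namespace tiny_dnn::layers;
int main() {
  network<sequential> net;        // header-only: nothing to link against
  net << dense<relu>(10, 100)     // 10 inputs -> 100 hidden units
      << dense<relu>(100, 20);    // 100 hidden units -> 20 outputs
  return 0;
}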
1.Easy to introduce
- You can bring deep learning to any target for which you have a C++ compiler
- Officially supported (by CI builds):
- Windows (msvc2013 32/64bit, msvc2015 32/64bit)
- Linux (gcc4.9, clang3.5)
- OSX (LLVM 7.3)
- tiny-dnn might run on other compilers that support C++11
1.Easy to introduce
- Caffe model converter is also available
- TensorFlow converter - coming soon!
- Closes the gap between researchers and engineers
1.Easy to introduce
2.Simple syntax
3.Extensible backends
2.Simple syntax
Example: Multi-layer perceptron
Caffe prototxt
input: "data"
input_shape {
dim: 1
dim: 1
dim: 1
dim: 20
}
layer {
name: "ip1"
type: "InnerProduct"
inner_product_param {
num_output: 100
}
bottom: "ip1"
top: "ip2"
}
layer {
name: "a1"
type: "TanH"
bottom: "ip1"
top: "ip1"
}
layer {
name: "ip2"
type: "InnerProduct"
inner_product_param {
num_output: 10
}
bottom: "ip1"
top: "out"
}
layer {
name: "a1"
type: "TanH"
bottom: "out"
top: "out"
}
TensorFlow
w1 = tf.Variable(tf.random_normal([10, 100]))
w2 = tf.Variable(tf.random_normal([100, 20]))
b1 = tf.Variable(tf.random_normal([100]))
b2 = tf.Variable(tf.random_normal([20]))
layer1 = tf.add(tf.matmul(x, w1), b1)
layer1 = tf.nn.relu(layer1)
layer2 = tf.add(tf.matmul(layer1, w2), b2)
layer2 = tf.nn.relu(layer2)
Keras
model = Sequential([
Dense(100, input_dim=10),
Activation('relu'),
Dense(20),
Activation('relu'),
])
tiny-dnn
network<sequential> net;
net << dense<relu>(10, 100)
<< dense<relu>(100, 20);
tiny-dnn, another solution
auto net = make_mlp<relu>({10, 100, 20});
- modern C++ enables us to keep the code simple
- type inference, initializer lists
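Training is just as compact. The sketch below is hedged: the dummy data only exists to make the snippet run, and the adagrad/train<mse> names follow tiny-dnn's documented API of the time; swap in your own vectors and labels.
// hedged training sketch for the MLP above: dummy data stands in for a real dataset
std::vector<vec_t>   x;   // 10-dimensional inputs
std::vector<label_t> y;   // class indices in [0, 20)
for (int i = 0; i < 100; ++i) {
  x.push_back(vec_t(10, static_cast<float_t>(i % 10) / 10));  // constant dummy input
  y.push_back(i % 20);                                        // dummy label
}
adagrad opt;                          // optimizer choice is illustrative
net.train<mse>(opt, x, y, 16, 30);    // batch size 16, 30 epochs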
2.Simple syntax
Example: Convolutional Neural Networks
Caffe prototxt
name: "LeNet"
layer {
name: "data"
type: "Input"
top: "data"
input_param { shape: { dim: 64
dim: 1 dim: 28 dim: 28 } }
}
layer {
name: "conv1"
type: "Convolution"
bottom: "data"
top: "conv1"
param {
lr_mult: 1
}
param {
lr_mult: 2
}
convolution_param {
num_output: 20
kernel_size: 5
stride: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "pool1"
type: "Pooling"
bottom: "conv1"
top: "pool1"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
layer {
name: "conv2"
type: "Convolution"
bottom: "pool1"
top: "conv2"
param {
lr_mult: 1
}
param {
lr_mult: 2
}
convolution_param {
num_output: 50
kernel_size: 5
stride: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "pool2"
type: "Pooling"
bottom: "conv2"
top: "pool2"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
layer {
name: "ip1"
type: "InnerProduct"
bottom: "pool2"
top: "ip1"
param {
lr_mult: 1
}
param {
lr_mult: 2
}
inner_product_param {
num_output: 500
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "relu1"
type: "ReLU"
bottom: "ip1"
top: "ip1"
}
layer {
name: "ip2"
type: "InnerProduct"
bottom: "ip1"
top: "ip2"
param {
lr_mult: 1
}
param {
lr_mult: 2
}
inner_product_param {
num_output: 10
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "prob"
type: "Softmax"
bottom: "ip2"
top: "prob"
}
TensorFlow
x = tf.placeholder(tf.float32, [None, 28, 28, 1])
wc1 = tf.Variable(tf.random_normal([5, 5, 1, 32]))
wc2 = tf.Variable(tf.random_normal([5, 5, 32, 64]))
wd1 = tf.Variable(tf.random_normal([7*7*64, 1024]))
wout = tf.Variable(tf.random_normal([1024, n_classes]))
bc1 = tf.Variable(tf.random_normal([32]))
bc2 = tf.Variable(tf.random_normal([64]))
bd1 = tf.Variable(tf.random_normal([1024]))
bout = tf.Variable(tf.random_normal([n_classes]))
conv1 = conv2d(x, wc1, bc1)
conv1 = maxpool2d(conv1, k=2)
conv1 = tf.nn.relu(conv1)
conv2 = conv2d(conv1, wc2, bc2)
conv2 = maxpool2d(conv2, k=2)
conv2 = tf.nn.relu(conv2)
fc1 = tf.reshape(conv2, [-1, wd1.get_shape().as_list()[0]])
fc1 = tf.add(tf.matmul(fc1, wd1), bd1)
fc1 = tf.nn.relu(fc1)
fc1 = tf.nn.dropout(fc1, dropout)
out = tf.add(tf.matmul(fc1, wout), bout)
Keras
model = Sequential([
Convolution2D(32, 5, 5, input_shape=(28, 28, 1)),
MaxPooling2D(pool_size=2),
Activation('relu'),
Convolution2D(64, 5, 5),
MaxPooling2D(pool_size=2),
Activation('relu'),
Flatten(),
Dense(1024),
Dropout(0.5),
Dense(10),
])
tiny-dnn
network<sequential> net;
net << conv<>(28, 28, 5, 1, 32)
<< max_pool<relu>(24, 24, 32, 2)
<< conv<>(12, 12, 5, 32, 64)
<< max_pool<relu>(8, 8, 64, 2)
<< fc<relu>(4*4*64, 1024)
<< dropout(1024, 0.5f)
<< fc<>(1024, 10);
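To run the CNN above end to end, something like the following works. This is a hedged sketch modeled on tiny-dnn's bundled MNIST example, so the parse_mnist_* helpers and the file names are assumptions about your local setup.
// hedged sketch: train the LeNet-style net above on MNIST
std::vector<label_t> train_labels;
std::vector<vec_t>   train_images;
parse_mnist_labels("train-labels.idx1-ubyte", &train_labels);
parse_mnist_images("train-images.idx3-ubyte", &train_images,
                   -1.0, 1.0, 0, 0);   // no padding: keep 28x28 inputs
adagrad opt;
net.train<cross_entropy>(opt, train_images, train_labels,
                         32 /*batch*/, 10 /*epochs*/);
label_t guess = net.predict_label(train_images[0]);  // single inference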
1.Easy to introduce
2.Simple syntax
3.Extensible backends
3.Extensible backends
Common scenario 1:
“We have a good GPU machine for training networks, but
we need to deploy the trained model to a mobile device”
Common scenario 2:
“We need to write platform-specific code to get
production-level performance... but it’s painful to
understand the whole framework”
3.Extensible backends
Some performance-critical layers have a pluggable backend engine behind a common Layer API:
- backend::internal - pure C++ code (the default)
- backend::avx - AVX-optimized code
- backend::nnpack - x86/ARM (optional)
- backend::opencl - GPU (optional)
- ...
3.Extensible backends
// select an engine explicitly
net << conv<>(28, 28, 5, 1, 32, backend::avx)
<< ...;
// switch them seamlessly
net[0]->set_backend_type(backend::opencl);
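A common pattern is to train with a fast backend and fall back to the portable one at deployment time. The loop below is a hedged sketch: depth() and operator[] follow tiny-dnn's examples, the backend:: spelling follows the slide above, and which engines are actually available depends on your build flags.
// hedged sketch: move every layer back to the portable pure-C++ engine
for (size_t i = 0; i < net.depth(); ++i) {
  net[i]->set_backend_type(backend::internal);
}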
Basic functionality:
- Model serialization (binary/JSON)
- Regression training
- Basic image processing
- Layer freezing
- Graph visualization
- Multi-thread execution
- Double precision support
Extra modules (require 3rd-party libraries):
- Caffe importer (requires protobuf)
- OpenMP support
- Intel TBB support
- NNPACK backend (same as Caffe2)
- libdnn backend (same as caffe-opencl)
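As a quick illustration of the serialization feature, a hedged sketch: save()/load() with a plain file path follow tiny-dnn's documented usage; the file name here is just a placeholder.
// hedged sketch: persist the trained model and restore it elsewhere
net.save("lenet-model");     // architecture + weights, binary by default
network<sequential> net2;
net2.load("lenet-model");    // ready for predict()/predict_label()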
Future plans
- GPU integration
- GPU backend is still experimental
- cuDNN backend
- More mobile-oriented
- iOS/Android examples
- Quantized operations for lower RAM usage
- TensorFlow Importer
- Performance profiling tools
- OpenVX support
We need your help!
User chat for Q&A:
https://gitter.im/tiny-dnn
Official documents:
http://tiny-dnn.readthedocs.io/en/latest/
For users
Join our developer chat:
https://gitter.im/tiny-dnn/developers
or
Check out the docs, and our issues marked as “contributions welcome”:
https://github.com/tiny-dnn/tiny-dnn/blob/master/docs/developer_guides/How-to-contribute.md
https://github.com/tiny-dnn/tiny-dnn/labels/contributions%20welcome
For developers
code: github.com/tiny-dnn/tiny-dnn
slide: https://goo.gl/Se2rzu