Simple, Fast, and Scalable
Torch7 Tutorial
Jin-Hwa Kim
Biointelligence Lab.
Program in Cognitive Science
SNU
Twitter@jnhwkim
Table of Contents
■ Installing Torch
■ Simple examples
■ Tensor System
■ Notable Packages
• nn
• optim
• dp
• rnn
■ References
Installing Torch
■ Installing Torch
$ curl -s https://raw.githubusercontent.com/torch/ezinstall/master/
install-deps | bash
$ git clone https://github.com/torch/distro.git ~/torch --recursive
$ cd ~/torch; ./install.sh
■ LuaRocks
• Lua package manager (like apt-get or Homebrew for Linux or OS X)
$ luarocks install image
$ luarocks list
■ TREPL
• Torch read-eval-print loop
$ th
$ th file.lua
$ th -i interactive.lua
Simple Example
■ Quick Lua (Python + Matlab?)
• object.method(self, a, b) ≡ object:method(a, b)
• index starts with 1, i = i + 1
• if not <condition1> and/or <condition2> then <statement1> elseif <condition3> then
<statement2> else <statement3> end
• for i=1,10,2 do <statement> end
• You can break, but there is no continue.
• Generic data structure: table (like JSON?)
{}, {1,2,3}, {a=1, b=2, c=3}, {{1,2},{3,4}}, #table
■ A Few Torch Functions (illustrated in the sketch below)
• rand(), which creates a tensor drawn from a uniform distribution
• t(), which transposes a tensor (note it returns a new view)
• dot(), which performs a dot product between two tensors
• eye(), which returns an identity matrix
• the * operator over matrices (which performs a matrix-vector or matrix-matrix
multiplication)
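• A minimal sketch (my addition, not on the original slide) tying the Lua basics and these Torch functions together:
require 'torch'
local t = {a = 1, b = 2, c = 3}     -- table as the generic data structure
for i = 1, 10, 2 do                 -- numeric for loop with step 2
  if t.a > 0 then t.a = t.a + i end
end
local m = torch.rand(3, 3)          -- 3x3 tensor drawn from a uniform distribution
local v = torch.rand(3)
print(m:t())                        -- transpose: a new view on the same storage
print(torch.dot(v, v))              -- dot product of two tensors
print(torch.eye(3) * v)             -- matrix-vector multiplication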
Tensor System (1/6)
■ Fundamental data class, Tensor
• Handles numeric data
• Serializable (can be saved to a file if you want)
• A Tensor interprets a chunk of memory as having dimensions.
■ Size
> x:nDimension()
6
> x:size() -- use x:size(dim) for a specific dimension
4
5
6
2
7
3
[torch.LongStorage of size 6]
■ Access
> x[3][4][5]
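• For context, a minimal sketch (my addition) of how such a 6D tensor x can be created and queried; the full construction reappears in Q2 at the end:
x = torch.Tensor(4, 5, 6, 2, 7, 3)
print(x:nDimension())   -- 6
print(x:size(3))        -- 6, the length of the third dimension
print(x[3][4][5])       -- a 2x7x3 sub-tensor; each [] peels off one dimension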
Tensor System (2/6)
■ Memory Contiguity
• Storage is C-style (row-major), not Fortran-style (column-major).
x = torch.Tensor(4,5)
i = 0
x:apply(function()
i = i + 1
return i
end)
> x
1 2 3 4 5
6 7 8 9 10
11 12 13 14 15
16 17 18 19 20
[torch.DoubleTensor of dimension 4x5]
> x:stride()
5
1 -- elements in the last dimension are contiguous!
[torch.LongStorage of size 2]
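• A side note (my addition): views such as t() need not be contiguous, and :contiguous() copies only when necessary:
x = torch.Tensor(4, 5)
y = x:t()                  -- a transposed view sharing x's storage
print(y:stride())          -- 1 and 5: the last dimension is no longer contiguous
print(y:isContiguous())    -- false
z = y:contiguous()         -- allocates new, contiguous storage and copies
print(z:isContiguous())    -- true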
Tensor System (3/6)
■ Tensor Types
ByteTensor -- contains unsigned chars
CharTensor -- contains signed chars
ShortTensor -- contains shorts
IntTensor -- contains ints
FloatTensor -- contains floats
DoubleTensor -- contains doubles
■ Most numeric operations are implemented only for FloatTensor and
DoubleTensor (e.g. torch.histc()).
> torch.histc(torch.IntTensor(5))
[string "_RESULT={torch.histc(a)}"]:1: torch.IntTensor does not implement the
torch.histc() function
stack traceback:
[C]: in function 'histc'
[string "_RESULT={torch.histc(a)}"]:1: in main chunk
[C]: in function 'xpcall'
/Users/Calvin/torch/install/share/lua/5.1/trepl/init.lua:630: in function 'repl'
...lvin/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:185: in main chunk
[C]: at 0x010e9422f0
torch.setdefaulttensortype('torch.FloatTensor')
a = torch.Tensor()
a:type()
a:size(dim)
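• A small sketch (my addition) of working around such type limitations by converting tensors:
a = torch.IntTensor(5):fill(1)
print(torch.histc(a:double()))   -- works once converted: DoubleTensor implements histc
b = a:float()                    -- :float(), :double(), :byte(), ... convert between types
print(b:type())                  -- torch.FloatTensor
torch.setdefaulttensortype('torch.FloatTensor')
print(torch.Tensor():type())     -- new tensors now default to torch.FloatTensor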
Tensor System (4/6)
■ Querying elements
> x[2][3] -- returns row 2, column 3
6
> x[{2,3}] -- another way to return row 2, column 3
6
> x[torch.LongStorage{2,3}] -- yet another way to return row 2, column 3
6
> x[torch.le(x,3)] -- torch.le returns a ByteTensor that acts as a mask
1
2
3
[torch.DoubleTensor of dimension 3]
■ Extracting sub-tensors
[self] narrow(dim, index, size)
[Tensor] sub(dim1s, dim1e ... [, dim4s [, dim4e]])
[Tensor] select(dim, index)
or just use the operator [] …
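• A brief sketch (my addition) of these extractors on a 5x6 matrix; all of them return views on the same storage:
x = torch.range(1, 30):resize(5, 6)
print(x:narrow(2, 2, 3))   -- columns 2..4 (dim 2, start index 2, size 3)
print(x:sub(2, 4, 1, 2))   -- rows 2..4, columns 1..2
print(x:select(1, 3))      -- the 3rd row as a 1D tensor of size 6
print(x[3])                -- the same row via the [] operator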
Tensor System (5/6)
■ Indexing operator []
x = torch.Tensor(5, 6):zero()
> x
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
[torch.DoubleTensor of dimension 5x6]
x[{ 1,3 }] = 1 -- sets element at (i=1,j=3) to 1
> x
0 0 1 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
[torch.DoubleTensor of dimension 5x6]
x[{ 2,{2,4} }] = 2 -- sets a slice of 3 elements to 2
> x
0 0 1 0 0 0
0 2 2 2 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
[torch.DoubleTensor of dimension 5x6]
x[{ {},4 }] = -1 -- sets the full 4th column to -1
> x
0 0 1 -1 0 0
0 2 2 -1 0 0
0 0 0 -1 0 0
0 0 0 -1 0 0
0 0 0 -1 0 0
[torch.DoubleTensor of dimension 5x6]
x[{ {},2 }] = torch.range(1,5) -- copy a 1D tensor to a slice of x
> x
0 1 1 -1 0 0
0 2 2 -1 0 0
0 3 0 -1 0 0
0 4 0 -1 0 0
0 5 0 -1 0 0
[torch.DoubleTensor of dimension 5x6]
x[torch.lt(x,0)] = -2 -- sets all negative elements to -2 via a ByteTensor mask
> x
0 1 1 -2 0 0
0 2 2 -2 0 0
0 3 0 -2 0 0
0 4 0 -2 0 0
0 5 0 -2 0 0
[torch.DoubleTensor of dimension 5x6]
Tensor System (6/6)
■ And so on …
• See how many functions are the same as the ones you’ve used in Matlab (a few are sketched below)!
⋮
[Tensor] gather(dim, index)
[LongTensor] nonzero(tensor)
[result] expand([result,] sizes)
[Tensor] repeatTensor([result,] sizes)
[Tensor] squeeze([dim])
[Tensor] transpose(dim1, dim2)
[Tensor] permute(dim1, dim2, ..., dimn)
[Tensor] unfold(dim, size, step)
⋮
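• A hedged sketch (my addition) exercising a few of them:
x = torch.range(1, 6):resize(2, 3)
print(x:transpose(1, 2))                  -- 3x2 view on the same storage
print(x:repeatTensor(2, 1))               -- 4x3: x stacked twice along dimension 1
print(torch.zeros(1, 3, 1):squeeze())     -- drops singleton dimensions, giving size 3
print(torch.ones(3, 1):expand(3, 4))      -- expands a singleton dimension without copying
print(torch.range(1, 5):unfold(1, 2, 1))  -- all length-2 windows with step 1, a 4x2 tensor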
Notable Packages: nn
■ Stands for neural network
• It provides a modular way to build a complex model.
■ Modules
• Module: abstract class
• Containers: e.g. Sequential, Parallel and Concat
• Transfer functions: e.g. Tanh, ReLU and Sigmoid
• Simple layers: e.g. Linear, Mean, Max and Reshape
• Convolutional layers: Temporal, Spatial and Volumetric (3D)
■ Criterions
• Criterions: abstract class
• MSECriterion
• ClassNLLCriterion
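• A quick sketch (my addition; the next slide gives the full Sequential example) of how containers compose modules. Concat runs each branch on the same input and joins the outputs:
require 'nn'
local branches = nn.Concat(1)     -- concatenate branch outputs along dimension 1
branches:add(nn.Linear(10, 3))
branches:add(nn.Sequential():add(nn.Linear(10, 3)):add(nn.ReLU()))
print(branches:forward(torch.randn(10)))   -- 1D output of size 6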
Notable Packages: nn
■ nn example of manual training (though you typically won’t train this way; see optim below)
• Model definition
require "nn"
mlp = nn.Sequential(); -- make a multi-layer perceptron
inputs = 2; outputs = 1; HUs = 20; -- parameters
mlp:add(nn.Linear(inputs, HUs))
mlp:add(nn.Tanh())
mlp:add(nn.Linear(HUs, outputs))
• Loss function
criterion = nn.MSECriterion()
• Training
for i = 1,2500 do
-- random sample
local input= torch.randn(2); -- normally distributed example in 2d
local output= torch.Tensor(1);
if input[1]*input[2] > 0 then -- calculate label for XOR function
output[1] = -1
else output[1] = 1 end
-- feed it to the neural network and the criterion
criterion:forward(mlp:forward(input), output) -- evaluate the loss on mlp:forward(input)
-- train over this example in 3 steps
-- (1) zero the accumulation of the gradients
mlp:zeroGradParameters()
-- (2) accumulate gradients
mlp:backward(input, criterion:backward(mlp.output, output))
-- (3) update parameters with a 0.01 learning rate
mlp:updateParameters(0.01)
end
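• After training, a short sketch (my addition) of querying the network; it mirrors the XOR labels above:
x = torch.Tensor(2)
x[1] = 0.5; x[2] = 0.5       -- same sign, so the target is -1
print(mlp:forward(x))        -- expected to be close to -1
x[2] = -0.5                  -- opposite signs, so the target is 1
print(mlp:forward(x))        -- expected to be close to 1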
Notable Packages: optim
■ optim
• Example - SGD (a complete sketch follows the algorithm list below)
require "optim"
state = {
learningRate = 1e-3,
momentum = 0.5,
maxIter = 100
}
for i,sample in ipairs(training_samples) do
local func = function(x)
-- evaluate the loss f and the gradient df_dx at parameters x for this sample
return f,df_dx
end
optim.sgd(func,x,state)
end
• Algorithms
- adadelta, adagrad, asgd, cg, lbfgs, nag, …
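• A self-contained sketch (my addition, with made-up data and the usual getParameters idiom) of what the eval function looks like in practice:
require 'nn'
require 'optim'
local model = nn.Linear(2, 1)
local criterion = nn.MSECriterion()
local params, gradParams = model:getParameters()   -- flat views of weights and gradients
local state = {learningRate = 1e-3, momentum = 0.5}
local inputs, targets = torch.randn(100, 2), torch.randn(100, 1)
for epoch = 1, 10 do
  for i = 1, inputs:size(1) do
    local feval = function(x)
      if x ~= params then params:copy(x) end
      gradParams:zero()
      local output = model:forward(inputs[i])
      local f = criterion:forward(output, targets[i])
      model:backward(inputs[i], criterion:backward(output, targets[i]))
      return f, gradParams
    end
    optim.sgd(feval, params, state)
  end
end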
Notable Packages: dp
■ Command-line Arguments
--[[command line arguments]]--
cmd = torch.CmdLine()
cmd:text()
cmd:text('Image Classification using MLP Training/Optimization')
cmd:text('Example:')
cmd:text('$> th neuralnetwork.lua --batchSize 128 --momentum 0.5')
cmd:text('Options:')
cmd:option('--learningRate', 0.1, 'learning rate at t=0')
cmd:option('--schedule', '{}', 'learning rate schedule')
cmd:option('--hiddenSize', '{200,200}', 'number of hidden units per layer')
⋮
cmd:option('--batchSize', 32, 'number of examples per batch')
cmd:option('--cuda', false, 'use CUDA')
cmd:option('--useDevice', 1, 'sets the device (GPU) to use')
cmd:option('--maxEpoch', 100, 'maximum number of epochs to run')
cmd:option('--dropout', false, 'apply dropout on hidden neurons')
cmd:option('--batchNorm', false, 'use batch normalization. dropout is mostly redundant with this')
cmd:option('--dataset', 'Mnist', 'which dataset to use : Mnist | NotMnist | Cifar10 | Cifar100')
cmd:option('--standardize', false, 'apply Standardize preprocessing')
cmd:option('--zca', false, 'apply Zero-Component Analysis whitening')
cmd:option('--progress', false, 'display progress bar')
cmd:option('--silent', false, 'dont print anything to stdout')
cmd:text()
opt = cmd:parse(arg or {})
opt.schedule = dp.returnString(opt.schedule)
opt.hiddenSize = dp.returnString(opt.hiddenSize)
if not opt.silent then
table.print(opt)
end
Notable Packages: dp
■ Preprocess
--[[preprocessing]]--
local input_preprocess = {}
if opt.standardize then
table.insert(input_preprocess, dp.Standardize())
end
if opt.zca then
table.insert(input_preprocess, dp.ZCA())
end
if opt.lecunlcn then
table.insert(input_preprocess, dp.GCN())
table.insert(input_preprocess, dp.LeCunLCN{progress=true})
end
■ DataSource
--[[data]]--
if opt.dataset == 'Mnist' then
ds = dp.Mnist{input_preprocess = input_preprocess}
elseif opt.dataset == 'NotMnist' then
ds = dp.NotMnist{input_preprocess = input_preprocess}
elseif opt.dataset == 'Cifar10' then
ds = dp.Cifar10{input_preprocess = input_preprocess}
elseif opt.dataset == 'Cifar100' then
ds = dp.Cifar100{input_preprocess = input_preprocess}
else
error("Unknown Dataset")
end
Notable Packages: dp
■ Model of Modules
--[[Model]]--
model = nn.Sequential()
model:add(nn.Convert(ds:ioShapes(), 'bf')) -- to batchSize x nFeature (also type converts)
-- hidden layers
inputSize = ds:featureSize()
for i,hiddenSize in ipairs(opt.hiddenSize) do
model:add(nn.Linear(inputSize, hiddenSize)) -- parameters
if opt.batchNorm then
model:add(nn.BatchNormalization(hiddenSize))
end
model:add(nn.Tanh())
if opt.dropout then
model:add(nn.Dropout())
end
inputSize = hiddenSize
end
-- output layer
model:add(nn.Linear(inputSize, #(ds:classes())))
model:add(nn.LogSoftMax())
Notable Packages: dp
■ Propagator
--[[Propagators]]--
if opt.lrDecay == 'adaptive' then
ad = dp.AdaptiveDecay{max_wait = opt.maxWait, decay_factor=opt.decayFactor}
elseif opt.lrDecay == 'linear' then
opt.decayFactor = (opt.minLR - opt.learningRate)/opt.saturateEpoch
end
train = dp.Optimizer{
acc_update = opt.accUpdate, loss = nn.ModuleCriterion(nn.ClassNLLCriterion(), nil, nn.Convert()),
epoch_callback = function(model, report) -- called every epoch
-- learning rate decay
if report.epoch > 0 then
if opt.lrDecay == 'adaptive' then
opt.learningRate = opt.learningRate*ad.decay
ad.decay = 1
elseif opt.lrDecay == 'schedule' and opt.schedule[report.epoch] then
opt.learningRate = opt.schedule[report.epoch]
elseif opt.lrDecay == 'linear' then
opt.learningRate = opt.learningRate + opt.decayFactor
end
opt.learningRate = math.max(opt.minLR, opt.learningRate)
if not opt.silent then
print("learningRate", opt.learningRate)
end
end
end,
callback = function(model, report) -- called every batch
if opt.accUpdate then
model:accUpdateGradParameters(model.dpnn_input, model.output, opt.learningRate)
else
model:updateGradParameters(opt.momentum) -- affects gradParams
model:updateParameters(opt.learningRate) -- affects params
end
model:maxParamNorm(opt.maxOutNorm) -- affects params
model:zeroGradParameters() -- affects gradParams
end,
feedback = dp.Confusion(), sampler = dp.ShuffleSampler{batch_size = opt.batchSize}, progress = opt.progress
}
valid = dp.Evaluator{
feedback = dp.Confusion(), sampler = dp.Sampler{batch_size = opt.batchSize}
}
test = dp.Evaluator{
feedback = dp.Confusion(), sampler = dp.Sampler{batch_size = opt.batchSize}
}
Notable Packages: dp
■ Experiment
--[[Experiment]]--
xp = dp.Experiment{
model = model,
optimizer = train,
validator = valid,
tester = test,
observer = {
dp.FileLogger(),
dp.EarlyStopper{
error_report = {'validator','feedback','confusion','accuracy'},
maximize = true,
max_epochs = opt.maxTries
}
},
random_seed = os.time(),
max_epoch = opt.maxEpoch
}
Notable Packages: dp
■ Running the Experiment
--[[GPU or CPU]]--
if opt.cuda then
require 'cutorch'
require 'cunn'
cutorch.setDevice(opt.useDevice)
xp:cuda()
end
--[[Experiment]]--
xp:run(ds)
■ Loading the saved Experiment
require 'dp'
require 'cunn' -- if you used the cmd-line argument --cuda (cunn pulls in cutorch)
xp = torch.load("/home/nicholas/save/xps:1432747515:1.dat")
model = xp:model()
print(torch.type(model))
nn.Serial
model = model.module
print(torch.type(model))
nn.Sequential
Notable Packages: rnn
■ RNN
-- recurrent layer
local rnn
if opt.lstm then
-- Long Short Term Memory
rnn = nn.Sequencer(nn.FastLSTM(inputSize, hiddenSize))
else
-- simple recurrent neural network
rnn = nn.Recurrent(
hiddenSize, -- first step will use nn.Add
nn.Identity(), -- for efficiency (see above input layer)
nn.Linear(hiddenSize, hiddenSize), -- feedback layer (recurrence)
nn.Sigmoid(), -- transfer function
99999 -- maximum number of time-steps per sequence
)
if opt.zeroFirst then
-- this is equivalent to forwarding a zero vector through the feedback layer
rnn.startModule:share(rnn.feedbackModule, 'bias')
end
rnn = nn.Sequencer(rnn)
end
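• A minimal usage sketch (my addition, assuming the Element-Research rnn package): a Sequencer-wrapped module consumes a table with one tensor per time-step:
require 'rnn'
local inputSize, hiddenSize, seqLen, batchSize = 10, 20, 5, 3
local seq = {}
for t = 1, seqLen do seq[t] = torch.randn(batchSize, inputSize) end
local lstm = nn.Sequencer(nn.FastLSTM(inputSize, hiddenSize))
local outputs = lstm:forward(seq)   -- a table of seqLen tensors, each batchSize x hiddenSize
print(#outputs, outputs[1]:size())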
Contribute!
The Torch maintainers are responsive in merging pull requests…
Q1: What is the advantage over Theano?
■ Nicholas Leonard’s case
• He is from the LISA lab (the Mecca of Theano) and switched from Theano to Torch
for several reasons. He said:
• It is easy to code components in C/CUDA.
• Theano is a C/CUDA compiler that performs automatic gradient differentiation.
Torch7 is not such a compiler, so you don’t need to be symbolic, which means you
don’t need to wait another 5 minutes for compilation. The compilation also makes
debugging harder.
• Pylearn2 adds fancy features to Theano; however, it isn’t easy. You have to
learn a new kind of programming, always thinking symbolically, with the risk of
exceptions to it. The exception may be your research.
(Though dp is a Pylearn2-like alternative for Torch7.)
• So, I was tired, and wanted to get back to non-symbolic programming.
https://plus.google.com/+YannLeCunPhD/posts/iJV2tJgpF16
Q1: What is the advantage over Theano?
■ Yann LeCun’s recommendation
• Torch is used at Facebook AI Research and in other parts of Facebook.
• It's also used heavily at DeepMind (now part of Google), and people in the Google
Brain group have started to use it too.
• Naturally, it's used at NYU and IDIAP where much of the original
development came from. But it's also used at INRIA in Paris, MSR in New
York, Intel, and a number of startups.
https://plus.google.com/+YannLeCunPhD/posts/iJV2tJgpF16
Q2: Is the # of dimensions unlimited?
■ Torch7 supports multi-dimensional Tensors with an unlimited number of dimensions.
• A tensor with more dimensions than the constructor arguments cover can be created by
passing a LongStorage that holds the length of each dimension.
--- creation of a 4D-tensor 4x5x6x2
z = torch.Tensor(4,5,6,2)
--- for more dimensions, (here a 6D tensor) one can do:
s = torch.LongStorage(6)
--- assigning lengths for each dimension, not values
s[1] = 4; s[2] = 5; s[3] = 6; s[4] = 2; s[5] = 7; s[6] = 3;
x = torch.Tensor(s)
> x:nDimension()
6
> x:size()
4
5
6
2
7
3
[torch.LongStorage of size 6]
References
• http://torch.ch/docs
• https://github.com/torch/torch7/wiki/Cheatsheet
• https://github.com/torch
• https://github.com/nicholas-leonard/dp
• https://github.com/Element-Research/rnn
• https://github.com/karpathy/char-rnn
