TensorFlow Workshop
Core TF Model
Yet another dataflow system
Graph of Nodes, also called Operations or ops.
[Diagram: graph with nodes MatMul, Add, Relu, Xent and inputs examples, labels, weights, biases]
Yet another dataflow system with tensors
Edges are N-dimensional arrays: Tensors
[Diagram: the same graph, with edges labeled as N-dimensional tensors]
Yet another dataflow system with state
'Biases' is a variable. Some ops compute gradients. −= updates biases.
[Diagram: the biases variable feeds Add; a gradient and the learning rate feed Mul, whose output feeds a −= update of biases]
Yet another dataflow system, distributed
Devices: Processes, Machines, GPUs, etc.
[Diagram: the same graph partitioned across Device A and Device B]
What's not in the Core Model
● Anything about neural networks, machine learning, ...
● Anything about backpropagation, differentiation, ...
● Anything about gradient descent, parameter servers…
These are built by combining existing operations, or defining new operations.
The core system can be applied to problems other than machine learning.
Core TF API
API Families
Graph Construction
● Assemble a Graph of Operations.
Graph Execution
● Deploy and execute operations in a Graph.
Hello, world!
from google3.learning.brain.public import tensorflow as tf
# Create an operation.
hello = tf.Constant("Hello, world!")
# Create a session with the "local" Tensorflow runtime.
sess = tf.Session("local")
# Execute that operation and print its result.
print sess.Run(hello)
Graph Construction
Library of predefined Ops
● Constant, Variables, Math ops, etc.
Functions to add Ops for common needs
● Gradients: Add Ops to compute derivatives.
● Training methods: Add Ops to update variables (SGD, Adagrad, etc.)
All operations are added to a global Default Graph.
Slightly more advanced calls let you control the Graph more precisely.
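A minimal sketch of the default-graph idea in the open-source TF 1.x API (an assumption: the internal calls in this deck map onto these public ones); ops land on the global default graph unless you pick another graph explicitly.
import tensorflow as tf

# Ops are added to the global default graph by default.
a = tf.constant(1.0)
print(a.graph is tf.get_default_graph())  # True

# More precise control: build ops inside an explicit graph.
g = tf.Graph()
with g.as_default():
  b = tf.constant(2.0)   # added to g, not the global default
print(b.graph is g)      # True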
Constant
Op that outputs a constant value when run. (Surprising?)
[Diagram: Constant op with a Value output]
from google3.learning.brain.public.tensorflow import *
a = Constant([1.0, 2.0, 3.0, 4.0]) # float vector
b = Constant([[5, 6], [7, 8]]) # int32 2x2 matrix
import numpy as np
c = Constant(np.random.rand(2, 4, 6, 8)) # double 2x4x6x8 tensor
Variable
Op that holds state that persists across calls to Run().
[Diagram: Variable op with State; outputs a Value and a Reference]
v = Variable(shape=[4, 3], dtype=DT_FLOAT) # float 4x3 matrix
Variables
Some Ops modify the Variable state: InitVariable, Assign, AssignSub, AssignAdd.
init = Assign(v, RandomParameters(shape=v.shape))
Updates the variable value when run. Assign also outputs the value for convenience.
[Diagram: RandomParameters feeds an Assign op that updates the Variable state]
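A minimal runnable sketch of the same pattern in the open-source TF 1.x API (assumed equivalent to the internal Variable/Assign ops above):
import tensorflow as tf

v = tf.Variable(tf.zeros([4, 3]))               # float 4x3 matrix
init = tf.assign(v, tf.random_uniform([4, 3]))  # updates v when run

sess = tf.Session()
print(sess.run(init))  # Assign also outputs the new value for convenience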
Math Ops
A variety of Operations for linear algebra, convolutions, etc.
c = Constant(...)
w = Variable(...)
b = Variable(...)
y = Add(MatMul(c, w), b)
[Diagram: c and w feed MatMul; its output and b feed Add to produce y]
Overloaded Python operators help with construction: y = MatMul(c, w) + b
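For comparison, a hedged open-source TF 1.x sketch of the same small graph (the concrete values are chosen here for illustration):
import tensorflow as tf

c = tf.constant([[1.0, 2.0]])     # 1x2 input
w = tf.Variable(tf.ones([2, 3]))  # 2x3 weights
b = tf.Variable(tf.zeros([3]))    # bias
y = tf.matmul(c, w) + b           # overloaded + builds an Add op

sess = tf.Session()
sess.run(tf.global_variables_initializer())
print(sess.run(y))                # [[3. 3. 3.]]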
Operations, plenty of them
Documentation at http://go/tensorflow-ops
● Array ops
  ○ Concat
  ○ Slice
  ○ Reshape
  ○ ...
● Math ops
  ○ Linear algebra (MatMul, …)
  ○ Component-wise ops (Mul, ...)
  ○ Reduction ops (Sum, …)
● Neural network ops
  ○ Non-linearities (Relu, …)
  ○ Convolutions (Conv2D, …)
  ○ Pooling (AvgPool, …)
● ...and many more
  ○ Constants, Data flow, Control flow, Embedding, Initialization, I/O, Legacy Input Layers, Logging, Random, Sparse, State, Summary, Lua, etc.
Graph Construction Helpers
● Gradients
● Optimizers
● Higher-Level APIs in core TF
● Higher-Level libraries outside core TF
Gradients
Given a loss, add Ops to compute gradients for Variables.
[Diagram: var0 and var1 flow through many Ops to produce loss]
Gradients
Gradients(loss, [var0, var1]) # Generate gradients
[Diagram: gradient Ops are added after the loss, producing "Gradients for var0" and "Gradients for var1" through many ops]
Example: Gradients for MatMul
[Forward diagram: x and w feed MatMul to produce y]
[Gradient diagram: gy and Transpose(x) feed a MatMul to produce gw; gy and Transpose(w) feed a MatMul to produce gx]
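A hedged open-source sketch of the same construction with tf.gradients, which adds exactly these transpose-and-MatMul ops to the graph (values chosen for illustration):
import tensorflow as tf

x = tf.constant([[1.0, 2.0]])    # 1x2
w = tf.Variable([[3.0], [4.0]])  # 2x1
y = tf.matmul(x, w)              # 1x1
loss = tf.reduce_sum(y)

# For MatMul, the added gradient ops compute
# gx = gy * transpose(w) and gw = transpose(x) * gy.
gx, gw = tf.gradients(loss, [x, w])

sess = tf.Session()
sess.run(tf.global_variables_initializer())
print(sess.run([gx, gw]))  # [[3., 4.]] and [[1.], [2.]]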
Optimizers
Apply gradients to Variables: SGD(var, grad, learning_rate)
[Diagram: grad and learning_rate feed Mul; its output feeds an AssignSub that updates var]
Note: learning_rate is just the output of an Op, so it can easily be decayed.
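A hedged TF 1.x sketch of that note: the learning rate can itself be the output of a decay Op (the loss here is a stand-in for illustration):
import tensorflow as tf

w = tf.Variable([1.0, 2.0])
loss = tf.reduce_sum(tf.square(w))  # stand-in loss

global_step = tf.Variable(0, trainable=False)
learning_rate = tf.train.exponential_decay(
    0.1, global_step, decay_steps=1000, decay_rate=0.96)

train_op = tf.train.GradientDescentOptimizer(learning_rate).minimize(
    loss, global_step=global_step)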
Easily Add Optimizers
Builtin
● SGD, Adagrad, Momentum, Adam, …
Contributed
● LazyAdam, NAdam, YellowFin, ...
Putting it all together to train a Neural Net
Build a Graph by adding Operations:
● For Variables to hold the parameters of the Neural Net.
● To compute the Neural Net output: e.g. classification predictions.
● To compute a training loss: e.g. cross entropy, parameter L2 norms.
● To calculate gradients for the parameters to train.
● To apply gradients with a training function.
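A minimal hedged sketch of such a graph in the open-source TF 1.x API (the shapes and names here are assumptions, not taken from the MNIST example below):
import tensorflow as tf

# Placeholders for a batch of examples and labels.
images = tf.placeholder(tf.float32, [None, 784])
labels = tf.placeholder(tf.int64, [None])

# Variables holding the parameters of the Neural Net.
w = tf.Variable(tf.truncated_normal([784, 10], stddev=0.1))
b = tf.Variable(tf.zeros([10]))

# Ops computing the output and the training loss.
logits = tf.matmul(images, w) + b
loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=labels, logits=logits))

# Ops computing gradients and applying them to the parameters.
train_op = tf.train.GradientDescentOptimizer(0.5).minimize(loss)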
MNIST Example
tutorials/mnist/mnist.py
● Shows both training and evaluation
● Also shows InputLayers: Ops that read data from files
Distributed Execution
Graph Execution
Session API
● Stubby-based API to deploy a Graph in a Tensorflow runtime
● Can run any subset of the graph
● Can add Ops to an existing Graph (for interactive use in Colab, for example)
Training Utilities
● Checkpoint, Recovery, Summaries, Avisu, Replicas, etc.
Tensorflow Runtimes
● "local": In address space of the Python program.
● Remote: In servers typically running on Borg.
Local Runtime
[Diagram: a Python Program (create graph, create session, sess.Run()) talks to a Session; the local Runtime executes on CPU and GPU in the same address space]
Remote Runtime
[Diagram: a Python Program (create graph, create session, sess.Run()) talks to a Session, which calls CreateGraph() and Run([ops]) on a Master; the Master drives Workers via RunSubGraph() and GetTensor(); each Worker has CPUs and GPUs]
Deploying Graph, Running Ops
# ...Add ops to the graph...
sess = session.Session("local") # Deploy graph
Running and fetching output
[Diagram: a graph where one op's output is fetched]
# Run an Op and fetch its output.
# "values" is a numpy ndarray.
values = sess.Run(<an op output>)
Running and fetching output
[Diagram: the ops needed to compute the fetched output are highlighted]
The transitive closure of needed ops is run.
Execution happens in parallel.
Feeding input, Running, and Fetching
[Diagram: one op's output is fed, another op's output is fetched]
a_val = ...a numpy ndarray...
values = sess.Run(<an op output>,
                  feed_input({<a output>: a_val}))
Feeding input, Running, and Fetching
[Diagram: with a fed input, only the ops between the feed and the fetch are run]
Only the required Ops are run.
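A hedged open-source TF 1.x sketch of feeding and fetching (the placeholder and values are illustrative):
import tensorflow as tf

a = tf.placeholder(tf.float32, [2])  # input to be fed
b = tf.constant([10.0, 20.0])
y = a + b

sess = tf.Session()
# Feed a value for a, fetch y; only the Ops needed for y are run.
values = sess.run(y, feed_dict={a: [1.0, 2.0]})
print(values)  # numpy ndarray [11. 22.]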
Higher-Level Core TF API
Training Utilities
A training program typically runs multiple threads:
● Execute the training op in a loop.
● Checkpoint every so often.
● Gather summaries for the Visualizer.
● Other tasks, e.g. monitoring NaNs, costs, etc.
Training Coordinator, Training Threads
Helper objects for multithreaded training:
● Thread classes to execute the training op, summaries, etc.
● Coordinator to start/stop them together and manage summaries
This makes it easy to train single or multiple replicas.
Example:
learning/brain/models/mnist/mnist_replicas.py
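A hedged sketch of the open-source coordinator pattern (tf.train.Coordinator); the tiny counter graph stands in for a real training op:
import threading
import tensorflow as tf

counter = tf.Variable(0)
train_op = tf.assign_add(counter, 1)  # stand-in for the training op

sess = tf.Session()
sess.run(tf.global_variables_initializer())
coord = tf.train.Coordinator()

def train_loop():
  # Each thread runs the training op until a stop is requested.
  while not coord.should_stop():
    if sess.run(train_op) >= 1000:
      coord.request_stop()

threads = [threading.Thread(target=train_loop) for _ in range(4)]
for t in threads:
  t.start()
coord.join(threads)  # wait for all threads to finish together
print(sess.run(counter))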
Layers are ops that create Variables
def embedding(x, vocab_size, dense_size,
              name=None, reuse=None, multiplier=1.0):
  """Embed x of type int64 into dense vectors."""
  with tf.variable_scope(  # Use scopes like this.
      name, default_name="emb", values=[x], reuse=reuse):
    embedding_var = tf.get_variable(
        "kernel", [vocab_size, dense_size])
    return tf.gather(embedding_var, x)
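A hedged usage sketch of the layer above (assumes the embedding function as defined; calling it again with reuse=True shares the same "kernel" Variable):
import tensorflow as tf

ids = tf.constant([0, 2, 1], dtype=tf.int64)                   # token ids
vecs = embedding(ids, vocab_size=5, dense_size=8, name="emb")  # [3, 8] vectors
# Second call reuses the Variable created by the first one.
vecs2 = embedding(ids, vocab_size=5, dense_size=8, name="emb", reuse=True)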
Models are built from Layers
def bytenet(inputs, targets, hparams):
  final_encoder = common_layers.residual_dilated_conv(
      inputs, hparams.num_block_repeat, "SAME", "encoder", hparams)
  shifted_targets = common_layers.shift_left(targets)
  kernel = (hparams.kernel_height, hparams.kernel_width)
  decoder_start = common_layers.conv_block(
      tf.concat([final_encoder, shifted_targets], axis=3),
      hparams.hidden_size, [((1, 1), kernel)], padding="LEFT")
  return common_layers.residual_dilated_conv(
      decoder_start, hparams.num_block_repeat,
      "LEFT", "decoder", hparams)
Estimator
Estimator and Experiment
estimator = tf.contrib.learn.Estimator(
    model_fn=model_builder(model_name, hparams=hparams),
    model_dir=output_dir,
    config=tf.contrib.learn.RunConfig(master=...))

experiment = tf.contrib.learn.Experiment(
    estimator=estimator,
    train_input_fn=f1, eval_input_fn=f2,
    eval_metrics=eval_metrics, train_steps=train_steps,
    eval_steps=eval_steps, train_monitors=train_monitors)
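A hedged usage note: once constructed, the Experiment can drive the whole loop, e.g.
# Run training with periodic evaluation (tf.contrib.learn-era API).
experiment.train_and_evaluate()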
model_fn
def model_fn(features, targets, mode):
  """Creates the prediction, loss, and train ops.

  Args:
    features: A dictionary of tensors keyed by feature name.
    targets: A tensor representing the labels (targets).
    mode: The execution mode, tf.contrib.learn.ModeKeys.

  Returns:
    A tuple: prediction, loss, and train_op.
  """
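A minimal hedged sketch of such a model_fn (the feature name "x", the linear model, and SGD are assumptions for illustration):
import tensorflow as tf

def model_fn(features, targets, mode):
  """Toy model_fn: linear model, squared-error loss, SGD train op."""
  # mode can be checked against tf.contrib.learn.ModeKeys to skip
  # building the train op during evaluation or inference.
  x = features["x"]
  predictions = tf.layers.dense(x, 1)
  loss = tf.losses.mean_squared_error(targets, predictions)
  train_op = tf.train.GradientDescentOptimizer(0.01).minimize(
      loss, global_step=tf.train.get_global_step())
  return predictions, loss, train_op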
input_fn
def input_fn():
  """Supplies input to our model.

  This function supplies input to our model, where this input is a
  function of the mode. For example, we supply different data if
  we're performing training versus evaluation.

  Returns:
    A tuple consisting of 1) a dictionary of tensors whose keys are
    the feature names, and 2) a tensor of target labels if the mode
    is not INFER (and None, otherwise).
  """
High-Level External Libraries:
Tensor2Tensor
Tensor2Tensor (github)
Define, train, and evaluate ML tasks and models (especially sequence tasks).
● Many datasets (WMT, MSCoco, LM1B, etc.) and models (Transformer,
ByteNet, NeuralGPU, LSTM) already built in - mix and match!
● Eminently extensible - add a new Problem, T2TModel, or Modality
● Easy distributed training, both sync and async (and with support for multiple
GPUs per machine)
● Easy hyperparameter tuning
Tensor2Tensor Organization
● data_generators/ : generators for datasets which subclass Problem
● models/ : layers and models, models must subclass T2TModel
● utils/ : utilities, t2t_model class, etc.
● google/ : internal stuff (organized in the same way)
● t2t_trainer.py: main binary called to train:
t2t-trainer \
  --data_dir=$DATA_DIR --problems=$PROBLEM --model=$MODEL \
  --hparams_set=$HPARAMS --output_dir=$TRAIN_DIR
Tensor2Tensor Models
@registry.register_model
class ByteNet(t2t_model.T2TModel):

  def model_fn_body(self, features):
    return bytenet_internal(
        features["inputs"], features["targets"], self._hparams)
Notes:
● T2TModels are registered in a registry and get a name (byte_net)
● They may implement model_fn_body; a default model_fn is provided
Adding a Problem
@registry.register_problem("wmt_ende_tokens_8k")
class WMTEnDeTokens8k(WMTProblem):
  """Problem spec for WMT En-De translation."""

  @property
  def targeted_vocab_size(self):
    return 2**13  # 8192

  def train_generator(self, data_dir, tmp_dir, train):
    yield {"inputs": [1, 2], "targets": [3, 4]}
Let the Tensors Flow!