Sequence Learning Problem
Unfolding Computational Graph, RNN, Sequence Modeling Conditioned on Context
Unfolding the Computational graph
❖ A computational graph is a way to formalize the structure of a set of
computations, such as those involved in mapping inputs and parameters to
outputs and loss.
❖ Unfolding a recursive or recurrent computation produces a computational graph
with a repetitive structure, typically corresponding to a chain of events.
Unfolding this graph results in the sharing of parameters across a deep network
structure.
❖ For example, consider the classical form of a dynamical system:
s(t) = f(s(t-1); θ)
where s(t) is called the state of the system (network) at time step t.
❖ The equation above is recurrent because the definition of s at time t refers back
to the same definition at time t - 1.
❖ For a finite number of time steps τ, the graph can be unfolded by applying the
definition τ - 1 times. For example, if we unfold the equation for τ = 3 time steps,
we obtain
s(3) = f(s(2); θ)
     = f(f(s(1); θ); θ)
❖ Unfolding the equation by repeatedly applying the definition in this way has yielded
an expression that does not involve recurrence. Such an expression can now be
represented by a traditional directed acyclic computational graph.
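A minimal sketch of the unfolding above, assuming a scalar state and a hypothetical transition f(s; θ) = tanh(θ·s) (the slides leave f unspecified). The loop applies the definition τ - 1 times starting from s(1), reusing the same θ at every step, and the final state matches the non-recurrent nested expression f(f(s(1); θ); θ).

```python
import math
from typing import Callable, List


def f(s: float, theta: float) -> float:
    # Hypothetical transition function; any f(s; theta) would be unfolded the same way.
    return math.tanh(theta * s)


def unfold(step: Callable[[float, float], float], s1: float, theta: float, tau: int) -> List[float]:
    """Return [s(1), ..., s(tau)] by applying the definition tau - 1 times."""
    states = [s1]
    for _ in range(tau - 1):
        # The same parameter theta is reused at every step (parameter sharing).
        states.append(step(states[-1], theta))
    return states


theta, s1 = 0.9, 0.5
states = unfold(f, s1, theta, tau=3)
# The recurrent loop gives the same value as the nested, non-recurrent expression.
print(states[-1], f(f(s1, theta), theta))
```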
Unfolding Computational Graphs
Forward propagation begins with an initial specification of h(0). Then, for each time
step from t = 1 to t = τ, we apply the following update equations:
a(t) = b + W h(t-1) + U x(t)
h(t) = tanh(a(t))
o(t) = c + V h(t)
where the parameters are:
• bias vectors b and c
• weight matrices U, V and W for the input-to-hidden, hidden-to-output and
hidden-to-hidden connections, respectively
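The update equations can be traced with a short pure-Python sketch. The dimensions (2 input features, 3 hidden units, 1 output) and all weight values are hypothetical, chosen only to make the loop concrete; the point is that the same U, W, V, b and c are applied at every time step.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(M: Matrix, v: Vector) -> Vector:
    # Matrix-vector product M v
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def add(*vs: Vector) -> Vector:
    # Element-wise sum of equally sized vectors
    return [sum(xs) for xs in zip(*vs)]


def rnn_forward(xs: List[Vector], h0: Vector, U: Matrix, W: Matrix, V: Matrix,
                b: Vector, c: Vector) -> Tuple[List[Vector], List[Vector]]:
    """Apply a(t) = b + W h(t-1) + U x(t), h(t) = tanh(a(t)), o(t) = c + V h(t) for t = 1..tau."""
    h = h0
    hs, outs = [], []
    for x in xs:
        a = add(b, matvec(W, h), matvec(U, x))
        h = [math.tanh(v) for v in a]
        o = add(c, matvec(V, h))
        hs.append(h)
        outs.append(o)
    return hs, outs


# Hypothetical toy parameters
U = [[0.5, -0.1], [0.3, 0.2], [-0.4, 0.1]]      # input-to-hidden (3 x 2)
W = [[0.1, 0.0, 0.2], [0.0, 0.1, 0.0], [0.2, 0.0, 0.1]]  # hidden-to-hidden (3 x 3)
V = [[1.0, -1.0, 0.5]]                           # hidden-to-output (1 x 3)
b = [0.0, 0.1, -0.1]
c = [0.2]
h0 = [0.0, 0.0, 0.0]
xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]        # tau = 3 input vectors

hs, outs = rnn_forward(xs, h0, U, W, V, b, c)
print(outs)
```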
Recurrent Neural Network
❖ Recurrent Neural Networks (RNNs) are a class of neural networks designed
to recognize patterns in sequences of data, such as time series data, text, or
speech.
❖ Unlike traditional feedforward neural networks, RNNs have connections that
form directed cycles, which allows them to maintain an internal state, or
memory, that captures information about previous inputs in the sequence.
❖ This feature makes RNNs particularly powerful for tasks where the order or
sequence of data matters.
RNN
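from tensorflow.keras.layers import Input, SimpleRNN, RNN, LSTMCell
from tensorflow.keras.models import Sequential

timesteps, input_dim = 10, 5  # steps per sequence, features per step

# Stacked SimpleRNN layers: the first returns its full sequence of hidden states
# so that the second receives one vector per time step
rnn_model = Sequential()
rnn_model.add(Input(shape=(timesteps, input_dim)))
rnn_model.add(SimpleRNN(50, activation='relu', return_sequences=True))
rnn_model.add(SimpleRNN(10, activation='tanh'))

# Wrap an LSTMCell in the generic RNN layer, which applies the cell at every time step
lstm_cell = LSTMCell(50)
lstm_model = Sequential()
lstm_model.add(Input(shape=(timesteps, input_dim)))
lstm_model.add(RNN(lstm_cell))
lstm_model.compile(optimizer='adam', loss='mse')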
RNN Design Patterns
Acceptor (Sequence Classifier)
An acceptor RNN takes a sequence of inputs and produces a single output, typically a
class label. This is useful for tasks such as sentiment analysis, where the entire
sequence (e.g., a sentence) needs to be classified.
Architecture:
● Input: Sequence of vectors [x1,x2,...,xT]
● Output: Single vector y (e.g., class label)
Applications
● Sentiment analysis
● Language identification
● Anomaly detection in time series data
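A minimal acceptor sketch, assuming a tanh RNN with one input feature, two hidden units and two classes; all parameter values are hypothetical. Intermediate hidden states are discarded, and a single softmax output is computed from the final state h(T), so any sequence length yields exactly one label.

```python
import math
from typing import List, Tuple


def matvec(M: List[List[float]], v: List[float]) -> List[float]:
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def softmax(z: List[float]) -> List[float]:
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]


def acceptor(xs, h0, U, W, b, V, c) -> Tuple[int, List[float]]:
    """Read the whole sequence, then classify from the final hidden state only."""
    h = h0
    for x in xs:
        # h(t) = tanh(b + W h(t-1) + U x(t)); no output is emitted per step
        h = [math.tanh(bi + wh + ux) for bi, wh, ux in zip(b, matvec(W, h), matvec(U, x))]
    # Single output from h(T): class probabilities via softmax
    probs = softmax([ci + vh for ci, vh in zip(c, matvec(V, h))])
    return probs.index(max(probs)), probs


# Hypothetical toy parameters: 1 input feature, 2 hidden units, 2 classes
U = [[1.0], [-1.0]]
W = [[0.5, 0.0], [0.0, 0.5]]
b = [0.0, 0.0]
V = [[2.0, -2.0], [-2.0, 2.0]]
c = [0.0, 0.0]

label, probs = acceptor([[0.2], [0.9], [0.7]], [0.0, 0.0], U, W, b, V, c)
print(label, probs)
```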
Encoder (Sequence-to-Fixed)
An encoder RNN reads a sequence of inputs and produces a fixed-size vector
representation of the entire sequence. This is often used in tasks like encoding a
sentence for a neural machine translation system.
Architecture:
● Input: Sequence of vectors [x1,x2,...,xT]
● Output: Single vector hT (encoded representation)
Applications
● Sentence embedding for downstream tasks
● Context representation in conversation systems
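To illustrate "fixed-size", here is a sketch of an encoder that returns h(T) for input sequences of different lengths. It assumes h(0) = 0 and uses hypothetical weights (2 input features, 3 hidden units); whatever the sequence length, the encoding always has as many entries as there are hidden units.

```python
import math
from typing import List


def encode(xs: List[List[float]], U, W, b) -> List[float]:
    """Return h(T), a fixed-size summary of a variable-length sequence."""
    h = [0.0] * len(b)  # assumption: h(0) = 0
    for x in xs:
        # h(t) = tanh(b + W h(t-1) + U x(t)); the comprehension reads the previous h
        h = [
            math.tanh(b[i]
                      + sum(W[i][j] * h[j] for j in range(len(h)))
                      + sum(U[i][k] * x[k] for k in range(len(x))))
            for i in range(len(b))
        ]
    return h


# Hypothetical toy parameters: 2 input features, 3 hidden units
U = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.3]]
W = [[0.2, 0.0, 0.1], [0.0, 0.2, 0.0], [0.1, 0.0, 0.2]]
b = [0.0, 0.0, 0.0]

short_code = encode([[1.0, 0.0], [0.0, 1.0]], U, W, b)           # T = 2
long_code = encode([[1.0, 0.0], [0.0, 1.0]] * 3, U, W, b)        # T = 6
print(len(short_code), len(long_code))
```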
Transducer (Sequence-to-Sequence)
A transducer RNN generates an output sequence from an input sequence. This design is used
in tasks like machine translation, where the input is a sentence in one language, and the output
is a sentence in another language. It typically involves two RNNs: an encoder and a decoder.
Architecture:
● Input: Sequence of vectors [x1,x2,...,xT]
● Output: Sequence of vectors [y1,y2,...,yT′]
Applications
● Neural machine translation
● Speech-to-text systems
● Video captioning
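A compact encoder-decoder sketch, assuming tanh RNNs, greedy decoding, and a hypothetical 4-token vocabulary in which token 0 is a begin-of-sequence marker and token 1 is end-of-sequence. The encoder compresses the input into h(T); the decoder starts from h(T), feeds back its previous token, and stops at the end token or after max_len steps, so the output length T′ need not equal T. All weights are illustrative, not trained.

```python
import math
from typing import Dict, List


def rnn_step(x: List[float], h: List[float], U, W, b) -> List[float]:
    # h(t) = tanh(b + W h(t-1) + U x(t))
    return [math.tanh(b[i]
                      + sum(W[i][j] * h[j] for j in range(len(h)))
                      + sum(U[i][k] * x[k] for k in range(len(x))))
            for i in range(len(b))]


def one_hot(i: int, n: int) -> List[float]:
    return [1.0 if k == i else 0.0 for k in range(n)]


def transduce(xs, enc: Dict, dec: Dict, V, c, vocab_size: int,
              bos: int, eos: int, max_len: int = 10) -> List[int]:
    """Encode xs into h(T), then greedily decode tokens until eos or max_len."""
    h = [0.0] * len(enc["b"])
    for x in xs:  # encoder RNN
        h = rnn_step(x, h, enc["U"], enc["W"], enc["b"])
    ys, prev = [], bos
    for _ in range(max_len):  # decoder RNN, initialised with the encoder's h(T)
        h = rnn_step(one_hot(prev, vocab_size), h, dec["U"], dec["W"], dec["b"])
        scores = [c[k] + sum(V[k][j] * h[j] for j in range(len(h))) for k in range(vocab_size)]
        prev = max(range(vocab_size), key=lambda k: scores[k])  # greedy choice
        if prev == eos:
            break
        ys.append(prev)
    return ys


# Hypothetical toy parameters: 1 input feature, 2 hidden units, 4 output tokens
enc = {"U": [[1.0], [0.5]], "W": [[0.3, 0.0], [0.0, 0.3]], "b": [0.0, 0.0]}
dec = {"U": [[0.2, -1.0, 0.5, 0.3], [0.1, 0.4, -0.3, 0.6]],
       "W": [[0.5, 0.1], [0.1, 0.5]], "b": [0.0, 0.0]}
V = [[0.1, 0.1], [-1.0, 1.0], [1.0, -0.5], [0.2, 0.8]]
c = [0.0, 0.0, 0.0, 0.0]
xs = [[0.5], [1.0], [-0.2]]  # T = 3

ys = transduce(xs, enc, dec, V, c, vocab_size=4, bos=0, eos=1)
print(ys)
```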
Backpropagation through time
Sequence Modeling Conditioned on Contexts
Sequence Modeling Conditioned on Contexts refers to a modeling approach
where the generation or prediction of sequences (e.g., words, characters, or other
types of data) is influenced by external information, known as the context.
This external context can come from a variety of sources, such as prior text,
metadata, or another related sequence.
Sequence Modeling vs. Sequence Modeling Conditioned on Context
Sequence Modeling:
In general, sequence modeling involves training a model to predict or generate the
next element in a sequence based on previous elements. A classic example of
this is language modeling, where the task is to predict the next word in a sentence
given the previous words.
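The next-word prediction task itself can be illustrated without any neural network. The sketch below deliberately swaps in a count-based bigram model (not an RNN) on a hypothetical three-sentence corpus: it conditions only on the single previous word, whereas an RNN language model summarizes all previous words in its hidden state.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional


def train_bigram(sentences: List[str]) -> Dict[str, Counter]:
    """Count how often each word follows each other word."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts


def predict_next(counts: Dict[str, Counter], prev: str) -> Optional[str]:
    """Return the most frequent follower of `prev`, or None if it was never seen."""
    if prev not in counts:
        return None
    return counts[prev].most_common(1)[0][0]


corpus = ["the cat sat on the mat", "the cat ate", "a dog sat on the rug"]
model = train_bigram(corpus)
print(predict_next(model, "the"))  # "cat" follows "the" twice
```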
Conditioned on Contexts:
When the model is conditioned on a context, it means that the prediction of each
element in the sequence is influenced not only by the preceding elements in
the sequence but also by additional information provided as context.
Conditioning on the Output of the Previous Step
The structure of the conditional RNN contains connections from the previous output
y(t-1) to the current state h(t). This means that when predicting y(t), the model
considers not only the current input x(t) but also the previously predicted output
y(t-1).
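A scalar sketch of this feedback connection, assuming the update h(t) = tanh(b + W·h(t-1) + U·x(t) + R·y(t-1)) and output y(t) = c + V·h(t). The output-to-hidden weight R is a hypothetical name introduced here; with R = 0 the model reduces to an ordinary RNN, and any nonzero R makes later outputs depend on earlier predictions.

```python
import math
from typing import List


def conditional_rnn(xs: List[float], U: float, W: float, R: float, V: float,
                    b: float, c: float, h0: float = 0.0, y0: float = 0.0) -> List[float]:
    """Scalar sketch: h(t) = tanh(b + W*h(t-1) + U*x(t) + R*y(t-1)), y(t) = c + V*h(t)."""
    h, y_prev = h0, y0
    ys = []
    for x in xs:
        h = math.tanh(b + W * h + U * x + R * y_prev)  # previous output feeds the state
        y = c + V * h
        ys.append(y)
        y_prev = y  # the model's own prediction is fed back at the next step
    return ys


plain = conditional_rnn([1.0, 0.5], U=0.5, W=0.3, R=0.0, V=1.0, b=0.0, c=0.0)
fed_back = conditional_rnn([1.0, 0.5], U=0.5, W=0.3, R=0.8, V=1.0, b=0.0, c=0.0)
print(plain, fed_back)  # identical first step (y(0) = 0), different second step
```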
Extra input to an RNN
Some common ways of providing an extra input to an RNN are:
1. as an extra input at each time step, or
2. as the initial state h(0), or
3. both.
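The three options can be compared in a scalar sketch, assuming a fixed context value ctx and hypothetical weights Q (context-to-hidden at every step) and P (context-to-initial-state). With ctx = 0 all three modes coincide; with a nonzero context they produce different hidden-state trajectories.

```python
import math
from typing import List


def run_with_context(xs: List[float], ctx: float, mode: str,
                     U: float = 0.5, W: float = 0.4, Q: float = 0.7,
                     P: float = 1.0, b: float = 0.0) -> List[float]:
    """Scalar RNN fed a fixed context in one of three ways.

    mode = "each_step": add Q*ctx to the update at every time step
    mode = "initial":   set h(0) = tanh(P*ctx)
    mode = "both":      do both
    """
    if mode not in ("each_step", "initial", "both"):
        raise ValueError(f"unknown mode: {mode}")
    h = math.tanh(P * ctx) if mode in ("initial", "both") else 0.0
    extra = Q * ctx if mode in ("each_step", "both") else 0.0
    states = []
    for x in xs:
        h = math.tanh(b + W * h + U * x + extra)
        states.append(h)
    return states


for mode in ("each_step", "initial", "both"):
    print(mode, run_with_context([1.0, 0.0, 0.5], ctx=1.0, mode=mode))
```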