Lecture Notes: Recurrent Neural Networks (RNNs)

1. Introduction to Recurrent Neural Networks (RNNs)

What are RNNs?

A class of artificial neural networks designed to model sequential data.

Unlike traditional feedforward neural networks, RNNs have connections that form cycles, allowing
information to persist. This enables RNNs to capture temporal dependencies and context in
sequential data.

Applications of RNNs :

Natural Language Processing (NLP): Text generation, sentiment analysis, machine translation.

Time Series Prediction: Stock prices, weather forecasting.

Speech Recognition: Converting audio to text.

Video Analysis: Action recognition, scene segmentation.

2. Basic Architecture of RNNs

Key Characteristics :

Sequential Data : RNNs process data in sequences (e.g., time steps in a time series, words in a
sentence).

Hidden State : The core feature of RNNs is the hidden state \( h_t \), which carries information
from one time step to the next, allowing the network to "remember" information from previous
time steps.

Mathematical Formulation :
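
The notes do not spell the formulas out here. A commonly used formulation of a simple RNN is given below as an assumption. The weight names \( W_{xh} \), \( W_{hh} \), \( W_{hy} \) and the biases \( b_h \), \( b_y \) are illustrative choices, not taken from the notes; \( x_t \) is the input and \( y_t \) the output at time step \( t \).

```latex
\begin{aligned}
h_t &= \tanh\left(W_{xh}\, x_t + W_{hh}\, h_{t-1} + b_h\right) \\
y_t &= W_{hy}\, h_t + b_y
\end{aligned}
```
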
Graphical Representation :

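At each time step \( t \), the network applies the same weights to the current input and the previous
hidden state. The hidden state is fed back into the network, forming a cycle in the network structure.
This is what gives RNNs their "recurrent" property.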
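
The recurrence can be sketched in a few lines of plain Python. This is a minimal illustration, assuming the common update \( h_t = \tanh(W_{xh} x_t + W_{hh} h_{t-1} + b_h) \) with a zero initial state; the names `matvec`, `rnn_step`, `rnn_forward` and all weight values are hypothetical.

```python
import math

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One recurrence step: h_t = tanh(W_xh x_t + W_hh h_prev + b_h)."""
    from_input = matvec(W_xh, x_t)
    from_state = matvec(W_hh, h_prev)
    return [math.tanh(a + b + c) for a, b, c in zip(from_input, from_state, b_h)]

def rnn_forward(inputs, W_xh, W_hh, b_h):
    """Apply the same weights at every time step and return all hidden states."""
    h = [0.0] * len(b_h)  # initial hidden state, assumed to be all zeros
    hidden_states = []
    for x_t in inputs:
        h = rnn_step(x_t, h, W_xh, W_hh, b_h)
        hidden_states.append(h)
    return hidden_states

# Hypothetical toy weights: 2 input features, 3 hidden units.
W_xh = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.2]]
W_hh = [[0.1, 0.0, 0.2], [0.0, 0.3, -0.1], [0.2, -0.2, 0.1]]
b_h = [0.0, 0.1, -0.1]

sequence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
states = rnn_forward(sequence, W_xh, W_hh, b_h)
```

Note that `rnn_forward` never creates new weights inside the loop: every time step reuses `W_xh`, `W_hh` and `b_h`.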

3. Challenges with Standard RNNs

While RNNs are powerful for sequential data, they come with several challenges:

1. Vanishing Gradient Problem :

During backpropagation through time (BPTT), gradients can shrink exponentially as they are
propagated back through many layers or time steps, making it difficult for the network to learn
long-range dependencies.

2. Exploding Gradient Problem :

Conversely, gradients can also grow exponentially, leading to instability in training and making
optimization difficult.

3. Limited Memory :

Basic RNNs struggle to capture long-term dependencies, as information from earlier time steps
gets "forgotten" quickly as the network processes new inputs.
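
The exponential behaviour behind the first two challenges can be seen in a simplified setting. The sketch below assumes that each backward step multiplies the gradient by the same constant factor, which is a deliberate simplification of a real RNN; the function name `gradient_through_time` is hypothetical.

```python
def gradient_through_time(recurrent_factor, num_steps):
    """Gradient size after num_steps when each step multiplies it by the same factor."""
    gradient = 1.0
    for _ in range(num_steps):
        gradient *= recurrent_factor
    return gradient

vanishing = gradient_through_time(0.9, 100)  # factor below 1: shrinks toward zero
exploding = gradient_through_time(1.1, 100)  # factor above 1: grows very large

print(f"factor 0.9 over 100 steps: {vanishing:.2e}")
print(f"factor 1.1 over 100 steps: {exploding:.2e}")
```

Even factors close to 1 lead to very small or very large gradients once the sequence is long enough.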

4. Solutions to RNN Challenges

Long Short-Term Memory (LSTM) :

A special kind of RNN designed to address the vanishing gradient problem and improve the
network's ability to learn long-range dependencies.

LSTM Components :

Cell state : Maintains long-term memory.

Forget gate : Decides which information to discard from the cell state.

Input gate : Decides which new information to add to the cell state.

Output gate : Controls what part of the cell state is output as the hidden state.

The LSTM cell uses these gates to regulate the flow of information, mitigating issues such as
vanishing gradients and enabling better memory retention.
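
How the gates interact can be sketched for a single LSTM unit with scalar state. This is a minimal illustration of the standard LSTM update, not a full implementation; the function `lstm_step`, the `params` layout and all parameter values are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x_t, h_prev, c_prev, params):
    """One step of a one-unit LSTM; params maps each part to (w_x, w_h, bias)."""
    def pre_activation(name):
        w_x, w_h, bias = params[name]
        return w_x * x_t + w_h * h_prev + bias

    forget_gate = sigmoid(pre_activation("forget"))      # how much of c_prev to keep
    input_gate = sigmoid(pre_activation("input"))        # how much new content to write
    candidate = math.tanh(pre_activation("candidate"))   # proposed new content
    output_gate = sigmoid(pre_activation("output"))      # how much of the cell to expose

    c_t = forget_gate * c_prev + input_gate * candidate  # updated cell state
    h_t = output_gate * math.tanh(c_t)                   # new hidden state
    return h_t, c_t

# Hypothetical parameters, chosen only for illustration.
params = {
    "forget": (0.5, 0.1, 1.0),
    "input": (0.6, -0.2, 0.0),
    "candidate": (1.0, 0.3, 0.0),
    "output": (0.4, 0.2, 0.0),
}

h, c = 0.0, 0.0
for x in [1.0, 0.5, -0.5, 0.0]:
    h, c = lstm_step(x, h, c, params)
```

The cell state is updated additively (`forget_gate * c_prev + ...`), which is what lets information and gradients survive over many steps when the forget gate stays close to 1.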

Gated Recurrent Units (GRUs) :

A simpler variant of LSTMs with fewer gates (no separate cell state).

GRU Components :

Update gate : Decides how much of the previous hidden state should be carried forward.

Reset gate : Decides how much of the previous hidden state to forget when computing the new candidate state.

GRUs tend to perform similarly to LSTMs but with less computational overhead.
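
A one-unit GRU step can be sketched in the same style. The code assumes the convention in which the update gate weights the previous hidden state (some references swap the roles of the gate and its complement); `gru_step` and the parameter values are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gru_step(x_t, h_prev, params):
    """One step of a one-unit GRU; params maps each part to (w_x, w_h, bias)."""
    w_x, w_h, bias = params["update"]
    update_gate = sigmoid(w_x * x_t + w_h * h_prev + bias)

    w_x, w_h, bias = params["reset"]
    reset_gate = sigmoid(w_x * x_t + w_h * h_prev + bias)

    # The reset gate scales how much of the old state feeds the candidate.
    w_x, w_h, bias = params["candidate"]
    candidate = math.tanh(w_x * x_t + w_h * (reset_gate * h_prev) + bias)

    # The update gate interpolates between the old state and the candidate.
    return update_gate * h_prev + (1.0 - update_gate) * candidate

# Hypothetical parameters, chosen only for illustration.
params = {
    "update": (0.3, 0.2, 0.0),
    "reset": (0.5, -0.1, 0.0),
    "candidate": (1.0, 0.8, 0.0),
}

h = 0.0
for x in [1.0, 0.5, -0.5, 0.0]:
    h = gru_step(x, h, params)
```

There is only one state variable, `h`, which is why the GRU needs fewer parameters than the LSTM.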

5. Backpropagation Through Time (BPTT)

To train RNNs, we use Backpropagation Through Time (BPTT), an extension of the
backpropagation algorithm that handles the recurrent nature of RNNs:

1. Forward pass : Compute the hidden state \( h_t \) and output \( y_t \) for each time step based on the
input sequence.

2. Compute loss : At each time step, compute the loss based on the predicted output and the
actual target.

3. Backward pass : Calculate the gradients of the loss with respect to the weights by unrolling the
RNN over time and applying the chain rule. This involves:

The loss gradient at each time step \( t \) is propagated backward through the earlier time steps.

Because the weights are shared across time steps, the contributions from all time steps are
accumulated (summed) to give the gradient for each weight.
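
The three steps can be sketched for a scalar RNN. The model below is an assumption made for brevity: \( h_t = \tanh(w\, h_{t-1} + u\, x_t) \), with the hidden state used directly as the output and a squared-error loss at every step. The names `forward`, `loss` and `bptt` are hypothetical.

```python
import math

def forward(w, u, inputs):
    """Scalar RNN h_t = tanh(w * h_{t-1} + u * x_t); returns [h_0, h_1, ..., h_T]."""
    hidden = [0.0]
    for x_t in inputs:
        hidden.append(math.tanh(w * hidden[-1] + u * x_t))
    return hidden

def loss(w, u, inputs, targets):
    """Sum over time steps of 0.5 * (h_t - y_t)^2."""
    hidden = forward(w, u, inputs)
    return sum(0.5 * (h - y) ** 2 for h, y in zip(hidden[1:], targets))

def bptt(w, u, inputs, targets):
    """Return (dL/dw, dL/du) by walking the unrolled network backward in time."""
    hidden = forward(w, u, inputs)
    grad_w, grad_u = 0.0, 0.0
    grad_from_future = 0.0  # dL/dh_t contribution arriving from later time steps
    for t in range(len(inputs), 0, -1):
        grad_h = (hidden[t] - targets[t - 1]) + grad_from_future
        grad_pre = grad_h * (1.0 - hidden[t] ** 2)  # back through tanh
        grad_w += grad_pre * hidden[t - 1]          # shared weight: accumulate
        grad_u += grad_pre * inputs[t - 1]
        grad_from_future = grad_pre * w             # pass the gradient back to h_{t-1}
    return grad_w, grad_u

# Made-up data, for illustration only.
inputs = [1.0, -0.5, 0.25, 0.8]
targets = [0.2, -0.1, 0.3, 0.0]
grad_w, grad_u = bptt(0.7, -0.4, inputs, targets)
```

A useful sanity check for any hand-written backward pass is to compare it against a finite-difference estimate of the same gradient.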

6. Variants of RNNs

Bidirectional RNNs (BiRNNs) :

These networks process the sequence in both forward and backward directions, allowing them to
capture context from both past and future time steps.

Forward direction : Processes the sequence from start to end.

Backward direction : Processes the sequence from end to start.

The hidden states from the two directions are combined at each time step, typically by concatenation.
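
A bidirectional pass can be sketched with two scalar RNNs. This is a minimal illustration in which the two directions are combined by pairing their states (the scalar analogue of concatenation); `run_rnn`, `bidirectional_rnn` and the weights are hypothetical.

```python
import math

def run_rnn(inputs, w_x, w_h):
    """Scalar RNN h_t = tanh(w_x * x_t + w_h * h_{t-1}); returns one state per step."""
    h, states = 0.0, []
    for x_t in inputs:
        h = math.tanh(w_x * x_t + w_h * h)
        states.append(h)
    return states

def bidirectional_rnn(inputs, forward_weights, backward_weights):
    """Pair each step's forward state with the backward state for the same position."""
    forward_states = run_rnn(inputs, *forward_weights)
    # Run on the reversed sequence, then flip back so index t lines up with x_t.
    backward_states = run_rnn(inputs[::-1], *backward_weights)[::-1]
    return list(zip(forward_states, backward_states))

sequence = [0.5, -1.0, 0.25, 0.75]
combined = bidirectional_rnn(sequence, (0.8, 0.5), (0.6, -0.4))
```

At position `t`, the first element of the pair summarizes the inputs up to `t` and the second summarizes the inputs from `t` onward.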

Deep RNNs :

Stacking multiple layers of RNNs can increase the representational power of the network.

This approach is useful when learning more complex temporal patterns.
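
Stacking can be sketched by feeding the hidden-state sequence of one layer into the next. The scalar layers, the function names `rnn_layer` and `deep_rnn`, and the weights are hypothetical.

```python
import math

def rnn_layer(inputs, w_x, w_h):
    """Scalar RNN layer; returns the hidden state at every time step."""
    h, states = 0.0, []
    for x_t in inputs:
        h = math.tanh(w_x * x_t + w_h * h)
        states.append(h)
    return states

def deep_rnn(inputs, layer_weights):
    """Feed each layer's hidden-state sequence to the next layer as its input."""
    sequence = inputs
    for w_x, w_h in layer_weights:
        sequence = rnn_layer(sequence, w_x, w_h)
    return sequence

inputs = [0.5, -1.0, 0.25, 0.75]
top_states = deep_rnn(inputs, [(0.9, 0.3), (1.2, -0.5), (0.7, 0.4)])
```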

7. Training RNNs

Optimization :

RNNs are typically trained using gradient-based optimization algorithms (e.g., SGD, Adam).

Special attention is needed to handle issues like vanishing/exploding gradients, often through
initialization schemes or using LSTM/GRU cells.

Regularization :

Dropout : Can be applied to RNNs to reduce overfitting.

Gradient clipping : Used to handle exploding gradients by capping the size of the gradients (for
example, their norm) during training.
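
One common form of gradient clipping rescales the whole gradient when its L2 norm exceeds a threshold. The sketch below assumes the gradients are given as a flat list of floats; `clip_by_global_norm` and `max_norm` are hypothetical names.

```python
import math

def clip_by_global_norm(gradients, max_norm):
    """Rescale gradients so that their combined L2 norm is at most max_norm."""
    total_norm = math.sqrt(sum(g * g for g in gradients))
    if total_norm <= max_norm:
        return list(gradients)  # already small enough: leave unchanged
    scale = max_norm / total_norm
    return [g * scale for g in gradients]

clipped = clip_by_global_norm([3.0, 4.0], max_norm=1.0)   # norm 5 is scaled down to 1
unchanged = clip_by_global_norm([0.3, 0.4], max_norm=1.0) # norm 0.5 is left as is
```

Rescaling keeps the direction of the gradient and only limits its length, so the update still points the same way.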

8. Advanced Topics

Attention Mechanism :

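Attention allows the network to focus on the most relevant parts of the input sequence at each step,
enabling it to handle long-range dependencies better than vanilla RNNs.

Popularized in NLP tasks like machine translation, where it was first added to RNN-based
encoder-decoder models; it later became the core building block of Transformer models.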
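
A minimal sketch of one attention step follows, assuming dot-product scoring between a query vector and a list of hidden states (other scoring functions exist). The names `attend`, `query` and `encoder_states` and all values are hypothetical.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    largest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attend(query, hidden_states):
    """Return (context vector, attention weights) for one query."""
    scores = [dot(query, h) for h in hidden_states]
    weights = softmax(scores)
    size = len(hidden_states[0])
    # The context is the weighted average of the hidden states.
    context = [sum(w * h[i] for w, h in zip(weights, hidden_states)) for i in range(size)]
    return context, weights

encoder_states = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
context, weights = attend([2.0, 0.0], encoder_states)
```

The weights show which positions the query "focuses" on: states that point in a similar direction to the query receive larger weights, regardless of how far back in the sequence they are.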

Transformers :

A modern architecture that uses attention mechanisms in place of recurrence. Because it does not
process the sequence step by step, it can be parallelized across positions and often achieves better
performance, especially for long sequences.

9. Example: Language Modeling with RNN

Problem : Given a sequence of words, predict the next word in the sequence.

1. Input : A sequence of words (e.g., "I am going to the").

2. RNN Process :

Each word is embedded into a vector (e.g., using Word2Vec or GloVe).

The RNN processes the sequence one word at a time, updating its hidden state at each step.

3. Output : At each time step, the RNN generates a probability distribution over the vocabulary for
the next word. The word with the highest probability is chosen as the output.
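
The output step can be sketched as a softmax over the vocabulary followed by picking the most probable word. The four-word vocabulary and the scores in `logits` are made up for illustration; in a real model the scores would come from the RNN's output layer.

```python
import math

def softmax(logits):
    """Convert raw scores into a probability distribution."""
    largest = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - largest) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical vocabulary and made-up scores for the context "I am going to the".
vocabulary = ["store", "park", "moon", "banana"]
logits = [2.1, 1.3, -0.5, -1.0]

probabilities = softmax(logits)
best_index = max(range(len(vocabulary)), key=lambda i: probabilities[i])
next_word = vocabulary[best_index]
print(next_word)
```

Always taking the most probable word is known as greedy decoding; sampling from `probabilities` instead is a common alternative for text generation.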

Traditional Neural Networks (TNNs) and Recurrent Neural Networks (RNNs) are both types of
artificial neural networks, but they differ in how they process information and the types of tasks they
are suited for. Below are the key differences:

1. Architecture:

Traditional Neural Networks (TNNs):

These are feedforward networks, meaning that information moves in one direction only: from
the input layer, through hidden layers, to the output layer.

There are no cycles or loops in the network. Each layer's output only depends on the current
input and weights, and there is no memory of previous inputs.

Example: Multilayer Perceptron (MLP), Convolutional Neural Networks (CNNs).

Recurrent Neural Networks (RNNs):

RNNs have recurrent connections, meaning that the hidden state from the previous time step is fed
back into the network, allowing the network to maintain a form of "memory."

The hidden state of the network can capture temporal dependencies, meaning the model can
take into account past inputs when producing outputs.

RNNs are designed to process sequential data and are often used in tasks like language modeling,
speech recognition, and time series analysis.

2. Memory:

TNNs:

Traditional neural networks do not have memory. Each input is processed independently, and the
network does not retain any information about past inputs once it moves on to the next one.

RNNs:

RNNs have an inherent memory mechanism. The hidden state of the network at a given time step
is influenced by both the current input and the previous hidden state, allowing the model to
remember information from previous time steps.
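
The difference in memory can be sketched with one-unit models. Both functions and their weights are hypothetical; the point is only that the feedforward output ignores history while the recurrent state does not.

```python
import math

def feedforward(x, w=0.8):
    """One-unit feedforward layer: the output depends only on the current input."""
    return math.tanh(w * x)

def final_hidden_state(inputs, w_x=0.8, w_h=0.9):
    """One-unit RNN: the state after the last input depends on the whole history."""
    h = 0.0
    for x_t in inputs:
        h = math.tanh(w_x * x_t + w_h * h)
    return h

# Two sequences that end with the same input but have different histories.
history_a = [1.0, 1.0, 0.5]
history_b = [-1.0, -1.0, 0.5]

same_feedforward_output = feedforward(history_a[-1]) == feedforward(history_b[-1])
state_a = final_hidden_state(history_a)
state_b = final_hidden_state(history_b)
```

Here `same_feedforward_output` is true because the last input is identical, while `state_a` and `state_b` differ because the earlier inputs differ.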

3. Use Cases:

TNNs:

Best suited for problems where the relationship between inputs and outputs does not depend on
sequential or temporal context.

Examples: Image classification, object recognition, simple regression tasks, and pattern
recognition where inputs are independent.

RNNs:

Ideal for sequential data or problems where time-dependent patterns need to be learned. They
excel at tasks where the output depends on previous inputs.

Examples: Natural Language Processing (NLP), machine translation, speech recognition, and time
series forecasting.

4. Data Input Type:

TNNs:

Typically process fixed-size input where each sample is independent of the others.

Inputs are not temporal or sequential, e.g., an image, a vector, etc.

RNNs:

Designed to handle variable-length sequences where the input is time or order dependent.

The hidden state at each time step depends not only on the current input but also on the previous
inputs or states, making them suitable for sequential tasks.

5. Training Difficulty:

TNNs:

Training is generally easier because there are no dependencies across time steps, and the
backpropagation algorithm works straightforwardly.

RNNs:

RNNs are more difficult to train because they involve dependencies across time steps. The
backpropagation through time (BPTT) algorithm is used, which can suffer from issues like the
vanishing gradient problem and exploding gradients, making learning more challenging.

Variants like Long Short-Term Memory (LSTM) networks and Gated Recurrent Units (GRUs) were
developed to address these challenges by providing better memory management.

6. Output:

TNNs:

Typically produce a single output based on the entire input. For example, in image classification,
the output is a label assigned to the image.

RNNs:

Can produce an output at each time step, giving a sequence of outputs (as in sequence-to-sequence
models), or a single output after processing an entire sequence (e.g., in sequence classification tasks).

7. Parameter Sharing:

TNNs:

Each layer in a traditional neural network has its own set of weights, and these weights are not
shared across time steps. (CNNs do share filter weights across spatial positions, but not across time.)

RNNs:

RNNs share weights across time steps. This weight sharing is what allows RNNs to generalize over
sequences of different lengths. The same weights are applied to each time step in the sequence,
which is one of the key reasons they are suited for sequential data.
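
Weight sharing can be sketched by running one set of weights over sequences of different lengths. The function `encode` and the values in `weights` are hypothetical.

```python
import math

def encode(sequence, w_x, w_h, bias):
    """Scalar RNN: apply the same three parameters at every time step."""
    h = 0.0
    for x_t in sequence:
        h = math.tanh(w_x * x_t + w_h * h + bias)
    return h

weights = (0.7, 0.5, 0.1)  # hypothetical values; three parameters in total

short_sequence = [0.2, -0.4]
long_sequence = [0.2, -0.4, 0.9, 0.1, -0.3, 0.6]

short_code = encode(short_sequence, *weights)
long_code = encode(long_sequence, *weights)
```

The number of parameters stays at three whether the sequence has two elements or six, which is what lets the same model handle inputs of any length.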

Summary of Differences:

Feature | Traditional Neural Networks (TNNs) | Recurrent Neural Networks (RNNs)
Architecture | Feedforward (no cycles) | Recurrent (with loops)
Memory | No memory of past inputs | Memory of past inputs through hidden states
Sequential Data | Not suited for sequential data | Specifically designed for sequential data
Use Cases | Classification, Regression, etc. | Time series, NLP, speech recognition
Training | Easier (standard backpropagation) | More difficult (backpropagation through time)
Input Type | Fixed-size, independent inputs | Variable-length, time-dependent inputs
Output | Single output (e.g., label) | Sequence of outputs or single output depending on task
Parameter Sharing | No weight sharing | Weight sharing across time steps
