Introduction to Large Language Models (LLMs) - NPTEL - Week 5
Week 5 : Assignment 5
Your last recorded submission was on 2025-02-18, 17:24 IST. Due date: 2025-02-26, 23:59 IST.
1) Which of the following is a disadvantage of Recurrent Neural Networks (RNNs)? 1 point
Can only process fixed-length inputs.
Symmetry in how inputs are processed.
Difficulty accessing information from many steps back.
Weights are not reused across timesteps.
2) Why are RNNs preferred over fixed-window neural models? 1 point
They have a smaller parameter size.
They can process sequences of arbitrary length.
They eliminate the need for embedding layers.
None of the above.
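As a sketch of why this holds (my own NumPy illustration, not course material): a vanilla RNN applies the same weight matrices at every timestep, so the loop below runs for however many steps the input happens to have. Dimensions are illustrative.

import numpy as np

def rnn_forward(xs, W_xh, W_hh, b_h):
    # Vanilla RNN over a sequence of any length; the weights are reused at every step.
    h = np.zeros(W_hh.shape[0])            # initial hidden state
    for x in xs:                           # one iteration per timestep, any length works
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)
    return h                               # final hidden state summarizes the sequence

rng = np.random.default_rng(0)
W_xh, W_hh, b_h = rng.normal(size=(4, 3)), rng.normal(size=(4, 4)), np.zeros(4)
h = rnn_forward([rng.normal(size=3) for _ in range(7)], W_xh, W_hh, b_h)  # length 7 here, any N works

The flip side, relevant to question 1: information from early timesteps must survive many tanh-and-matrix-multiply updates to reach the end, which is why accessing information from many steps back is hard.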
3) What is the primary purpose of the cell state in an LSTM? 1 point
Store short-term information.
Control the gradient flow across timesteps.
Store long-term information.
Perform the activation function.
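For reference, a sketch of the standard LSTM cell update (textbook formulation; the gate stacking is one common convention and may differ from the lecture's notation). The cell state c is updated additively under a forget gate, which is what lets it carry long-term information, while h serves as the short-term working output.

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def lstm_step(x, h_prev, c_prev, W, U, b):
    # W: (4*d_h, d_in), U: (4*d_h, d_h), b: (4*d_h,) hold the four gates stacked.
    z = W @ x + U @ h_prev + b
    i, f, o, g = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)   # input, forget, output gates
    c = f * c_prev + i * np.tanh(g)                # cell state: additive long-term memory
    h = o * np.tanh(c)                             # hidden state: short-term output
    return h, c

d_in, d_h = 3, 4
rng = np.random.default_rng(0)
W, U, b = rng.normal(size=(4*d_h, d_in)), rng.normal(size=(4*d_h, d_h)), np.zeros(4*d_h)
h, c = lstm_step(rng.normal(size=d_in), np.zeros(d_h), np.zeros(d_h), W, U, b)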
4) In training an RNN, what technique is used to calculate gradients over multiple timesteps? 1 point
Backpropagation through Time (BPTT)
Stochastic Gradient Descent (SGD)
Dropout Regularization
Layer Normalization
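A hedged illustration of BPTT using PyTorch autograd (my choice of framework, not specified by the course): unrolling the recurrence and calling backward() differentiates through every timestep, and each shared weight receives a single gradient summed over the whole unroll.

import torch

W_xh = torch.randn(4, 3, requires_grad=True)   # shared across all timesteps
W_hh = torch.randn(4, 4, requires_grad=True)
h = torch.zeros(4)
for t in range(5):                             # unroll five timesteps
    x = torch.randn(3)
    h = torch.tanh(W_xh @ x + W_hh @ h)
loss = h.sum()
loss.backward()                                # backpropagation through time
print(W_hh.grad.shape)                         # torch.Size([4, 4]): one summed gradient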
https://onlinecourses.nptel.ac.in/noc25_cs45/unit?unit=43&assessment=50 2/4
2/18/25, 5:24 PM Introduction to Large Language Models (LLMs) - - Unit 7 - Week 5
5) Consider a simple RNN: 2 points
● Input vector size: 3
● Hidden state size: 4
● Output vector size: 2
● Number of timesteps: 5
How many parameters are there in total, including the bias terms?
210
190
90
42
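One way to sanity-check the count, assuming the standard vanilla-RNN parameterization (input-to-hidden, hidden-to-hidden, and hidden-to-output weights plus biases): because the weights are shared across timesteps, the five timesteps do not enter the total.

# Parameter count for a vanilla RNN; timesteps are irrelevant since weights are shared.
d_in, d_h, d_out = 3, 4, 2
total = (d_h * d_in      # W_xh: 12
         + d_h * d_h     # W_hh: 16
         + d_h           # b_h:   4
         + d_out * d_h   # W_hy:  8
         + d_out)        # b_y:   2
print(total)             # 42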
6) What is the time complexity for processing a sequence of length 'N' by an RNN, if the input embedding dimension, hidden state dimension, and output vector dimension are all 'd'? 1 point
O(N)
O(N²d)
O(Nd)
O(Nd²)
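As a rough check under the usual dense-matrix assumption: each timestep performs the matrix-vector products W_hh·h (cost d·d), W_xh·x (d·d, since the embedding is d-dimensional), and the output projection (d·d), so one step costs O(d²) and a length-N sequence costs O(Nd²).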
7) Which of the following is true about Seq2Seq models? 1 point
(i) Seq2Seq models are always conditioned on the source sentence.
(ii) The encoder compresses the input sequence into a fixed-size vector representation.
(iii) Seq2Seq models cannot handle variable-length sequences.
(i) and (ii)
(ii) only
(iii) only
(i), (ii), and (iii)
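A note on (ii), connecting to the RNN sketch after question 2: the encoder's final hidden state has a fixed dimension regardless of input length, and a basic seq2seq decoder is conditioned on that single vector; variable-length sequences are therefore handled on both the input and output side, so statement (iii) describes a limitation these models do not actually have.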
8) Given the following encoder and decoder hidden states, compute the attention scores. (Use dot product as the scoring function) 2 points
Encoder hidden states: h1 = [1,2], h2 = [3,4], h3 = [5,6]
Decoder hidden state: s = [0.5,1]
0.00235,0.04731,0.9503
0.0737,0.287,0.6393
0.9503,0.0137,0.036
0.6393,0.0737,0.287
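A quick numeric check in NumPy (my own sketch): dot-product scores followed by a softmax, the standard way raw scores become an attention distribution.

import numpy as np

H = np.array([[1., 2.], [3., 4.], [5., 6.]])   # encoder hidden states h1, h2, h3
s = np.array([0.5, 1.])                        # decoder hidden state
scores = H @ s                                 # dot products: [2.5, 5.5, 8.5]
weights = np.exp(scores - scores.max())        # softmax, shifted for numerical stability
weights /= weights.sum()
print(weights.round(5))                        # [0.00236 0.04731 0.95033]

Up to rounding in the printed options, this matches the first listed distribution.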
You may submit any number of times before the due date. The final submission will be considered
for grading.