Reinforcement Learning (NPTEL)
Week 4: Assignment 4
Your last recorded submission was on 2025-08-19, 11:44 IST. Due date: 2025-08-20, 23:59 IST.
1) State True/False (1 point)
The state transition graph for any MDP is a directed acyclic graph.
True
False
2) Consider the following statements: (1 point)
(i) The optimal policy of an MDP is unique.
(ii) We can determine an optimal policy for an MDP using only the optimal value function (v*), without accessing the MDP parameters.
(iii) We can determine an optimal policy for a given MDP using only the optimal q-value function (q*), without accessing the MDP parameters.
Which of these statements are false?
Only (ii)
Only (iii)
Only (i), (ii)
Only (i), (iii)
Only (ii), (iii)
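A minimal sketch of the idea behind statements (ii) and (iii) in Question 2, on an assumed toy MDP (the arrays p, r, q_star, and v_star below are made-up placeholders, not from the quiz): acting greedily with respect to q* needs no model, while acting greedily with respect to v* requires a one-step lookahead through the transition probabilities and rewards.

```python
import numpy as np

# Assumed toy MDP (illustration only): 2 states, 2 actions.
gamma = 0.9
p = np.array([[[0.8, 0.2],   # p[s, a, s'] transition probabilities
               [0.1, 0.9]],
              [[0.5, 0.5],
               [0.9, 0.1]]])
r = np.array([[1.0, 0.0],    # r[s, a] expected immediate reward
              [0.0, 2.0]])

# Placeholder optimal value functions for the sketch.
q_star = np.array([[3.0, 2.5],
                   [2.0, 4.0]])
v_star = q_star.max(axis=1)

# From q*: the greedy policy needs no MDP parameters.
policy_from_q = q_star.argmax(axis=1)

# From v*: a one-step lookahead needs p and r (the MDP parameters).
policy_from_v = (r + gamma * (p @ v_star)).argmax(axis=1)

print(policy_from_q, policy_from_v)
```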
3) Which of the following statements are true for a finite MDP? (Select all that apply.) (1 point)
The Bellman equation of a value function of a finite MDP defines a contraction in Banach space (using the max norm).
If 0 ≤ γ < 1, then the eigenvalues of γP_π are less than 1.
We call a normed vector space 'complete' if Cauchy sequences exist in that vector space.
The sequence defined by v_n = r_π + γP_π v_{n−1} is a Cauchy sequence in Banach space (using the max norm). (P_π is a stochastic matrix)
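A small numerical sketch of the iteration v_n = r_π + γP_π v_{n−1} referenced in Question 3, using assumed, made-up values for r_π and P_π: successive iterates get closer in the max norm, and they approach the fixed point v_π = (I − γP_π)^{-1} r_π.

```python
import numpy as np

gamma = 0.9
# Assumed policy-induced quantities (not from the quiz):
P_pi = np.array([[0.5, 0.5, 0.0],   # stochastic matrix: rows sum to 1
                 [0.2, 0.3, 0.5],
                 [0.4, 0.1, 0.5]])
r_pi = np.array([1.0, -2.0, 0.5])

v = np.zeros(3)
prev_gap = np.inf
for n in range(50):
    v_next = r_pi + gamma * P_pi @ v       # v_n = r_pi + gamma * P_pi v_{n-1}
    gap = np.max(np.abs(v_next - v))       # distance between iterates (max norm)
    assert n == 0 or gap <= gamma * prev_gap + 1e-12   # the gap shrinks geometrically
    v, prev_gap = v_next, gap

# Direct fixed point for comparison: v_pi = (I - gamma * P_pi)^{-1} r_pi
v_pi = np.linalg.solve(np.eye(3) - gamma * P_pi, r_pi)
print(v, v_pi)
```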
4) Which of the following is a benefit of using RL algorithms for solving MDPs? (1 point)
They do not require the state of the agent for solving an MDP.
They do not require the action taken by the agent for solving an MDP.
They do not require the state transition probability matrix for solving an MDP.
They do not require the reward signal for solving an MDP.
5) Consider the following equations: (1 point)
(i) v_π(s) = E_π[ Σ_{i=t}^{∞} γ^{i−t} R_{i+1} | S_t = s ]
(ii) q_π(s, a) = Σ_{s'} p(s'|s, a) v_π(s')
(iii) v_π(s) = Σ_a π(a|s) q_π(s, a)
Which of the above are correct?
Only (i)
Only (i), (ii)
Only (ii), (iii)
Only (i), (iii)
(i), (ii), (iii)
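A minimal sketch, on an assumed toy MDP (all arrays below are made up for illustration), of the standard policy-evaluation relations relevant to Question 5: q_π(s,a) = Σ_{s'} p(s'|s,a)[r(s,a,s') + γ v_π(s')] and v_π(s) = Σ_a π(a|s) q_π(s,a). Comparing these with the equations as written in the question is left to the reader.

```python
import numpy as np

# Assumed toy MDP (illustration only): 2 states, 2 actions.
gamma = 0.9
p = np.array([[[0.7, 0.3], [0.2, 0.8]],     # p[s, a, s']
              [[0.4, 0.6], [0.9, 0.1]]])
r = np.array([[[1., 0.], [0., 2.]],         # r[s, a, s'] reward on each transition
              [[0., 1.], [3., 0.]]])
pi = np.array([[0.5, 0.5],                  # pi[s, a] = pi(a|s)
               [0.1, 0.9]])

# Policy evaluation: v_pi solves v = r_pi + gamma * P_pi v.
P_pi = np.einsum('sa,sat->st', pi, p)
r_pi = np.einsum('sa,sat,sat->s', pi, p, r)
v_pi = np.linalg.solve(np.eye(2) - gamma * P_pi, r_pi)

# Standard Bellman relations for q_pi and v_pi:
q_pi = np.einsum('sat,sat->sa', p, r) + gamma * np.einsum('sat,t->sa', p, v_pi)
assert np.allclose((pi * q_pi).sum(axis=1), v_pi)   # v_pi(s) = sum_a pi(a|s) q_pi(s,a)
```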
6) What is true about the γ (discount factor) in reinforcement learning? (1 point)
The discount factor can be any real number
The value of γ cannot affect the optimal policy
The lower the value of γ, the more myopic the agent gets, i.e. the agent maximises rewards that it receives over a shorter horizon
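A tiny numerical sketch of how the discount factor weights delayed rewards, using an assumed, made-up reward sequence: the smaller γ is, the less a reward arriving many steps in the future contributes to the discounted return.

```python
# Assumed reward sequence: small rewards now, a large reward only at step 10.
rewards = [1.0] * 10 + [100.0]

for gamma in (0.1, 0.5, 0.99):
    ret = sum(gamma ** t * r for t, r in enumerate(rewards))
    print(f"gamma={gamma}: discounted return = {ret:.2f}")
```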
7) Consider the following statements for a finite MDP (I is an identity matrix with dimensions |S| × |S|, where S is the set of all states, and P_π is a stochastic matrix): (1 point)
(i) An MDP with stochastic rewards may not have a deterministic optimal policy.
(ii) There can be multiple optimal stochastic policies.
(iii) If 0 ≤ γ < 1, then the rank of the matrix I − γP_π is equal to |S|.
(iv) If 0 ≤ γ < 1, then the rank of the matrix I − γP_π is less than |S|.
Which of the above statements are true?
Only (ii), (iii)
Only (ii), (iv)
Only (i), (iii)
Only (i), (ii), (iii)
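A quick numerical probe of the matrix I − γP_π from Question 7, using assumed random stochastic matrices: the eigenvalues of a stochastic matrix have magnitude at most 1, so scaling by γ < 1 keeps the spectrum of γP_π strictly inside the unit circle, and the rank of I − γP_π can be checked directly.

```python
import numpy as np

rng = np.random.default_rng(0)
gamma = 0.9

for _ in range(5):
    # Assumed random stochastic matrix P_pi (each row is a probability distribution).
    P_pi = rng.random((4, 4))
    P_pi /= P_pi.sum(axis=1, keepdims=True)

    M = np.eye(4) - gamma * P_pi
    print(np.linalg.matrix_rank(M), np.abs(np.linalg.eigvals(gamma * P_pi)).max())
```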
8) Consider an MDP with 3 states A, B, C. At each state we can go to either of the other two states, i.e. if we are in state A then we can perform 2 actions, going to state B or C. The rewards for the transitions are r(A, B) = −3 (reward if we go from A to B), r(B, A) = −1, r(B, C) = 8, r(C, B) = 4, r(A, C) = 0, r(C, A) = 5, and the discount factor is 0.9. Find the fixed point of the value function for the policy π(A) = B (if we are in state A we choose the action to go to B), π(B) = C, π(C) = A. v_π([A, B, C]) = ? (round to 1 decimal place) (1 point)
[20.6, 21.8, 17.6]
[30.4, 44.2, 32.4]
[30.4, 37.2, 32.4]
[21.6, 21.8, 17.6]
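One way to approach a computation like Question 8 numerically: for a deterministic policy, P_π is a 0/1 matrix whose row s picks the chosen next state, r_π stacks the rewards of the chosen transitions, and the fixed point solves v_π = r_π + γP_π v_π, i.e. v_π = (I − γP_π)^{-1} r_π. A minimal sketch using the quantities given in the question:

```python
import numpy as np

gamma = 0.9
states = ["A", "B", "C"]

# Deterministic policy from Question 8: A -> B, B -> C, C -> A.
P_pi = np.array([[0., 1., 0.],    # row s is the next-state distribution under pi
                 [0., 0., 1.],
                 [1., 0., 0.]])
r_pi = np.array([-3., 8., 5.])    # r(A,B), r(B,C), r(C,A) for the chosen actions

# Fixed point of v = r_pi + gamma * P_pi v
v_pi = np.linalg.solve(np.eye(3) - gamma * P_pi, r_pi)
print(dict(zip(states, np.round(v_pi, 1))))
```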
9) Which of the following is not a valid norm function? (x is a D-dimensional vector) (1 point)
max_{d∈{1,…,D}} |x_d|
√( Σ_{d=1}^{D} x_d² )
min_{d∈{1,…,D}} |x_d|
Σ_{d=1}^{D} |x_d|
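The norm axioms (non-negativity with ‖x‖ = 0 only for x = 0, absolute homogeneity, and the triangle inequality) can be probed numerically: a single counterexample rules a candidate out. A minimal sketch with an assumed test vector:

```python
import numpy as np

candidates = {
    "max |x_d|": lambda x: np.max(np.abs(x)),
    "sqrt(sum x_d^2)": lambda x: np.sqrt(np.sum(x ** 2)),
    "min |x_d|": lambda x: np.min(np.abs(x)),
    "sum |x_d|": lambda x: np.sum(np.abs(x)),
}

x = np.array([1.0, 0.0, -2.0])   # assumed test vector: nonzero, but with a zero entry
for name, f in candidates.items():
    # A norm must equal 0 only for the zero vector; probe that single axiom here.
    print(name, f(x))
```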
10) For an operator L, which of the following properties must be satisfied by x for it to be a fixed point for L? (Multi-Correct) (1 point)
Lx = x
L²x = x
∀λ > 0, Lx = λx
None of the above
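A fixed point of an operator L is a point that L maps to itself. A minimal sketch with an assumed linear operator and two candidate points:

```python
import numpy as np

# Assumed operator L (a linear map here, purely for illustration).
L = np.array([[0.5, 0.5],
              [0.5, 0.5]])

x = np.array([1.0, 1.0])      # candidate point
y = np.array([2.0, 0.0])      # another candidate point

# Defining property of a fixed point: applying L leaves the point unchanged.
print(np.allclose(L @ x, x))  # True: x is a fixed point of L
print(np.allclose(L @ y, y))  # False: y is not a fixed point of L
```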
You may submit any number of times before the due date. The final submission will be considered for
grading.
Submit Answers