Reinforcement Learning - Unit 7 - Week 4

The document outlines the details of an assignment for a Reinforcement Learning course offered through NPTEL, including submission deadlines and various questions related to Markov Decision Processes (MDPs). It includes multiple-choice questions on concepts such as state transition graphs, optimal policies, and the properties of finite MDPs.

Week 4: Assignment 4
Your last recorded submission was on 2025-08-19, 11:44 IST.
Due date: 2025-08-20, 23:59 IST.

1) State True/False 1 point

The state transition graph for any MDP is a directed acyclic graph.

True
False

2) Consider the following statements: 1 point

(i) The optimal policy of an MDP is unique.
(ii) We can determine an optimal policy for an MDP using only the optimal value function (v∗), without accessing the MDP parameters.
(iii) We can determine an optimal policy for a given MDP using only the optimal q-value function (q∗), without accessing the MDP parameters.

Which of these statements are false?

Only (ii)
Only (iii)
Only (i), (ii)
Only (i), (iii)
Only (ii), (iii)
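As background for statements (ii) and (iii), here is a minimal sketch (a hypothetical 2-state, 2-action MDP with made-up numbers, not a solved problem) contrasting the two extractions: a greedy policy falls out of q∗ by a plain argmax, while recovering one from v∗ requires a one-step lookahead through the MDP parameters p and r.

```python
import numpy as np

# Hypothetical toy MDP: 2 states, 2 actions (all values made up for illustration).
gamma = 0.9
P = np.array([[[0.8, 0.2], [0.1, 0.9]],    # P[s, a, s'] transition probabilities
              [[0.5, 0.5], [0.3, 0.7]]])
r = np.array([[1.0, 0.0],                   # r[s, a] expected immediate reward
              [0.5, 2.0]])

# Placeholder q-values (stand-ins for q*, not actually optimal here).
q_star = r + gamma * P @ np.array([5.0, 7.0])

# From q*: no model needed, just an argmax over actions in each state.
policy_from_q = q_star.argmax(axis=1)

# From v*: a one-step lookahead, which needs P, r and gamma (the MDP parameters).
v_star = q_star.max(axis=1)
policy_from_v = (r + gamma * P @ v_star).argmax(axis=1)

print(policy_from_q, policy_from_v)
```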
Week 2 ()
3) Which of the following statements are true for a finite MDP? (Select all that apply.) 1 point

The Bellman equation of a value function of a finite MDP defines a contraction in Banach space (using the max norm).
If 0 ≤ γ < 1, then the eigenvalues of γP_π are less than 1.
We call a normed vector space 'complete' if Cauchy sequences exist in that vector space.
The sequence defined by v_n = r_π + γP_π v_{n−1} is a Cauchy sequence in Banach space (using the max norm).

(P_π is a stochastic matrix)
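To make the contraction claim concrete, here is a small numerical sketch (random stochastic matrix and arbitrary rewards, all made up): iterating v_n = r_π + γP_π v_{n−1} and printing the max-norm distance to the fixed point, which shrinks by at least a factor of γ per step.

```python
import numpy as np

rng = np.random.default_rng(0)
n, gamma = 4, 0.9

# Random stochastic matrix P_pi (rows sum to 1) and arbitrary reward vector r_pi.
P = rng.random((n, n)); P /= P.sum(axis=1, keepdims=True)
r = rng.uniform(-1, 1, n)

v_fixed = np.linalg.solve(np.eye(n) - gamma * P, r)  # exact fixed point

v = np.zeros(n)
for k in range(10):
    v = r + gamma * P @ v                  # Bellman backup for policy pi
    err = np.max(np.abs(v - v_fixed))      # max-norm distance to the fixed point
    print(f"step {k}: ||v - v*||_inf = {err:.6f}")  # contracts by >= factor gamma
```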
4) Which of the following is a benefit of using RL algorithms for solving MDPs? 1 point

They do not require the state of the agent for solving an MDP.
They do not require the action taken by the agent for solving an MDP.
They do not require the state transition probability matrix for solving an MDP.
They do not require the reward signal for solving an MDP.
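For context on the model-free theme behind this question, here is a minimal tabular Q-learning update on a hypothetical sampled transition (all numbers invented): it uses only the observed state, action, reward and next state; a transition probability matrix never appears.

```python
import numpy as np

n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma = 0.1, 0.9

# One sampled transition (s, a, reward, s_next) from interacting with the
# environment; no transition matrix P is ever referenced. Values are made up.
s, a, reward, s_next = 0, 1, 1.0, 3

# Standard tabular Q-learning update.
Q[s, a] += alpha * (reward + gamma * Q[s_next].max() - Q[s, a])
print(Q[s, a])
```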

5) Consider the following equations: 1 point

(i) v_π(s) = E_π[ Σ_{i=t}^∞ γ^{i−t} R_{i+1} | S_t = s ]
(ii) q_π(s, a) = Σ_{s′} p(s′|s, a) v_π(s′)
(iii) v_π(s) = Σ_a π(a|s) q_π(s, a)

Which of the above are correct?

Only (i)
Only (i), (ii)
Only (ii), (iii)
Only (i), (iii)
(i), (ii), (iii)
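Identities like these can be checked numerically. The sketch below (a random MDP with a uniformly random policy, every quantity made up) computes v_π exactly, forms q_π with the full one-step backup (immediate reward plus discounted next-state value), and verifies identity (iii).

```python
import numpy as np

rng = np.random.default_rng(1)
nS, nA, gamma = 3, 2, 0.9

# Random MDP: P[s, a, s'] transitions, r[s, a] rewards, uniform random policy.
P = rng.random((nS, nA, nS)); P /= P.sum(axis=2, keepdims=True)
r = rng.uniform(-1, 1, (nS, nA))
pi = np.full((nS, nA), 1 / nA)

# Policy-induced transition matrix and reward vector.
P_pi = np.einsum('sa,sat->st', pi, P)
r_pi = (pi * r).sum(axis=1)

v_pi = np.linalg.solve(np.eye(nS) - gamma * P_pi, r_pi)  # exact v_pi
q_pi = r + gamma * P @ v_pi   # q_pi(s, a) = r(s, a) + gamma * E[v_pi(s')]

# Identity (iii): v_pi(s) should equal sum_a pi(a|s) * q_pi(s, a).
print(np.allclose(v_pi, (pi * q_pi).sum(axis=1)))        # True
```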
6) What is true about the γ (discount factor) in reinforcement learning? 1 point

Discount factor can be any real number
The value of γ cannot affect the optimal policy
The lower the value of γ, the more myopic the agent gets, i.e., the agent maximises rewards that it receives over a shorter horizon
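A quick worked illustration of the myopia point: with a constant reward of 1 per step, the fraction of the total discounted return collected in the first k steps is (1 − γ^k)/(1 − γ) divided by 1/(1 − γ), i.e. 1 − γ^k, so a small γ concentrates almost all value on the near term.

```python
# Fraction of the total discounted return (constant reward 1 per step)
# collected within the first 10 steps: 1 - gamma**10.
for gamma in (0.5, 0.9, 0.99):
    frac = 1 - gamma ** 10
    print(f"gamma={gamma}: first 10 steps hold {100 * frac:.1f}% of total return")
```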
7) Consider the following statements for a finite MDP (I is an identity matrix with dimensions |S| × |S|, S is the set of all states, and P_π is a stochastic matrix): 1 point

(i) An MDP with stochastic rewards may not have a deterministic optimal policy.
(ii) There can be multiple optimal stochastic policies.
(iii) If 0 ≤ γ < 1, then the rank of the matrix I − γP_π is equal to |S|.
(iv) If 0 ≤ γ < 1, then the rank of the matrix I − γP_π is less than |S|.

Which of the above statements are true?

Only (ii), (iii)
Only (ii), (iv)
Only (i), (iii)
Only (i), (ii), (iii)
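Statements (iii) and (iv) can be probed numerically (a spot check on a random example, not a proof). Since P_π is stochastic, its eigenvalues lie in the unit disc, so every eigenvalue of γP_π has magnitude at most γ < 1 and 1 is never an eigenvalue; the sketch below checks the resulting rank of I − γP_π.

```python
import numpy as np

rng = np.random.default_rng(2)
n, gamma = 6, 0.9

# Random stochastic matrix: eigenvalues of P_pi lie in the unit disc,
# so eigenvalues of gamma * P_pi have magnitude <= gamma < 1.
P = rng.random((n, n)); P /= P.sum(axis=1, keepdims=True)

M = np.eye(n) - gamma * P
print(np.linalg.matrix_rank(M))                    # n, i.e. |S|: full rank
print(np.abs(np.linalg.eigvals(gamma * P)).max())  # <= gamma < 1
```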

8) Consider an MDP with 3 states A, B, C. At each state we can go to either of the other two states, i.e., if we are in state A then we can perform 2 actions, going to state B or C. The rewards for each transition are r(A, B) = −3 (reward if we go from A to B), r(B, A) = −1, r(B, C) = 8, r(C, B) = 4, r(A, C) = 0, r(C, A) = 5, and the discount factor is 0.9. Find the fixed point of the value function for the policy π(A) = B (if we are in state A we choose the action to go to B), π(B) = C, π(C) = A. v_π([A, B, C]) = ? (round to 1 decimal place) 1 point

[20.6, 21.8, 17.6]
[30.4, 44.2, 32.4]
[30.4, 37.2, 32.4]
[21.6, 21.8, 17.6]
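The fixed point here can be obtained by solving the linear system v = r_π + γP_π v, i.e. v = (I − γP_π)^{−1} r_π. A minimal sketch for the stated deterministic policy (A→B, B→C, C→A):

```python
import numpy as np

gamma = 0.9
# Under policy pi: A -> B, B -> C, C -> A (deterministic transitions).
P_pi = np.array([[0, 1, 0],    # state order [A, B, C]
                 [0, 0, 1],
                 [1, 0, 0]], dtype=float)
r_pi = np.array([-3.0, 8.0, 5.0])  # r(A,B), r(B,C), r(C,A)

# Solve (I - gamma * P_pi) v = r_pi for the fixed point.
v = np.linalg.solve(np.eye(3) - gamma * P_pi, r_pi)
print(v.round(1))  # value of each state under pi
```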
9) Which of the following is not a valid norm function? (x is a D-dimensional vector) 1 point

max_{d∈{1,…,D}} |x_d|
√( Σ_{d=1}^D x_d² )
min_{d∈{1,…,D}} |x_d|
Σ_{d=1}^D |x_d|
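One axiom worth keeping in mind here is positive definiteness: a norm may equal zero only at the zero vector. The sketch below evaluates all four candidates on a nonzero vector that happens to have a zero coordinate, the standard probe for the min-based candidate.

```python
import numpy as np

x = np.array([1.0, 0.0, 2.0])  # nonzero vector with a zero coordinate

candidates = {
    "max |x_d|":       np.max(np.abs(x)),      # L-infinity norm
    "sqrt(sum x_d^2)": np.sqrt(np.sum(x**2)),  # L2 norm
    "min |x_d|":       np.min(np.abs(x)),      # 0 here despite x being nonzero
    "sum |x_d|":       np.sum(np.abs(x)),      # L1 norm
}
for name, val in candidates.items():
    print(f"{name}: {val}")
```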

10) For an operator L, which of the following properties must be satisfied by x for it to be a fixed point of L? (Multi-Correct) 1 point

Lx = x
L²x = x
∀λ > 0, Lx = λx
None of the above
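A fixed point of L is, by definition, a point with Lx = x, and the other conditions do not imply it. For instance, take L = −I on the reals: L²x = x holds for every x, yet no x ≠ 0 is a fixed point. A one-line check:

```python
L = lambda x: -x    # the operator L = -I on the reals

x = 3.0
print(L(L(x)) == x)  # True:  L^2 x = x for every x
print(L(x) == x)     # False: x = 3 is not a fixed point of L
```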

You may submit any number of times before the due date. The final submission will be considered for
grading.
Submit Answers
