Reinforcement Learning - Unit 7 - Week 4

The document outlines the details of an NPTEL Reinforcement Learning course, including assignments, announcements, and course structure. It provides specific questions related to Markov Decision Processes (MDPs) and reinforcement learning concepts for students to answer as part of their coursework. The document also includes submission guidelines and deadlines for assignments and quizzes.

Week 4: Assignment 4

Your last recorded submission was on 2025-08-19, 23:22 IST. Due date: 2025-08-20, 23:59 IST.

1) State True/False (1 point)

The state transition graph for any MDP is a directed acyclic graph.

True
False
2) Consider the following statements: (1 point)

(i) The optimal policy of an MDP is unique.
(ii) We can determine an optimal policy for an MDP using only the optimal value function ($v^*$), without accessing the MDP parameters.
(iii) We can determine an optimal policy for a given MDP using only the optimal q-value function ($q^*$), without accessing the MDP parameters.

Which of these statements are false?

Only (ii)
Only (iii)
Only (i), (ii)
Only (i), (iii)
Only (ii), (iii)
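One way to see the asymmetry between statements (ii) and (iii) is that a greedy policy can be read directly off $q^*$, whereas acting greedily with respect to $v^*$ requires the MDP model $p(s'|s,a)$ and $r(s,a)$. A small sketch with made-up numbers (not course material):

```python
# Toy illustration: greedy action selection from q* is model-free.
import numpy as np

q_star = np.array([[1.0, 2.0],        # hypothetical q*(s, a), 2 states x 2 actions
                   [0.5, 0.1]])
print(np.argmax(q_star, axis=1))      # greedy policy: argmax_a q*(s, a), no model needed

# With only v*, one would instead have to evaluate
#   argmax_a [ r(s, a) + gamma * sum_{s'} p(s'|s, a) v*(s') ],
# which requires access to the MDP parameters p and r.
```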
3) Which of the following statements are true for a finite MDP? (Select all that apply.) (1 point)

The Bellman equation of a value function of a finite MDP defines a contraction in a Banach space (using the max norm).
If $0 \le \gamma < 1$, then the eigenvalues of $\gamma P_\pi$ are less than 1.
We call a normed vector space 'complete' if Cauchy sequences exist in that vector space.
The sequence defined by $v_n = r_\pi + \gamma P_\pi v_{n-1}$ is a Cauchy sequence in a Banach space (using the max norm).

($P_\pi$ is a stochastic matrix.)
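To make the contraction idea concrete, here is a minimal sketch (the rewards and stochastic matrix below are made-up toy values, not from the course) showing that the backup $v \leftarrow r_\pi + \gamma P_\pi v$ shrinks the max-norm gap between successive iterates by at least a factor of $\gamma$, which is exactly why the sequence $v_n$ is Cauchy:

```python
# Toy demonstration that v <- r_pi + gamma * P_pi v is a gamma-contraction
# in the max norm, so the iterates form a Cauchy sequence converging to the
# unique fixed point v_pi.
import numpy as np

gamma = 0.9
r_pi = np.array([1.0, -2.0, 0.5])      # hypothetical per-state rewards
P_pi = np.array([[0.5, 0.5, 0.0],      # hypothetical stochastic matrix
                 [0.1, 0.6, 0.3],
                 [0.0, 0.2, 0.8]])

v, prev_gap = np.zeros(3), None
for _ in range(50):
    v_next = r_pi + gamma * P_pi @ v
    gap = np.max(np.abs(v_next - v))   # max-norm distance between iterates
    if prev_gap is not None:
        assert gap <= gamma * prev_gap + 1e-12   # the contraction in action
    v, prev_gap = v_next, gap

print(v)  # close to the fixed point v_pi = (I - gamma * P_pi)^{-1} r_pi
```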
4) Which of the following is a benefit of using RL algorithms for solving MDPs? (1 point)

They do not require the state of the agent for solving an MDP.
They do not require the action taken by the agent for solving an MDP.
They do not require the state transition probability matrix for solving an MDP.
They do not require the reward signal for solving an MDP.
5) Consider the following equations: (1 point)

(i) $v^\pi(s) = \mathbb{E}_\pi\left[\sum_{i=t}^{\infty} \gamma^{i-t} R_{i+1} \,\middle|\, S_t = s\right]$
(ii) $q^\pi(s, a) = \sum_{s'} p(s'|s, a)\, v^\pi(s')$
(iii) $v^\pi(s) = \sum_{a} \pi(a|s)\, q^\pi(s, a)$

Which of the above are correct?

Only (i)
Only (i), (ii)
Only (ii), (iii)
Only (i), (iii)
(i), (ii), (iii)
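As a quick numerical sanity check on how $v^\pi$ and $q^\pi$ relate, the following sketch (using a randomly generated toy MDP, not course material) solves the policy evaluation equations exactly and verifies the standard identity $v^\pi(s) = \sum_a \pi(a|s)\, q^\pi(s, a)$:

```python
# Verify v_pi(s) = sum_a pi(a|s) q_pi(s, a) on a randomly generated toy MDP.
import numpy as np

rng = np.random.default_rng(0)
S, A, gamma = 2, 2, 0.9
P = rng.dirichlet(np.ones(S), size=(S, A))   # P[s, a, s'] transition probabilities
R = rng.normal(size=(S, A))                  # expected reward r(s, a)
pi = rng.dirichlet(np.ones(A), size=S)       # pi[s, a] stochastic policy

# Exact policy evaluation: v = r_pi + gamma * P_pi v  =>  (I - gamma P_pi) v = r_pi
P_pi = np.einsum('sa,sat->st', pi, P)
r_pi = np.einsum('sa,sa->s', pi, R)
v = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)

# q_pi(s, a) = r(s, a) + gamma * sum_{s'} P[s, a, s'] v(s')
q = R + gamma * np.einsum('sat,t->sa', P, v)
assert np.allclose(v, np.einsum('sa,sa->s', pi, q))   # the identity holds
print(v, q)
```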
6) What is true about the γ (discount factor) in reinforcement learning? (1 point)

The discount factor can be any real number.
The value of γ cannot affect the optimal policy.
The lower the value of γ, the more myopic the agent gets, i.e. the agent maximises rewards that it receives over a shorter horizon.
7) Consider the following statements for a finite MDP ($I$ is an identity matrix with dimensions $|S| \times |S|$, where $S$ is the set of all states, and $P_\pi$ is a stochastic matrix): (1 point)

(i) An MDP with stochastic rewards may not have a deterministic optimal policy.
(ii) There can be multiple optimal stochastic policies.
(iii) If $0 \le \gamma < 1$, then the rank of the matrix $I - \gamma P_\pi$ is equal to $|S|$.
(iv) If $0 \le \gamma < 1$, then the rank of the matrix $I - \gamma P_\pi$ is less than $|S|$.

Which of the above statements are true?

Only (ii), (iii)
Only (ii), (iv)
Only (i), (iii)
Only (i), (ii), (iii)

8) Consider an MDP with 3 states $A, B, C$. At each state we can go to either of the other two states, i.e. if we are in state $A$ then we can perform 2 actions: going to state $B$ or $C$. The rewards for each transition are $r(A, B) = -3$ (the reward if we go from $A$ to $B$), $r(B, A) = -1$, $r(B, C) = 8$, $r(C, B) = 4$, $r(A, C) = 0$, $r(C, A) = 5$, and the discount factor is 0.9. Find the fixed point of the value function for the policy $\pi(A) = B$ (if we are in state $A$ we choose the action to go to $B$), $\pi(B) = C$, $\pi(C) = A$. $v^\pi([A, B, C]) = ?$ (round to 1 decimal place) (1 point)

[20.6, 21.8, 17.6]
[30.4, 44.2, 32.4]
[30.4, 37.2, 32.4]
[21.6, 21.8, 17.6]
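For a deterministic policy like this one, the fixed point satisfies $v^\pi = r_\pi + \gamma P_\pi v^\pi$, so it can be computed by solving the linear system $(I - \gamma P_\pi)\, v^\pi = r_\pi$. A minimal sketch of that computation, using the transitions and rewards given in the question:

```python
# Solve (I - gamma * P_pi) v = r_pi for the deterministic policy
# pi(A)=B, pi(B)=C, pi(C)=A. States are ordered [A, B, C].
import numpy as np

gamma = 0.9
P_pi = np.array([[0, 1, 0],          # A -> B (P_pi is a permutation matrix,
                 [0, 0, 1],          #  B -> C  hence stochastic)
                 [1, 0, 0]])         # C -> A
r_pi = np.array([-3.0, 8.0, 5.0])    # r(A, B), r(B, C), r(C, A)

v = np.linalg.solve(np.eye(3) - gamma * P_pi, r_pi)
print(np.round(v, 1))                # v_pi for [A, B, C], rounded to 1 decimal
```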

9) Which of the following is not a valid norm function? ($x$ is a $D$-dimensional vector) (1 point)

$\max_{d \in \{1, \ldots, D\}} |x_d|$

$\sqrt{\sum_{d=1}^{D} x_d^2}$

$\min_{d \in \{1, \ldots, D\}} |x_d|$

$\sum_{d=1}^{D} |x_d|$
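For reference, a short sketch (with a toy vector chosen for illustration) evaluating the candidate functions, plus a counterexample showing that the coordinate-wise minimum violates the norm axiom $\|x\| = 0 \iff x = 0$:

```python
# Evaluate the candidate "norms" on sample vectors.
import numpy as np

x = np.array([3.0, -4.0])
print(np.max(np.abs(x)))       # max norm (L-infinity): 4.0
print(np.sqrt(np.sum(x**2)))   # Euclidean norm (L2): 5.0
print(np.sum(np.abs(x)))       # L1 norm: 7.0

y = np.array([1.0, 0.0])       # a nonzero vector...
print(np.min(np.abs(y)))       # ...yet min_d |y_d| = 0, so min is not a norm
```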

10) For an operator $L$, which of the following properties must be satisfied by $x$ for it to be a fixed point of $L$? (Multi-Correct) (1 point)

$Lx = x$
$L^2 x = x$
$\forall \lambda > 0: Lx = \lambda x$
None of the above
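A short derivation for intuition (added here for clarity, not part of the original page): if $Lx = x$, then applying $L$ once more gives

$$L^2 x = L(Lx) = Lx = x,$$

so every fixed point of $L$ also satisfies $L^2 x = x$ (though the converse need not hold, since $L^2$ can have extra fixed points). By contrast, requiring $Lx = \lambda x$ for every $\lambda > 0$ simultaneously forces $x = 0$, because $\lambda x = \mu x$ with $\lambda \neq \mu$ implies $x = 0$.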

You may submit any number of times before the due date. The final submission will be considered for grading.
Submit Answers
