
Reinforcement Learning - Week 12

The document outlines the Week 12 assignment for a Reinforcement Learning course offered by NPTEL, including various questions related to probabilities in different states and observations. It provides instructions for submitting answers and details about the course structure. The assignment includes multiple-choice questions that require understanding of concepts like POMDP and Markov systems.

4/16/25, 9:51 PM Reinforcement Learning - Unit 15 - Week 12

Week 12 : Assignment 12
Your last recorded submission was on 2025-04-16, 13:55 IST. Due date: 2025-04-16, 23:59 IST.

Instructions: In the following questions, one or more choices may be correct. Select all that apply.

1) Consider an environment in which an agent is randomly dropped into either state s1 or s2 initially with equal probability. The agent can only view obstacles present immediately to its North, South, East or West. However, the observation made in each direction by the agent may be wrong with a probability of 0.1. If in state s1 obstacles are present to the North and South, and in s2 obstacles are present to the East and West, what is the probability of the agent being in state s1 if the observation made is that there are obstacles present to the North, South and West? (1 point)

81/82
41/82
73/82
None of the above.
2) In the same environment as Question 1, suppose state s1 has obstacles present only to the North and South, and s2 has obstacles present only to the East and West. What is the probability of the agent being in state s1 if the observation made is that there are obstacles present only to the North and East? (1 point)

81/82
41/82
73/82
None of the above.
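Questions 1 and 2 reduce to a Bayes-rule computation over the two states: with a uniform prior, the posterior on s1 is the likelihood of the observation under s1 divided by the sum of the likelihoods under both states, where each of the four directional readings is assumed to be corrupted independently with probability 0.1. A minimal sketch of that computation (the state layouts and observations are those stated in the questions):

```python
# Posterior P(s1 | observation) via Bayes' rule, with a uniform prior over
# {s1, s2} and each directional reading independently wrong with prob. 0.1.
ERR = 0.1

def likelihood(true_obstacles, observed_obstacles, directions=("N", "S", "E", "W")):
    """P(observation | state): product of per-direction probabilities."""
    p = 1.0
    for d in directions:
        # A reading is correct when it agrees with the true layout.
        match = (d in true_obstacles) == (d in observed_obstacles)
        p *= (1 - ERR) if match else ERR
    return p

def posterior_s1(s1_obstacles, s2_obstacles, observed):
    l1 = likelihood(s1_obstacles, observed)
    l2 = likelihood(s2_obstacles, observed)
    return l1 / (l1 + l2)  # the uniform prior cancels

# Question 1: s1 has obstacles N,S; s2 has E,W; observed obstacles N,S,W.
print(posterior_s1({"N", "S"}, {"E", "W"}, {"N", "S", "W"}))  # → 81/82 ≈ 0.9878

# Question 2: same layouts; observed obstacles only N,E.
print(posterior_s1({"N", "S"}, {"E", "W"}, {"N", "E"}))       # → 0.5
```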
3) Assertion: One of the reasons history-based methods are not feasible in certain scenarios is the significant increase in state space when trajectory lengths are large. (1 point)
Reason: The number of states increases polynomially w.r.t. trajectory length.


Both Assertion and Reason are true, and Reason is the correct explanation for Assertion.
Both Assertion and Reason are true, but Reason is not the correct explanation for Assertion.
Assertion is true, Reason is false.
Both Assertion and Reason are false.
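For intuition on the Reason's claim: with |A| actions and |O| observations, the number of distinct action-observation histories of length T is (|A|·|O|)^T, which grows exponentially in T, not polynomially. A quick count (the sizes below are illustrative, not from the course):

```python
# Number of distinct action-observation histories of length T, given
# n_actions actions and n_obs observations: (n_actions * n_obs) ** T.
# This is exponential in T, since the base is a constant > 1.
def num_histories(n_actions, n_obs, T):
    return (n_actions * n_obs) ** T

# E.g. 4 actions and 16 possible sensor readings:
for T in (1, 5, 10):
    print(T, num_histories(4, 16, T))  # 64, 64**5, 64**10 -- blows up fast
```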
4) Suppose that we solve a POMDP using a Q-MDP-like solution discussed in the lectures, where we assume that the MDP is known and solve it to learn Q values for the true (state, action) pairs. Which of the following are true? (1 point)

We can recover a policy for execution in the partially observable environment by weighting Q values by the belief distribution bel, so that π = argmax_a Σ_s bel(s) Q(s, a).
We can recover an optimal policy for the POMDP from the Q values that have been learnt for the true (state, action) pairs.
Policies recovered from Q-MDP-like solution methods are always better than policies learnt by history-based methods.
None of the above.
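The first option describes the standard Q-MDP action-selection rule: weight each underlying state's Q values by the current belief and act greedily. A minimal sketch, with a made-up Q table and beliefs for illustration:

```python
import numpy as np

# Q-MDP action selection: pick argmax_a sum_s bel(s) * Q(s, a).
# The Q table below is illustrative, not taken from the course.
Q = np.array([[1.0, 0.2],    # Q(s1, a0), Q(s1, a1)
              [0.1, 0.9]])   # Q(s2, a0), Q(s2, a1)

def qmdp_action(belief, Q):
    """Choose the action maximizing the belief-weighted Q value."""
    return int(np.argmax(belief @ Q))  # belief @ Q gives one value per action

print(qmdp_action(np.array([0.9, 0.1]), Q))  # → 0 (mostly believes s1)
print(qmdp_action(np.array([0.1, 0.9]), Q))  # → 1 (mostly believes s2)
```

Note that this rule acts as if all uncertainty resolves after one step, which is why it does not in general recover the optimal POMDP policy.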

5) Consider the grid-world shown below: (1 point)

[Figure not included in this extract: the grid-world layout.]

Walls and obstacles are colored gray. The agent is equipped with a sensor that can detect the presence of walls or obstacles immediately to its North, South, East or West.

Which of the following are true if we represent states by their sensor observations?

The grid-world is a 1st-order Markov system.
The grid-world is a 2nd-order Markov system.
The grid-world is a 3rd-order Markov system.
The grid-world is a 4th-order Markov system.
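This question hinges on observation aliasing: distinct cells can produce the same N/S/E/W sensor reading, so a process over raw sensor observations need not be 1st-order Markov. A toy illustration on a hypothetical corridor (not the grid from the question):

```python
# Hypothetical 1x5 corridor (NOT the grid from the question): '#' = wall, '.' = free.
grid = ["#######",
        "#.....#",
        "#######"]

def signature(r, c):
    """4-bit sensor reading: is there a wall immediately N, S, E, W?"""
    return tuple(grid[r + dr][c + dc] == "#"
                 for dr, dc in [(-1, 0), (1, 0), (0, 1), (0, -1)])

sigs = {(r, c): signature(r, c)
        for r, row in enumerate(grid)
        for c, ch in enumerate(row) if ch == "."}

# The three interior cells all read (wall N, wall S, free E, free W):
# they are aliased, so the sensor reading alone does not identify the cell.
aliased = [cell for cell, s in sigs.items() if s == (True, True, False, False)]
print(sorted(aliased))  # → [(1, 2), (1, 3), (1, 4)]
```

When such aliasing occurs, longer histories of observations are needed to pin down where the agent is, raising the effective Markov order of the observation process.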

You may submit any number of times before the due date. The final submission will be
considered for grading.

https://onlinecourses.nptel.ac.in/noc25_cs62/unit?unit=109&assessment=280