Birla Institute of Technology and Science Pilani, Hyderabad Campus
29.11.2024
BITS F464: Machine Learning (1st Sem 2024-25)
Introduction to Reinforcement Learning (RL)
Chittaranjan Hota, Sr. Professor
Dept. of Computer Sc. and Information Systems
hota@hyderabad.bits-pilani.ac.in
What is Reinforcement Learning?
• Agent tries to maximize the cumulative reward from the
environment by performing a set of actions.
[Image sources: https://highlandcanine.com/; UltraTech Cement Stock, 27th Nov 2024, from www.nseindia.com]
Applications: Gaming, Robotics, Autonomous vehicles, Personalized treatment, etc.
Formal Modelling: Markov Decision Process
Markov: The future state depends only on the present state, which encapsulates all the necessary information from the past.
What should the player ‘O’ do here to avoid a loss?
MDP Continued…
[Figure: agent-environment interaction loop. The Agent sends an action to the Environment and receives back a state/observation and a reward, over successive states St, St+1, St+2, …]
• Discounted Cumulative Reward: Gt = Rt+1 + γ Rt+2 + γ² Rt+3 + … = Σk γ^k Rt+k+1
• γ: Discount factor controlling future rewards.
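• A minimal sketch of the discounted cumulative reward in Python (the reward sequence and γ below are illustrative values, not from the slides):

# Discounted cumulative reward: G_t = sum_k gamma^k * R_{t+k+1}
def discounted_return(rewards, gamma=0.9):
    g = 0.0
    for k, r in enumerate(rewards):      # r plays the role of R_{t+k+1}
        g += (gamma ** k) * r            # gamma^k shrinks the weight of later rewards
    return g

print(discounted_return([1, 0, 2, 3]))   # 1 + 0 + 0.81*2 + 0.729*3 = 4.807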
Q-Learning Algorithm
• Q-learning is a model-free reinforcement learning (RL) algorithm used to
learn the optimal policy for a Markov Decision Process (MDP)
[Q-Learning loop, rendered as steps]
1. Initialize Q-Table
2. Select an Action
3. Perform Action
4. Measure Reward
5. Update Q-Table: Q(s, a) ← Q(s, a) + α [r + γ max_a' Q(s', a') - Q(s, a)]
Repeat steps 2-5; after multiple Episodes, a good Q-Table is ready.
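• A minimal sketch of steps 1, 2 and 5 in Python (the ε-greedy action selection and the table sizes are illustrative assumptions; the environment step itself is problem-specific):

import numpy as np

n_states, n_actions = 5, 2                  # illustrative sizes
alpha, gamma, epsilon = 0.5, 0.9, 0.1
Q = np.zeros((n_states, n_actions))         # 1. Initialize Q-Table

def select_action(s):
    # 2. Select an Action: explore randomly with probability epsilon, else exploit
    if np.random.rand() < epsilon:
        return np.random.randint(n_actions)
    return int(np.argmax(Q[s]))

def update_q(s, a, r, s_next):
    # 5. Update Q-Table: Q(s,a) <- Q(s,a) + alpha [r + gamma max_a' Q(s',a') - Q(s,a)]
    Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])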
An Example of Q-Learning
• Initializing the environment: States: {s0, s1, s2}, Actions: {a0, a1}, Rewards:
R(s0, a0) = -1, R(s0, a1) = +2, R(s1, a0) = +3, R(s1, a1) = +1, R(s2, any action) = 0
(terminal state).
• Transitions: T(s0, a0) → s1, T(s0, a1) → s2 (goal), T(s1, a0) → s2, T(s1, a1) → s0
• Parameters: α = 0.5, γ = 0.9, Initial Q-values (Q(s, a) = 0 for all s, a).
• Episode 1:
• current state: s0, action chosen: a0 (randomly using exploration), reward: R(s0,
a0) = -1, next state: s1.
• Update Q(s0, a0) using Bellman’s equation:
• Q(s0, a0) ← 0 + 0.5 * [-1 + 0.9 * max_a' Q(s1, a') - 0]
• Q(s0, a0) ← 0.5 * [-1 + 0] = -0.5 (since Q(s1, a') = 0 initially; no knowledge of s1 yet).
Ex. Continued…
Updated Q-values after 3 Episodes:
State    Action (a0)    Action (a1)
s0       -0.5           1.0
s1       1.5            0.0
s2       0.0            0.0
• Episode 2: From s1
• current state: s1, action chosen: a0, reward: R(s1, a0) = +3, next state: s2.
• Update Q(s1, a0) using Bellman’s equation:
  Q(s1, a0) ← Q(s1, a0) + α [R + γ max_a' Q(s2, a') - Q(s1, a0)]
• Q(s1, a0) ← 0 + 0.5 * [3 + 0.9 * 0 - 0] = 1.5
• Episode 3: Back to s0 (different action)
• current state: s0, action chosen: a1, reward: R(s0, a1) = +2, next state: s2.
• Update Q(s0, a1) using Bellman’s equation:
  Q(s0, a1) ← Q(s0, a1) + α [R + γ max_a' Q(s2, a') - Q(s0, a1)]
• Q(s0, a1) ← 0 + 0.5 * [2 + 0.9 * 0 - 0] = 1.0
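• The three hand-computed updates above can be verified with a few lines of Python (a minimal sketch; it encodes only the transitions actually visited in Episodes 1-3):

import numpy as np

alpha, gamma = 0.5, 0.9
Q = np.zeros((3, 2))                       # rows: s0, s1, s2; columns: a0, a1

# (state, action, reward, next_state) exactly as in Episodes 1-3 above
episodes = [(0, 0, -1, 1),                 # Episode 1: s0, a0, R = -1, next s1
            (1, 0, +3, 2),                 # Episode 2: s1, a0, R = +3, next s2
            (0, 1, +2, 2)]                 # Episode 3: s0, a1, R = +2, next s2

for s, a, r, s_next in episodes:
    Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])

print(Q)   # [[-0.5  1. ]
           #  [ 1.5  0. ]
           #  [ 0.   0. ]]  -- matches the table above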
• Alternatively, you may use an ANN to learn Q-values: Deep Q-Learning (DQN)
Optimal Solution using Q-Learning: Maze
import numpy as np
import matplotlib.pyplot as plt

# Maze parameters: '2' is the diamond (goal state)
maze = [
    [0, 1, 0, 0, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 1, 2],
]
maze = np.array(maze)
Python Code Continued…
…
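• Since the rest of the maze code is elided above, here is a minimal sketch of how the Q-Table for this maze could be trained; the state encoding (cell index = row*5 + col), the four-move action set, the reward scheme, the start cell, and the hyper-parameters are illustrative assumptions, not the original code:

n_rows, n_cols = maze.shape
n_states, n_actions = n_rows * n_cols, 4              # actions: up, down, left, right
moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]
alpha, gamma, epsilon, n_episodes = 0.5, 0.9, 0.2, 500
Q = np.zeros((n_states, n_actions))

def step(r, c, a):
    # Apply action a; bumping into a wall (1) or the border leaves the agent in place.
    nr, nc = r + moves[a][0], c + moves[a][1]
    if not (0 <= nr < n_rows and 0 <= nc < n_cols) or maze[nr, nc] == 1:
        return r, c, -1.0, False                      # penalty for an invalid move
    if maze[nr, nc] == 2:
        return nr, nc, 10.0, True                     # reached the diamond (goal state)
    return nr, nc, -0.1, False                        # small step cost otherwise

for _ in range(n_episodes):
    r, c = 0, 0                                       # assumed start: top-left cell
    for _ in range(200):                              # cap on steps per episode
        s = r * n_cols + c
        a = np.random.randint(n_actions) if np.random.rand() < epsilon else int(np.argmax(Q[s]))
        nr, nc, reward, done = step(r, c, a)
        Q[s, a] += alpha * (reward + gamma * np.max(Q[nr * n_cols + nc]) - Q[s, a])
        r, c = nr, nc
        if done:
            break

print(np.argmax(Q, axis=1).reshape(n_rows, n_cols))   # greedy action per cell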
Deep Q-Learning (DQN) for RL
• When the number of states and actions becomes very large, how do you scale?
• Solution: Combine Q-Learning and Deep Learning → Deep Q-Networks (DQN)
• Goal: Approximate a function: Q(s,a; θ), where θ represents the
trainable weights of the network
• Q(s, a) = r(s, a) + γ max_a' Q(s', a')   (Bellman’s equation)
• Cost = {Q(s, a; θ) - [r(s, a) + γ max_a' Q(s', a'; θ)]}²
[Figure: two ANN designs for approximating Q. Design 1: inputs (s, a) → a single output Q; inefficient, as we need more iterations (one forward pass per action). Design 2 (improved): input s → one output per action, Q(a1), Q(a2), Q(a3).]
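• A minimal sketch of the improved design and its cost in PyTorch (the layer sizes, optimizer, and single-transition update without a replay buffer or target network are illustrative assumptions, not from the slides):

import torch
import torch.nn as nn

state_dim, n_actions = 4, 3                  # illustrative sizes
gamma = 0.9

# Improved design: the network maps a state s to one Q-value per action.
q_net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

def dqn_update(s, a, r, s_next, done):
    # Target: r + gamma * max_a' Q(s', a'; theta); no gradient flows through the target.
    with torch.no_grad():
        target = r + gamma * (0.0 if done else q_net(s_next).max().item())
    q_sa = q_net(s)[a]                       # predicted Q(s, a; theta)
    loss = (q_sa - target) ** 2              # Cost = {Q(s,a;θ) - [r(s,a) + γ max Q(s',a';θ)]}²
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Example call with dummy state vectors
s, s_next = torch.randn(state_dim), torch.randn(state_dim)
print(dqn_update(s, a=1, r=1.0, s_next=s_next, done=False))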
Thank You!
Good luck for Comprehensive Exams!