Intro to Reinforcement Learning
prof. Carlo Lucibello
Department of Computing Sciences
Bocconi University
Machine learning paradigms
● Supervised learning: “learn to predict”
● Unsupervised learning: “learn the representation”
● Reinforcement learning: “learn to control”
Supervised Learning Example: diagnosing heart abnormalities
● ML approach: teach the computer program through examples
● Ask cardiologists to label a large number of recorded ECG signals
● The learning algorithm adjusts the program/model so that its predictions agree with the cardiologists’ labels
Supervised Learning
Training set of input-output pairs
Model:
● deterministic prediction
● probabilistic prediction
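The slide’s formulas are not reproduced in the text; as a sketch, in standard notation (symbols chosen here, not taken from the slide):

$$\mathcal{D} = \{(x_i, y_i)\}_{i=1}^{n} \qquad\quad \hat{y} = f_\theta(x) \;\;\text{(deterministic)} \qquad\quad p_\theta(y \mid x) \;\;\text{(probabilistic)}$$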
Reinforcement Learning
Games
Trading
Robotics
Reinforcement Learning in nature
Reinforcement Learning basics
Two key elements:
• Learning via trial and error
• No supervision
MAIN INGREDIENTS
● Agent: It’s me, Mario!
● State: Observations about the world
● Actions: Decisions on what to do next
● Rewards: Positive or negative consequences of the actions
● Learning: Learn to choose the action that gets the largest reward!
Basic set-up
● An agent repeatedly interacts and learns from a stochastic environment with the goal of maximizing cumulative rewards.
● “Trial-and-error” method of learning.
● Algorithmic principles motivated by psychology and neuroscience: rewards provide a positive reinforcement for an action.
Key features
Lack of a “supervisor”
● No labels telling us the best action to take
● We only have a reward signal
Delayed feedback
● The effect of an action may not be entirely visible instantaneously, but it may affect the reward signal many steps later
Sequential decisions
● The sequence in which you make your moves will decide the path you take and hence the final outcome
Actions affect observations
● Observations are a function of the agent’s own actions, which the agent may decide based on its past observations
Agent & Environment
State (Level Info)
Action (Up, Down, Left, Right….)
Rewards
Finish the level
Still in game
Game over
Get a coin
What do we want to learn?
Given the current state and the possible rewards, what is the best action we can choose?
Optimal POLICY!
LET'S FORMALIZE ALL THIS MATHEMATICALLY
State -> Action -> Reward -> State (SARS)
State (a vector of numbers)
Action (discrete choices)
Rewards
Finish the level:+100
Still in game: + 0.1 per sec
Game over: -10
Get a coin: +1
Return: cumulative reward
How much should “future” rewards count? We weigh them with a discount factor.
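Written out (the standard discounted return; the symbols are chosen here): with discount factor $\gamma \in [0, 1)$, the return from time $t$ is

$$G_t = r_{t+1} + \gamma r_{t+2} + \gamma^2 r_{t+3} + \dots = \sum_{k=0}^{\infty} \gamma^k r_{t+k+1}$$

so rewards far in the future count less than immediate ones.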
HOW CAN WE LEARN?
What do we want to learn?
Given the current state and the possible rewards, what is the best action we can choose?
First approach: Policy Learning
How can we find the best action given a specific state?
We can “build” a function, the so-called POLICY, and try to optimize it.
Deterministic policy: maps each state to an action.
Stochastic policy: maps each state to a probability distribution over actions.
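In symbols (a minimal sketch with notation chosen here, $\theta$ denoting the policy parameters):

$$a = \pi_\theta(s) \;\;\text{(deterministic)} \qquad\qquad a \sim \pi_\theta(a \mid s) \;\;\text{(stochastic)}$$

Policy-learning methods then adjust $\theta$ so as to maximize the expected return.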
Second Approach: Q-Learning
Given a policy, one can compute the average return (total reward) obtained by playing a certain action in a certain state and then continuing to play according to the policy. This is called the Q-function (written out below).
One can then use the Q-function to improve the policy, then compute a new Q-function, and so on.
E.g. the greedy policy derived from the Q-function: in each state, pick the action with the largest Q-value.
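In symbols (standard definitions; the notation is chosen here):

$$Q^\pi(s, a) = \mathbb{E}\big[\, G_t \mid s_t = s,\; a_t = a,\; \text{then follow } \pi \,\big]
\qquad\qquad
\pi'(s) = \arg\max_{a} Q^\pi(s, a)$$

where the second expression is the greedy policy obtained from $Q^\pi$.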
HOW TO LEARN THE Q-FUNCTION?
Q-Learning (Classic Algorithm)
Idea: what if we consider the expected return for each action in the different states?
       A1     A2      A3
S1     12      0     -10
S2      4     10    1000
● Q-Learning helps the agent make decisions by estimating the value of different actions in different states
● It learns from experience which action is best for each state
● You build your policy, then play the game many times and update the Q-function
Q-Table and Q-value
Learning the Q-table (pseudo-code)
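The pseudo-code itself is not reproduced in the text. Below is a minimal Python sketch of tabular Q-learning, assuming a Gymnasium-style environment with discrete states and actions (the FrozenLake notebook linked at the end uses this kind of API); the hyperparameter values are illustrative.

```python
import numpy as np

# Tabular Q-learning sketch; assumes a Gymnasium-style environment with
# discrete observation and action spaces (e.g. FrozenLake).
def q_learning(env, n_episodes=5000, alpha=0.1, gamma=0.99, epsilon=0.1):
    # Q-table: one row per state, one column per action, initialized to zero
    Q = np.zeros((env.observation_space.n, env.action_space.n))
    for _ in range(n_episodes):
        state, _ = env.reset()
        done = False
        while not done:
            # epsilon-greedy action selection (explore vs. exploit)
            if np.random.rand() < epsilon:
                action = env.action_space.sample()
            else:
                action = int(np.argmax(Q[state]))
            next_state, reward, terminated, truncated, _ = env.step(action)
            done = terminated or truncated
            # Q-value update: move Q(s, a) toward r + gamma * max_a' Q(s', a')
            target = reward + gamma * np.max(Q[next_state]) * (not terminated)
            Q[state, action] += alpha * (target - Q[state, action])
            state = next_state
    return Q
```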
Q-value update

$$Q(s, a) \;\leftarrow\; Q(s, a) + \alpha \,\big[\, r + \gamma \max_{a'} Q(s', a') - Q(s, a) \,\big]$$

where $\alpha$ is the learning rate, $\gamma$ the discount factor, $r$ the reward obtained after the step, and $\max_{a'} Q(s', a')$ the maximum value of the Q-function over all possible actions in the new state $s'$.
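As a quick worked example (numbers chosen for illustration, using the table above): suppose the agent is in $S_2$, takes $A_2$ with current value $Q(S_2, A_2) = 10$, receives reward $r = 1$, lands in $S_1$ where the best action has value $12$, and uses $\alpha = 0.1$, $\gamma = 0.9$. Then the target is $1 + 0.9 \times 12 = 11.8$, and the update gives $Q(S_2, A_2) \leftarrow 10 + 0.1 \times (11.8 - 10) = 10.18$.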
DOES IT REALLY WORK?
Real world case
Chess game:
● State: 10^50
● Action: 30-40 (legal ones)
or 2^18 (possible)
● Transitions to a new state are
deterministic, but depend on
the adversary
● Rewards: 0 for each
intermediate step, {-1,0,1} at
the end
Real world case
Go game:
● State: 3^361 (possible) or
10^170 (legal)
● Action: 200 (average)
or 361 (beginning)
● Transitions to a new state are
deterministic, but depend on
the adversary
● Rewards: 0 for each
intermediate step, {-1,0,1} at
the end
Real world case
Texas Hold ‘em:
● State: 10^31×4^8 (9 players)
● Action: 4 (fold, raise, call, check)
● Transitions to a new state are
stochastic and depend on the
adversary
● Rewards: 0 for each step, {−1, 0, 1} at the end
Real world case
Tram in Milano:
● State: 18×30×2
(lines × stops × (at or going to))
● Action: 100^3 (#tram^#state)
HOW DO WE DEAL WITH COMPLEX SETTINGS?
Deep Q-Learning
Constructing this function exactly, even using Q-learning, can be an impossible task due to the large state space.
However, we can approximate it.
And to do this, the best way is to use a neural network…
Deep Q-Learning
Deep Q-Learning (pseudocode)
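The pseudocode slide is not reproduced in the text. Below is a minimal PyTorch sketch of the two core pieces, a Q-network and the loss for one batch of transitions; the architecture, names, and hyperparameters are illustrative assumptions, and details such as the replay buffer and target-network updates are omitted.

```python
import torch
import torch.nn as nn

# Minimal Q-network: maps a state vector to one Q-value per action.
class QNetwork(nn.Module):
    def __init__(self, state_dim, n_actions, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, state):
        return self.net(state)

# Loss for one batch of transitions (s, a, r, s_next, done):
# regress Q(s, a) toward the target r + gamma * max_a' Q_target(s', a').
# `a` is a LongTensor of action indices, `done` a float tensor of 0/1 flags.
def dqn_loss(q_net, target_net, batch, gamma=0.99):
    s, a, r, s_next, done = batch
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = r + gamma * (1.0 - done) * target_net(s_next).max(dim=1).values
    return nn.functional.mse_loss(q_sa, target)
```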
A trained Q-network playing Super Mario!
AI Olympics
Tag Game - Evolving Strategies
It works!!! (Funny examples)
Tricks and Problems
Exploration vs Exploitation
● Exploration: try new actions to gather information about the environment
● Exploitation: choose actions based on the current knowledge to maximize rewards
A balance between exploration and exploitation is essential!
Exploration vs Exploitation
epsilon-greedy policy
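As a sketch, an epsilon-greedy rule can be written in a few lines of Python (the function name and default value here are illustrative):

```python
import numpy as np

def epsilon_greedy(q_values, epsilon=0.1):
    """Pick an action from a vector of Q-values for the current state."""
    if np.random.rand() < epsilon:
        return np.random.randint(len(q_values))  # explore: uniformly random action
    return int(np.argmax(q_values))              # exploit: best action so far
```

With probability epsilon the agent explores at random; otherwise it exploits its current Q-estimates.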
Stuck in the walls?
Global (or delayed) rewards: do not look at the reward at every instant; look at all the rewards together.
Delayed Rewards vs. Delayed Punishment
Let's train our Neural Networks using PyTorch and Google Colab!
https://tinyurl.com/BocconiRL
Other Links
https://tinyurl.com/BocconiFrozenLake
https://tinyurl.com/FrozenLakeTEO
https://huggingface.co/learn/deep-rl-course/unit0/introduction