Module - 5 - 6 - Reinforcement Learning

Reinforcement Learning (RL) is a machine learning approach that enables agents to learn decision-making through interactions with their environment, receiving rewards or penalties based on their actions. It is applied in various fields such as robotics and game playing, where agents learn optimal strategies to maximize cumulative rewards. Key concepts include states, actions, rewards, policies, and Q-values, with techniques like Deep Q-Learning enhancing traditional Q-learning methods.
Reinforcement Learning

G.Prethija, SCOPE, VIT-Chennai


Supervised vs Unsupervised vs Reinforcement Learning
Reinforcement Learning

• Reinforcement Learning (RL) is a machine learning approach inspired by behaviorist psychology,
in particular by the way humans and animals learn to make decisions from the (positive or
negative) rewards they receive from their environment.
• An agent learns to make decisions by interacting with an environment. The agent takes actions,
observes the results, and receives feedback in the form of rewards or penalties. Over time, the agent
aims to maximize its cumulative reward by learning an optimal strategy or policy.
• RL can be viewed as a form of semi-supervised learning: rewards act as time-delayed labels, and
such labels are rare.
• Reinforcement Learning is a family of algorithms and techniques used for control (e.g., robotics,
autonomous driving) and decision making.
Reinforcement Learning- Applications
Reinforcement Learning
• Agent:
• The learner or decision-maker (e.g., a robot, game character)
• Takes action
• Environment:
• The external system the agent interacts with (e.g., game world, real world).
• State: A representation of the current situation the agent is in, based on the environment
(e.g., player position).
• Action: Choices the agent can make at any given time (e.g., move left, right, jump).
• Reward: Feedback from the environment based on the action taken, which can be positive
(reward) or negative (penalty).
• Policy: A strategy the agent follows to decide which actions to take in different states.
• Value Function: A measure of the expected long-term reward for a state or a state-action pair.
• Q-value: Represents the expected future reward for taking a specific action in a given state,
used in algorithms like Q-learning.
Reinforcement Learning-Use cases

Robot Ball-In-A-Cup https://www.youtube.com/watch?v=qtqubguikMk

Reinforcement Learning for Robot Navigation https://www.youtube.com/watch?v=b2PxUslKZm4

Unitree Go2 & B2 robotic dog https://www.youtube.com/watch?v=g6NfGuV0IVE


Reinforcement Learning-Use cases

State: the position (cell) of the mouse
Action: move up, right, left, or down
Reward: positive or negative

The mouse may get the cheese only at the end, so the reward is sparse (rare).
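
The mouse-and-cheese setting can be sketched as a tiny grid world. This is only an illustration: the 3x3 grid size, the start and cheese cells, the reward of 1.0, and the names `step`, `run_episode`, and `random_policy` are assumptions, not taken from the slides.

```python
import random

# Hypothetical 3x3 maze: the mouse starts top-left, the cheese is bottom-right.
SIZE = 3
START, CHEESE = (0, 0), (2, 2)
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def step(state, action):
    """Apply an action and return (next_state, reward, done)."""
    d_row, d_col = ACTIONS[action]
    # Moves that would leave the grid keep the mouse on the border cell.
    row = min(max(state[0] + d_row, 0), SIZE - 1)
    col = min(max(state[1] + d_col, 0), SIZE - 1)
    next_state = (row, col)
    done = next_state == CHEESE
    reward = 1.0 if done else 0.0  # sparse: only the final move pays off
    return next_state, reward, done


def run_episode(policy, max_steps=200):
    """Run one episode and return the list of rewards received."""
    state, rewards = START, []
    for _ in range(max_steps):
        state, reward, done = step(state, policy(state))
        rewards.append(reward)
        if done:
            break
    return rewards


def random_policy(state):
    """An untrained agent: pick any of the four moves uniformly."""
    return random.choice(list(ACTIONS))


rewards = run_episode(random_policy)
print(len(rewards), "steps, total reward", sum(rewards))
```

Every reward except possibly the last is zero, which is what makes the signal sparse: the agent gets no feedback about individual moves until it happens to reach the cheese.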
Reinforcement Learning

• Design a policy that specifies which action to take in state s so as to maximize the chance
of getting future rewards.
• The environment is probabilistic, so the policy is also probabilistic.
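
A probabilistic policy can be pictured as a table of action probabilities per state, from which the agent samples. The sketch below is a minimal illustration; the state names, the probabilities, and the helper `sample_action` are made up for the example.

```python
import random

# Hypothetical stochastic policy: for each state, a probability per action.
policy = {
    "near_wall": {"left": 0.7, "right": 0.1, "up": 0.1, "down": 0.1},
    "open_field": {"left": 0.25, "right": 0.25, "up": 0.25, "down": 0.25},
}


def sample_action(state, rng=random):
    """Draw an action a with probability pi(a | state)."""
    actions = list(policy[state])
    weights = [policy[state][a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]


rng = random.Random(0)
draws = [sample_action("near_wall", rng) for _ in range(1000)]
print(draws.count("left") / len(draws))  # fraction of "left" draws, near 0.7
```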
Reinforcement Learning

How much reward will I get in the future?
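
One common way to make this question precise is the discounted return, with the value function and Q-value defined as its expectation under a policy. The discount factor \gamma, the time index t, and the policy symbol \pi are standard notation assumed here; they do not appear on the slide.

```latex
G_t = r_{t+1} + \gamma\, r_{t+2} + \gamma^{2} r_{t+3} + \dots
    = \sum_{k=0}^{\infty} \gamma^{k}\, r_{t+k+1}, \qquad 0 \le \gamma \le 1

V^{\pi}(s) = \mathbb{E}_{\pi}\left[\, G_t \mid s_t = s \,\right], \qquad
Q^{\pi}(s, a) = \mathbb{E}_{\pi}\left[\, G_t \mid s_t = s,\ a_t = a \,\right]
```

A \gamma close to 0 makes the agent care mostly about the immediate reward; a \gamma close to 1 makes distant rewards count almost as much as near ones.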
Reinforcement Learning-How to train AI to Play the Snake Game

On the left, the AI does not know anything about the game. On the right, the AI has been trained and has learned how to play.
Reinforcement Learning-How to train AI to Play the Snake Game

• set of states S (an index based on the Snake’s position)
• set of actions A (Up, Down, Right, Left)
• a reward function R (+10 when the Snake eats an apple, -10 when the Snake hits a wall)
• environment (our game)
• agent (our Snake, i.e., the Deep Neural Network that drives the Snake’s actions)

Every time the agent performs an action, the environment gives a reward to the agent, which can be
positive or negative depending on how good the action was from that specific state.
The goal of the agent is to learn what actions maximize the reward, given every possible state.
States are the observations that the agent receives at each iteration from the environment. A state can
be its position, its speed, or whatever array of variables describes the environment.
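
As a rough illustration of the reward function R described above, the sketch below encodes the +10 and -10 cases. The `Outcome` type and the reward of 0 for an ordinary move are assumptions made for the example; the slides do not say what an uneventful move earns.

```python
from typing import NamedTuple

ACTIONS = ("up", "down", "right", "left")  # the action set A


class Outcome(NamedTuple):
    """Hypothetical summary of what happened after one move."""
    ate_apple: bool
    hit_wall: bool


def reward(outcome: Outcome) -> int:
    """+10 for eating an apple, -10 for hitting a wall, 0 otherwise (assumed)."""
    if outcome.hit_wall:
        return -10
    if outcome.ate_apple:
        return 10
    return 0


print(reward(Outcome(ate_apple=True, hit_wall=False)))   # 10
print(reward(Outcome(ate_apple=False, hit_wall=True)))   # -10
print(reward(Outcome(ate_apple=False, hit_wall=False)))  # 0
```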

To be more rigorous and to use Reinforcement Learning notation, the strategy used by the agent to
make decisions is called the policy.
Reinforcement Learning-How to train AI to Play the Snake Game

• To understand how the agent makes decisions, we need to know what a Q-table is.
• A Q-table is a matrix that relates the states of the agent to the possible actions that the agent
can take. Each value in the table is a measure of the expected cumulative reward of taking that
action in that state (loosely, how promising the action is). The values are updated during training
based on the rewards the agent receives.
• An example of a greedy policy is a policy where the agent looks up the table and selects the action
with the highest value.

This table is the policy of the agent that we mentioned before: it determines what actions should be
taken from every state to maximize the expected reward.

Demerit: a table can only cover a finite state space.
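
A greedy lookup in a Q-table can be sketched in a few lines. The table contents, the state names `s0` and `s1`, and the helper `greedy_action` are invented for illustration only.

```python
# Hypothetical Q-table: one row per state, one column per action.
ACTIONS = ["up", "down", "left", "right"]
q_table = {
    "s0": [0.1, 0.5, -0.2, 0.3],
    "s1": [0.0, -1.0, 0.8, 0.2],
}


def greedy_action(state):
    """Return the action with the highest Q-value in the given state."""
    values = q_table[state]
    best_index = max(range(len(ACTIONS)), key=lambda i: values[i])
    return ACTIONS[best_index]


print(greedy_action("s0"))  # down
print(greedy_action("s1"))  # left
```

The table needs one row for every state the agent can encounter, which is why this representation stops being practical once the state space becomes very large or continuous.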


Reinforcement Learning-How to train AI to Play the Snake Game

Deep Q-Learning extends Q-Learning by replacing the table with a Deep Neural Network, which is a
powerful representation of a parametrized function. The Q-values are updated according to the
Bellman equation.
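
The equation itself did not survive on this slide. For reference, the Q-learning update derived from the Bellman equation is usually written as below; the learning rate \alpha and the discount factor \gamma are standard symbols assumed here, and the slide's exact notation may have differed.

```latex
Q_{\text{new}}(s, a) = Q(s, a) + \alpha \left[\, r + \gamma \max_{a'} Q(s', a') - Q(s, a) \,\right]
```

Here s is the current state, a the action taken, r the reward received, and s' the state reached. The bracketed term is the gap between the new estimate of the action's worth, r + \gamma \max_{a'} Q(s', a'), and the old estimate Q(s, a).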
Reinforcement Learning-How to train AI to Play the Snake Game
Algorithm

• The game starts, and the Q-values are randomly initialized.
• The agent collects the current state s (the observation).
• The agent executes an action based on the collected state. The action can either be
random or returned by its neural network. During the first phase of training, the
system often chooses random actions to maximize exploration. Later on, the system
relies more and more on its neural network.
• When the AI chooses and performs the action, the environment gives a reward to
the agent. The agent then reaches the new state s' and updates its Q-value
according to the Bellman equation mentioned above. Also, for each move, it stores
the original state, the action, the state reached after performing that action, the reward
obtained, and whether the game ended or not. This data is later sampled to train the
neural network. This mechanism is called Replay Memory.
• These last two operations are repeated until a certain condition is met.
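
The steps above can be sketched end to end on a toy problem. To stay self-contained, this sketch swaps the deep neural network for a plain dictionary of Q-values and swaps the Snake game for a five-cell corridor; the corridor, the values of `ALPHA` and `GAMMA`, the exploration schedule, and all function names are assumptions for illustration, not the setup used in the slides.

```python
import random
from collections import deque

ACTIONS = [0, 1]          # 0 = left, 1 = right (toy stand-in for Snake's moves)
GOAL, START = 4, 2        # hypothetical corridor with cells 0..4
ALPHA, GAMMA = 0.5, 0.9   # learning rate and discount factor (assumed values)


def step(state, action):
    """Toy environment: +10 at the goal cell, -10 for hitting the left wall."""
    nxt = state + (1 if action == 1 else -1)
    if nxt < 0:
        return state, -10.0, True
    if nxt == GOAL:
        return nxt, 10.0, True
    return nxt, 0.0, False


def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy: random with probability epsilon, otherwise greedy."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[(state, a)])


def update(q, transition):
    """One Q-learning (Bellman) update from a stored transition."""
    state, action, reward, nxt, done = transition
    future = 0.0 if done else GAMMA * max(q[(nxt, a)] for a in ACTIONS)
    q[(state, action)] += ALPHA * (reward + future - q[(state, action)])


def train(episodes=200, batch_size=16, max_steps=50, seed=0):
    rng = random.Random(seed)
    # Q-values start out small and random.
    q = {(s, a): rng.uniform(-0.01, 0.01)
         for s in range(GOAL + 1) for a in ACTIONS}
    memory = deque(maxlen=1000)  # replay memory of past moves
    for episode in range(episodes):
        epsilon = max(0.05, 1.0 - episode / 100)  # explore early, exploit later
        state = START
        for _ in range(max_steps):
            action = choose_action(q, state, epsilon, rng)
            nxt, reward, done = step(state, action)
            transition = (state, action, reward, nxt, done)
            update(q, transition)      # learn from the move just made
            memory.append(transition)  # and store it for later replay
            state = nxt
            if done:
                break
        # Replay: learn again from a random sample of stored moves.
        batch = rng.sample(list(memory), min(batch_size, len(memory)))
        for transition in batch:
            update(q, transition)
    return q


q = train()
print(max(ACTIONS, key=lambda a: q[(START, a)]))  # greedy action at the start
```

In a Deep Q-Learning agent, `update` would instead be a gradient step that moves the network's prediction for (state, action) toward the same target, and the replayed batch would be the network's training data.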
References

• Artificial Intelligence and Games, Georgios N. Yannakakis and Julian Togelius, January 26, 2018, Springer
• https://towardsdatascience.com/how-to-teach-an-ai-to-play-games-deep-reinforcement-learning-28f9b920440a
• https://www.youtube.com/watch?v=0MNVhXEX9to
• https://www.youtube.com/watch?v=AhyznRSDjw8
