Reinforcement Learning
G.Prethija, SCOPE, VIT-Chennai
Supervised vs Unsupervised vs Reinforcement Learning
Reinforcement Learning
• Reinforcement Learning (RL) is a machine learning approach inspired by behaviorist
psychology and, in particular, the way humans and animals learn to make decisions via (positive or
negative) rewards received from their environment.
• An agent learns to make decisions by interacting with an environment. The agent takes actions,
observes the results, and receives feedback in the form of rewards or penalties. Over time, the agent
aims to maximize its cumulative reward by learning an optimal strategy or policy.
• RL is sometimes likened to semi-supervised learning: the reward acts as a time-delayed label, and such labels are rare.
• Reinforcement Learning is a family of algorithms and techniques used for control (e.g., robotics,
autonomous driving) and decision making.
Reinforcement Learning-Applications
Reinforcement Learning
• Agent: The learner or decision-maker that takes actions (e.g., a robot, game character).
• Environment: The external system the agent interacts with (e.g., game world, real world).
• State: A representation of the current situation the agent is in, based on the environment (e.g., player position).
• Action: Choices the agent can make at any given time (e.g., move left, right, jump).
• Reward: Feedback from the environment based on the action taken, which can be positive (reward) or negative (penalty).
• Policy: A strategy the agent follows to decide which actions to take in different states.
• Value Function: A measure of the expected long-term reward for a state or a state-action pair.
• Q-value: The expected future reward for taking a specific action in a given state; used in algorithms like Q-learning.
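A minimal sketch of how these pieces fit together; the reset()/step() interface loosely follows the Gym convention, and all names and values here are illustrative assumptions, not a specific library's API:

import random

# Toy Environment: a single state, episode ends after 10 steps.
class ToyEnv:
    def reset(self):
        self.t = 0
        return 0                      # the initial State

    def step(self, action):
        self.t += 1
        reward = 1.0 if action == "good" else -1.0   # Reward or penalty
        done = self.t >= 10
        return 0, reward, done        # next State, Reward, episode over?

# A trivial random Policy; a learned policy would replace this.
def random_policy(state, actions):
    return random.choice(actions)

# The agent-environment loop: the Agent picks an Action, the Environment
# returns a Reward and the next State; repeat until the episode ends.
def run_episode(env, actions, policy):
    state, total, done = env.reset(), 0.0, False
    while not done:
        action = policy(state, actions)
        state, reward, done = env.step(action)
        total += reward               # accumulate the cumulative reward
    return total

print(run_episode(ToyEnv(), ["good", "bad"], random_policy))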
Reinforcement Learning-Use cases
• Robot Ball-In-A-Cup: https://www.youtube.com/watch?v=qtqubguikMk
• Reinforcement Learning for Robot Navigation: https://www.youtube.com/watch?v=b2PxUslKZm4
• Unitree Go2 & B2 robotic dog: https://www.youtube.com/watch?v=g6NfGuV0IVE
Reinforcement Learning-Use cases
• State: the mouse's position (grid cell)
• Action: move up, down, left, or right
• Reward: positive or negative
• The mouse may only get the cheese at the end of the maze, so the reward is sparse (rare).
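A minimal sketch of such a gridworld with a sparse reward; the corridor layout and reward values are illustrative assumptions:

import random

# 1-D corridor gridworld: the mouse starts in cell 0, the cheese sits in
# the last cell. Reward is sparse: 0 everywhere except +10 at the cheese.
N_CELLS = 5
ACTIONS = [-1, +1]        # move left, move right

def step(cell, action):
    nxt = min(max(cell + action, 0), N_CELLS - 1)   # stay inside the grid
    reward = 10.0 if nxt == N_CELLS - 1 else 0.0    # sparse reward
    done = nxt == N_CELLS - 1
    return nxt, reward, done

# Random walk until the cheese is found; shows how rarely reward appears.
cell, steps, done = 0, 0, False
while not done:
    cell, reward, done = step(cell, random.choice(ACTIONS))
    steps += 1
print(f"reached cheese after {steps} steps; only the last step gave reward")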
Reinforcement Learning
• Design a policy specifying which action to take in each state s so as to maximize the chance of receiving future rewards.
• The environment is probabilistic, therefore the policy may also be probabilistic.
Reinforcement Learning
How much reward will I get in the future?
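Formally, this quantity is the discounted return: the sum of future rewards, where a discount factor \gamma \in [0, 1) weights nearer rewards more heavily:

G_t = R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \cdots = \sum_{k=0}^{\infty} \gamma^k R_{t+k+1}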
Reinforcement Learning-How to train AI to Play the Snake Game
On the left, the AI knows nothing about the game. On the right, the AI has been trained and has learned how to play.
Reinforcement Learning-How to train AI to Play the Snake Game
• a set of states S (an index based on the Snake's position)
• a set of actions A (Up, Down, Right, Left)
• a reward function R (+10 when the Snake eats an apple, -10 when the Snake hits a wall)
• an environment (our game)
• an agent (our Snake, i.e., the Deep Neural Network that drives the Snake's actions)
Every time the agent performs an action, the environment gives a reward to the agent, which can be
positive or negative depending on how good the action was from that specific state.
The goal of the agent is to learn what actions maximize the reward, given every possible state.
States are the observations that the agent receives at each iteration from the environment. A state can
be its position, its speed, or whatever array of variables describes the environment.
To be more rigorous, in Reinforcement Learning notation the strategy the agent uses to
make decisions is called a policy. A sketch of the reward function described above follows.
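A minimal sketch of the reward function R for the Snake game; ate_apple and hit_wall are hypothetical flags that the game loop would compute each step, not the article's actual code:

# Reward function R for the Snake game, as described above.
def reward(ate_apple: bool, hit_wall: bool) -> float:
    if ate_apple:
        return +10.0   # eating an apple is rewarded
    if hit_wall:
        return -10.0   # crashing into a wall is penalized
    return 0.0         # all other moves give no reward

ACTIONS = ["Up", "Down", "Right", "Left"]   # the action set A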
Reinforcement Learning-How to train AI to Play the Snake Game
• To understand how the agent takes decisions, we need to know what a Q-Table is.
• A Q-table is a matrix that correlates the state of the agent with the possible actions the agent
can take. The values in the table are a measure of each action's expected cumulative reward
(often loosely described as the action's probability of success), and they are updated based
on the rewards the agent receives during training.
• An example of a greedy policy is a policy where the agent looks up the table and selects the action
that leads to the highest score.
This table is the policy of the agent that we mentioned before: it determines what actions should be taken from every state to maximize the expected reward.
Demerit: a Q-table only works for a finite (and reasonably small) state space.
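A minimal sketch of tabular Q-learning with a greedy lookup, using the toy corridor from earlier; the environment and hyperparameters are illustrative assumptions:

import random
from collections import defaultdict

# Tabular Q-learning on a tiny 5-cell corridor.
N, GOAL = 5, 4
ACTIONS = [-1, +1]                       # left, right
ALPHA, GAMMA, EPS = 0.5, 0.9, 0.1        # learning rate, discount, exploration

Q = defaultdict(float)                   # the Q-table: (state, action) -> value

def step(s, a):
    s2 = min(max(s + a, 0), N - 1)
    return s2, (10.0 if s2 == GOAL else 0.0), s2 == GOAL

def greedy(s):                           # greedy policy: highest-scoring action
    return max(ACTIONS, key=lambda a: Q[(s, a)])

for _ in range(200):                     # training episodes
    s, done = 0, False
    while not done:
        a = random.choice(ACTIONS) if random.random() < EPS else greedy(s)
        s2, r, done = step(s, a)
        # Q-learning update: move Q(s,a) toward r + gamma * max_a' Q(s',a')
        target = r + GAMMA * max(Q[(s2, a2)] for a2 in ACTIONS) * (not done)
        Q[(s, a)] += ALPHA * (target - Q[(s, a)])
        s = s2

# Learned policy per cell: cells 0-3 should be +1, i.e., move right.
print([greedy(s) for s in range(N)])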
Reinforcement Learning-How to train AI to Play the Snake Game
Deep Q-Learning extends Q-Learning by replacing the table with a deep
neural network, a powerful representation of a parametrized function that
can generalize across states. The Q-values are updated according to the
Bellman equation:
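In its standard Q-learning form, with learning rate \alpha and discount factor \gamma, the update is:

Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_t + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right]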
Reinforcement Learning-How to train AI to Play the Snake Game
Algorithm
• The game starts, and the Q-values are randomly initialized.
• The agent collects the current state s (the observation).
• The agent executes an action based on the collected state. The action can either be
random or returned by its neural network. During the first phase of training, the
system often chooses random actions to maximize exploration. Later on, the system
relies more and more on its neural network.
• When the AI chooses and performs the action, the environment gives a reward to
the agent. The agent then reaches the new state s' and updates its Q-value
according to the Bellman equation above. Also, for each move, it stores
the original state, the action, the state reached after performing that action, the reward
obtained, and whether the game ended or not. This data is later sampled to train the
neural network; this mechanism is called Replay Memory.
• These last two operations are repeated until a certain condition is met (a sketch of this loop follows).
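A minimal sketch of this loop in PyTorch; the environment interface, network size, and hyperparameters are illustrative assumptions, and a real Snake agent would use the game's actual state encoding:

import random
from collections import deque
import torch
import torch.nn as nn

STATE_DIM, N_ACTIONS = 4, 4              # illustrative sizes (Up/Down/Right/Left)
GAMMA, EPS, BATCH = 0.9, 0.1, 32

# The network that replaces the Q-table: state in, one Q-value per action out.
net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
memory = deque(maxlen=10_000)            # Replay Memory

def choose_action(state):
    if random.random() < EPS:            # exploration: random action
        return random.randrange(N_ACTIONS)
    with torch.no_grad():                # exploitation: ask the network
        return net(torch.tensor(state)).argmax().item()

def train_step():
    if len(memory) < BATCH:
        return
    batch = random.sample(memory, BATCH) # sample stored transitions
    s, a, r, s2, done = map(torch.tensor, zip(*batch))
    q = net(s.float()).gather(1, a.view(-1, 1)).squeeze(1)
    with torch.no_grad():                # Bellman target: r + gamma * max_a' Q(s',a')
        target = r.float() + GAMMA * net(s2.float()).max(1).values * (~done)
    loss = nn.functional.mse_loss(q, target)
    opt.zero_grad(); loss.backward(); opt.step()

# Training loop skeleton; `env` is a hypothetical game with the usual
# reset()/step() interface (not shown here).
def run(env, episodes=100):
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            action = choose_action(state)
            next_state, reward, done = env.step(action)
            memory.append((state, action, reward, next_state, done))
            train_step()                 # sample Replay Memory, fit the net
            state = next_state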
References
• Georgios N. Yannakakis and Julian Togelius, Artificial Intelligence and Games, Springer, January 26, 2018
• https://towardsdatascience.com/how-to-teach-an-ai-to-play-games-deep-reinforcement-learning-28f9b920440a
• https://www.youtube.com/watch?v=0MNVhXEX9to
• https://www.youtube.com/watch?v=AhyznRSDjw8