EE5076
Reinforcement
Learning
By: Theekshana Wijewardhana
Overview of this presentation
1. Introduction to RL
2. RL formalization
3. Concepts in RL
a. State
b. Action
c. Reward
d. Policy
e. Q function
4. Bellman’s equation
5. Introduction to OpenAI Gym
Introduction
Rewards
★Positive: Meat shop
★Negative: Scary Dog
Introduction
Reinforcement Learning (RL) is a type of machine learning where an agent learns to make decisions by interacting with an environment. The agent takes actions, receives rewards or penalties, and continuously improves its strategy (policy) to maximize long-term rewards.
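This act-reward-learn loop can be sketched in a few lines of Python. The sketch below is illustrative only: "Environment" and "Agent" are hypothetical placeholder classes, not part of any library, and the stub dynamics exist just to make the loop runnable.

# A minimal sketch of the RL interaction loop described above.
# "Environment" and "Agent" are hypothetical placeholders, not a real API.

class Environment:
    def reset(self):
        """Return the initial state."""
        return 0

    def step(self, action):
        """Apply an action; return (next_state, reward, done)."""
        return 0, 0.0, True  # stub dynamics: one step, then done

class Agent:
    def act(self, state):
        """Pick an action for the current state (the policy)."""
        return 0

    def learn(self, state, action, reward, next_state):
        """Update the policy from the observed transition."""

env, agent = Environment(), Agent()
state = env.reset()
done = False
while not done:
    action = agent.act(state)                       # agent acts
    next_state, reward, done = env.step(action)     # environment responds with a reward
    agent.learn(state, action, reward, next_state)  # agent improves its policy
    state = next_state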
AI learns to park
Link: https://youtu.be/VMp6pq6_QjI
Reinforcement Learning formalization
State  :   1   2   3   4   5   6
Reward : 100   0   0   0   0  40
Move left   : R(4) + R(3) + R(2) + R(1) = 0 + 0 + 0 + 100 = 100
Move right  : R(4) + R(5) + R(6) = 0 + 0 + 40 = 40
Move random : R(4) + R(5) + R(4) + R(3) + R(2) + R(1) = 0 + 0 + 0 + 0 + 0 + 100 = 100
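These three sums are easy to check in code. A small Python sketch, assuming the six-state world above with rewards [100, 0, 0, 0, 0, 40] and 1-indexed states:

# Undiscounted return: the plain sum of rewards along a path of states.
rewards = [100, 0, 0, 0, 0, 40]  # rewards[s - 1] is R(s) for states 1..6

def undiscounted_return(path):
    return sum(rewards[s - 1] for s in path)

print(undiscounted_return([4, 3, 2, 1]))        # move left   -> 100
print(undiscounted_return([4, 5, 6]))           # move right  -> 40
print(undiscounted_return([4, 5, 4, 3, 2, 1]))  # move random -> 100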
The return in Reinforcement Learning
State  :   1   2   3   4   5   6
Reward : 100   0   0   0   0  40

Discount factor: γ ∈ (0, 1)
Move left   : R(4) + γR(3) + γ²R(2) + γ³R(1) = 0 + 0 + 0 + γ³·100
Move right  : R(4) + γR(5) + γ²R(6) = 0 + 0 + γ²·40
Move random : R(4) + γR(5) + γ²R(4) + γ³R(3) + γ⁴R(2) + γ⁵R(1) = 0 + 0 + 0 + 0 + 0 + γ⁵·100
The return in Reinforcement Learning
State  :   1   2   3   4   5   6
Reward : 100   0   0   0   0  40

Discount factor: γ = 0.9
Move left   : 0 + 0.9·0 + 0.9²·0 + 0.9³·100 = 72.9
Move right  : 0 + 0.9·0 + 0.9²·40 = 32.4
Move random : 0 + 0.9·0 + 0.9²·0 + 0.9³·0 + 0.9⁴·0 + 0.9⁵·100 = 59.049
The return in Reinforcement Learning
State  :   1   2   3   4   5   6
Reward : 100   0   0   0   0  40

Discount factor: γ = 0.1
Move left   : 0 + 0.1·0 + 0.1²·0 + 0.1³·100 = 0.1
Move right  : 0 + 0.1·0 + 0.1²·40 = 0.4
Move random : 0 + 0.1·0 + 0.1²·0 + 0.1³·0 + 0.1⁴·0 + 0.1⁵·100 = 0.001
With a small γ the agent becomes short-sighted: the nearby reward of 40 now beats the distant reward of 100, so moving right looks best.
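Adding the discount factor to the earlier check shows how γ reorders the three options. A short sketch reusing the reward list from before:

# Discounted return: R(s0) + γ·R(s1) + γ²·R(s2) + ...
rewards = [100, 0, 0, 0, 0, 40]

def discounted_return(path, gamma):
    return sum(gamma ** t * rewards[s - 1] for t, s in enumerate(path))

for gamma in (0.9, 0.1):
    print(gamma,
          discounted_return([4, 3, 2, 1], gamma),        # move left
          discounted_return([4, 5, 6], gamma),           # move right
          discounted_return([4, 5, 4, 3, 2, 1], gamma))  # move random
# γ = 0.9 -> 72.9, 32.4, 59.049 (left is best)
# γ = 0.1 -> 0.1, 0.4, 0.001    (right is best), up to float rounding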
Policies in Reinforcement Learning
State  :   1   2   3   4   5   6
Reward : 100   0   0   0   0  40

If I am in state 4,
➢ Should I move left?
➢ Should I move right?
To get the best long-term return?
Policies in Reinforcement Learning
State  :   1   2   3   4   5   6
Reward : 100   0   0   0   0  40

A policy helps the agent find the best action to take in each state:
State (s) → Policy (π) → Best Action (a)
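In a world this small, a policy can literally be a lookup table from state to action. A toy sketch; the all-"left" entries anticipate the optimal policy derived on the next slides:

# π as a plain dictionary: π(s) is just a lookup.
policy = {2: "left", 3: "left", 4: "left", 5: "left"}  # states 1 and 6 are terminal

def pi(state):
    return policy[state]

print(pi(4))  # -> "left"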
Reinforcement learning
Key concepts
1. Agent: The learner or decision maker, e.g., a self-driving car or a robot.
2. Environment: The world in which the agent operates.
3. State (s): The current situation of the agent in the environment.
4. Action (a): A choice the agent can make.
5. Reward (r): A numerical value given to the agent based on its action.
6. Policy (π): A strategy that defines how the agent selects actions.
7. Return: The cumulative discounted reward the agent collects from a given state.
8. Q function (Q(s,a)): The expected return of taking a particular action in a given state and behaving optimally afterwards.
Optimal policy
The optimal policy (denoted π*) is the strategy that achieves the maximum expected cumulative reward over time: it defines the best action to take in every state.
γ = 0.9
State                :   1     2      3      4      5      6
Reward               : 100     0      0      0      0     40
Return, always left  :   –    90     81     72.9   65.61   –
Return, always right :   –    26.24  29.16  32.4   36      –

π*(S4): go left    π*(S3): go left
(The "always left" return is larger in every non-terminal state, so π* moves left everywhere.)
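These returns are easy to reproduce: committing to "left" from state s collects 100 after s - 1 steps, and committing to "right" collects 40 after 6 - s steps. A sketch, assuming deterministic moves and γ = 0.9:

# Reproduce the table: discounted return of always-left vs. always-right.
gamma = 0.9

for s in range(2, 6):                     # non-terminal states 2..5
    always_left = gamma ** (s - 1) * 100  # s - 1 steps to state 1 (reward 100)
    always_right = gamma ** (6 - s) * 40  # 6 - s steps to state 6 (reward 40)
    print(s, round(always_left, 3), round(always_right, 3))
# state 2: 90.0   26.244  (the slide rounds to 26.24)
# state 3: 81.0   29.16
# state 4: 72.9   32.4
# state 5: 65.61  36.0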
State - Action value function/Q function
γ = 0.9
State       :   1     2      3      4       5      6
Reward      : 100     0      0      0       0     40
Q(s, left)  :   –    90     81     72.9    65.61   –
Q(s, right) :   –    72.9   65.61  59.049  36      –

Note: for states 2, 3, and 4, taking "right" once and then behaving optimally means turning back toward state 1, e.g. Q(4, right) = γ⁵·100 = 59.049.
Q(s, a) = the return if you:
1. Start in state s
2. Take action a (once)
3. Behave optimally after that

Example:
Q(4, right) = R(4) + γR(5) + γ²R(4) + … + γ⁵R(1)
Q(4, right) = R(4) + γ[R(5) + γR(4) + … + γ⁴R(1)]
Q(4, right) = R(4) + γ max[Q(5, left), Q(5, right)]
Bellman's Equation
Q(s, a) = R(s) + γ max_a′ Q(s′, a′)

s    : current state
R(s) : reward of the current state
a    : current action
γ    : discount factor
s′   : state you reach after taking action a
a′   : action you take in state s′
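Bellman's equation also gives an iterative way to compute these Q-values: start from zeros and repeatedly apply the update Q(s, a) ← R(s) + γ max_a′ Q(s′, a′) until nothing changes (value iteration). A minimal sketch for the six-state world, where a terminal state is simply worth its own reward:

# Q-value iteration with Bellman's equation on the six-state line world.
rewards = {1: 100, 2: 0, 3: 0, 4: 0, 5: 0, 6: 40}
gamma = 0.9
actions = {"left": -1, "right": +1}

# Q[s][a] for the non-terminal states 2..5, initialised to zero.
Q = {s: {a: 0.0 for a in actions} for s in range(2, 6)}

def value(s):
    """max_a Q(s, a); terminal states 1 and 6 are worth their reward."""
    return rewards[s] if s in (1, 6) else max(Q[s].values())

for _ in range(100):  # far more sweeps than this tiny world needs to converge
    for s in Q:
        for a, move in actions.items():
            Q[s][a] = rewards[s] + gamma * value(s + move)

print(Q[4])  # -> left ≈ 72.9, right ≈ 59.049 (right once, then back left)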
Problem
When an agent is introduced to a new environment, it does not initially know the terminal states or the optimal policy to follow. So how can the agent determine the Q-values for a given state?
The agent should be capable of:
- Exploring the environment
- Learning to estimate the Q-value for each possible action in a given state
- Developing a policy that selects the optimal action for each state
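These three capabilities are exactly what tabular Q-learning with ε-greedy exploration provides, and OpenAI Gym (now maintained as Gymnasium) supplies ready-made environments to try it on. A minimal sketch, assuming the gymnasium package is installed (pip install gymnasium); the hyperparameter values are arbitrary illustrative choices:

# Tabular Q-learning with ε-greedy exploration on FrozenLake (Gymnasium).
import random
import gymnasium as gym

env = gym.make("FrozenLake-v1", is_slippery=False)
n_states, n_actions = env.observation_space.n, env.action_space.n
Q = [[0.0] * n_actions for _ in range(n_states)]

alpha, gamma, epsilon = 0.1, 0.9, 0.1  # learning rate, discount, exploration rate

for episode in range(5000):
    state, _ = env.reset()
    done = False
    while not done:
        # Explore with probability ε, otherwise exploit current Q estimates.
        if random.random() < epsilon:
            action = env.action_space.sample()
        else:
            action = max(range(n_actions), key=lambda a: Q[state][a])
        next_state, reward, terminated, truncated, _ = env.step(action)
        done = terminated or truncated
        # Q-learning update: nudge Q(s, a) toward r + γ max_a' Q(s', a').
        target = reward + gamma * max(Q[next_state])
        Q[state][action] += alpha * (target - Q[state][action])
        state = next_state

# Read off the greedy policy from the learned Q-table.
print([max(range(n_actions), key=lambda a: Q[s][a]) for s in range(n_states)])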
Thank you