Reinforcement Learning
CS786
28th January 2022
MDP → RL
• In an MDP, {S, A, R, P} are known
• In RL, R and P are not known to begin with
• They are learned from experience
• Optimal policy is updated sequentially to account for
increased information about rewards and transition
probabilities
• Model-based RL
– Learns transition probabilities P as well as optimal policy
• Model-free RL
– Learns only optimal policy, not the transition probabilities P
Q-learning
• Derived from the Bush-Mosteller update rule
• Agent sees a set of states S
• Possesses a set of actions A applicable to
these states
• Does not try to learn p(s, a, s’)
• Tries to learn a quality belief about a state-
action combination, Q: S × A → ℝ
Q-learning update rule
• Start with random Q
• Update using
Q(s,a) ← Q(s,a) + α [ r + λ max_a’ Q(s’,a’) − Q(s,a) ]
• Parameter α controls the learning rate
• Parameter λ controls the time-discounting of
future reward
Q-learning
• Agent sees a set of states S
• Possesses a set of actions A applicable to
these states
• Does not try to learn p(s, a, s’)
• Tries to learn a quality belief about a state-
action combination, Q: S × A → ℝ
Q-learning update rule
• Start with random Q
• Update using
Q(s,a) ← Q(s,a) + α [ r + λ max_a’ Q(s’,a’) − Q(s,a) ]
• Parameter α controls the learning rate
• Parameter λ controls the time-discounting of
future reward
• s’ is the state accessed from s
• a’ ranges over the actions available in s’
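A minimal sketch of this update as code; the dict-based Q-table and the function name are illustrative assumptions, not part of the slides:

def q_learning_update(Q, s, a, r, s_next, actions_next, alpha, lam):
    # Q maps (state, action) pairs to values.
    # The target uses the best action available in the next state s'
    # (use 0 when s' is terminal and has no available actions).
    best_next = max(Q[(s_next, a_next)] for a_next in actions_next) if actions_next else 0.0
    Q[(s, a)] += alpha * (r + lam * best_next - Q[(s, a)])
    return Q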
Q-learning algorithm
• Initialize Q(s,a) for all s and a
• For each episode
– Initialize s
– For each move
• Choose a from s using Q (softmax/ε-greedy)
• Perform action a, observe R and s’
• Update Q(s,a)
• Move to s’
– Until s’ is terminal/moves run out
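A sketch of this full loop, assuming a classic Gym-style environment where reset() returns the initial state and step() returns (s’, r, done, info), and an ε-greedy choice rule; all names here are assumptions for illustration:

import random
from collections import defaultdict

def train_q_learning(env, episodes, alpha=0.1, lam=0.9, epsilon=0.1):
    Q = defaultdict(float)                       # Q(s, a), initialized to 0
    n_actions = env.action_space.n
    for _ in range(episodes):
        s = env.reset()                          # initialize s
        done = False
        while not done:
            # choose a from s using an epsilon-greedy rule on Q
            if random.random() < epsilon:
                a = random.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda act: Q[(s, act)])
            s_next, r, done, _ = env.step(a)     # perform a, observe R and s'
            best_next = max(Q[(s_next, act)] for act in range(n_actions))
            Q[(s, a)] += alpha * (r + lam * best_next - Q[(s, a)])  # update Q(s,a)
            s = s_next                           # move to s'
    return Q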
Q-learning update
[Figure: the Q(s,·) table; Q(s,a) is the value of taking action a in state s]
1. Select a using choice rule on Q
Q-learning update
[Figure: the agent takes action a, moving from state s to state s’]
1. Select a using choice rule on Q
2. Take action a from state s
3. Observe r and s’
Q-learning update
[Figure: the Q(s’,·) table; from s’ several actions a1’, a2’, a3’ are available. There are many possible a’ from the state you reach]
1. Select a using choice rule on Q
2. Take action a from state s
3. Observe r and s’
4. Recall Q(s’,a’) for all a’ available from s’
Q-learning update
[Figure: Q(s,a) is updated assuming the maximally rewarding action will be selected at s’]
1. Select a using choice rule on Q
2. Take action a from state s
3. Observe r and s’
4. Recall Q(s’,a’) for all a’ available from s’
5. Update Q(s,a)
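As a concrete instance of step 5, with purely hypothetical numbers: suppose α = 0.5, λ = 0.9, Q(s,a) = 2, r = 1, and the largest Q(s’,a’) over the available a’ is 4. Then

Q(s,a) ← 2 + 0.5 × (1 + 0.9 × 4 − 2) = 2 + 0.5 × 2.6 = 3.3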
Q-learning example
• OpenAI Gym’s FrozenLake environment
• Setup: agent is a character that has to walk
from a start point (S) across a frozen lake (F)
with holes (H) in some locations to reach G
• Specific instantiation
S F F F
F H F H
F F F H
H F F G
Q-learning example
• Agent starts with an empty Q-matrix
• Action possibilities = {left, right, up, down}
• Reward settings
– H = -100
– G = +100
– F = 0
[Figure: 4×4 grid of zeros, the initial (empty) Q-matrix]
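A sketch of this setup in code; the slide’s reward values differ from Gym’s default FrozenLake rewards, so the grid and rewards below are written out by hand to match the slide:

# 4x4 lake layout from the slide
lake = ["SFFF",
        "FHFH",
        "FFFH",
        "HFFG"]

# reward for entering each kind of tile, per the slide
tile_reward = {"H": -100, "G": +100, "F": 0, "S": 0}

actions = ["left", "right", "up", "down"]

# empty Q-matrix: one value per (cell, action) pair, all zeros to start
Q = {((row, col), a): 0.0
     for row in range(4) for col in range(4) for a in actions}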
Q-learning example
• Learning occurs via exploration episodes
• One episode is a sequence of moves
• Let’s work through one episode
[Grid: same 4×4 lake; the first move from S is marked with a Q-value of 0]
Q-learning example
• Learning occurs via exploration episodes
• One episode is a sequence of moves
• Let’s work through one episode
[Grid: a second move along the top row is also marked 0]
Q-learning example
• Learning occurs via exploration episodes
• One episode is a sequence of moves
• Let’s work through one episode
[Grid: a third move, heading downward, is also marked 0]
Q-learning example
• Learning occurs via exploration episodes
• One episode is a sequence of moves
• Let’s work through one episode
[Grid: a move adjacent to one of the holes is now marked -80]
Q-learning example
• Learning occurs via exploration episodes
• One episode is a sequence of moves
• Let’s work through one episode
[Grid: two moves adjacent to holes are now marked -80]
Q-learning example
• Learning occurs via exploration episodes
• One episode is a sequence of moves
• Let’s work through one episode
[Grid: the move into the goal G is marked +80]
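The -80 and +80 values are consistent with a zero-initialized Q-matrix and a learning rate of α = 0.8; the slides do not state α explicitly, so treat that value as an inferred assumption:

Falling into a hole: Q ← 0 + 0.8 × (-100 + λ·0 - 0) = -80
Reaching the goal: Q ← 0 + 0.8 × (+100 + λ·0 - 0) = +80
(the λ term vanishes because every Q(s’,a’) is still zero at this point)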
Generalized model-free RL
• Bush-Mosteller-style models simply update value
based on a discounted average of received rewards
– Useless for predicting the value of sequential
events, e.g. A → B → reward
• A more generalized notion of reward learning was
needed
– Q-learning is one instance of temporal difference
learning
– Other flavors of model-free reinforcement learning also
exist, e.g. policy gradient methods
SARSA update rule
• Start with random Q
• Update using
Q(s,a) ← Q(s,a) + α [ r + λ Q(s’,a’) − Q(s,a) ]
• Parameter α controls the learning rate
• Parameter λ controls the time-discounting of
future reward
• s’ is the state accessed from s
• a’ is the action selected in s’
– Different from Q-learning, which backs up the best available a’ rather than the selected one
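A minimal sketch of this update next to the Q-learning one from earlier, under the same dict-style Q-table assumption (all names are illustrative):

def sarsa_update(Q, s, a, r, s_next, a_next, alpha, lam):
    # SARSA target uses the action a' actually selected in s'
    Q[(s, a)] += alpha * (r + lam * Q[(s_next, a_next)] - Q[(s, a)])

def q_learning_update(Q, s, a, r, s_next, actions_next, alpha, lam):
    # Q-learning target uses the best action available in s'
    best_next = max(Q[(s_next, an)] for an in actions_next)
    Q[(s, a)] += alpha * (r + lam * best_next - Q[(s, a)])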
SARSA algorithm
• Start with random Q(s, a) for all s and a
• For each episode
– Initialize s
– Choose a using Q (softmax/greedy)
– For each move
• Take action a, observe r, s’
• Choose a’ from s’ by comparing Q(s’, . )
• Update Q(s, a)
• Move to s’, remember a’
– Until s’ is terminal/moves run out
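A sketch of this loop under the same assumptions as the Q-learning sketch above (classic Gym-style reset()/step(), ε-greedy choice rule; all names illustrative):

import random
from collections import defaultdict

def choose(Q, s, n_actions, epsilon):
    # epsilon-greedy choice rule on Q(s, .)
    if random.random() < epsilon:
        return random.randrange(n_actions)
    return max(range(n_actions), key=lambda a: Q[(s, a)])

def train_sarsa(env, episodes, alpha=0.1, lam=0.9, epsilon=0.1):
    Q = defaultdict(float)
    n_actions = env.action_space.n
    for _ in range(episodes):
        s = env.reset()                              # initialize s
        a = choose(Q, s, n_actions, epsilon)         # choose a using Q
        done = False
        while not done:
            s_next, r, done, _ = env.step(a)         # take a, observe r, s'
            a_next = choose(Q, s_next, n_actions, epsilon)   # choose a' from s'
            Q[(s, a)] += alpha * (r + lam * Q[(s_next, a_next)] - Q[(s, a)])  # update Q(s,a)
            s, a = s_next, a_next                    # move to s', remember a'
    return Q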
SARSA update
[Figure: the Q(s,·) table; Q(s,a) is the value of taking action a in state s]
1. Start with a selected in the previous iteration
SARSA update
[Figure: the agent takes action a, moving from state s to state s’]
1. Start with a selected in the previous iteration
2. Take action a from state s
3. Observe r and s’
SARSA update
[Figure: the Q(s’,·) table; from s’ several actions a1’, a2’, a3’ are available. There are many possible a’ from the state you reach]
1. Start with a from the previous iteration
2. Take action a from state s
3. Observe r and s’
4. Recall Q(s’,a’) for all a’ available from s’
SARSA update
[Figure: a’ is selected from Q(s’,·) using the choice rule]
1. Start with a from the previous iteration
2. Take action a from state s
3. Observe r and s’
4. Recall Q(s’,a’) for all a’ available from s’
5. Select a’ using choice rule on Q
6. Update Q(s,a)
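To make step 6 concrete: the only difference from the Q-learning update is the target term. Q-learning backs up the maximum of Q(s’,·), while SARSA backs up the Q of the a’ that the choice rule actually picked. A toy illustration with made-up numbers:

# Hypothetical values of Q(s', .) for three actions a1', a2', a3'
Q_s_next = {"a1'": 4.0, "a2'": -1.0, "a3'": 2.0}

r, alpha, lam, Q_sa = 1.0, 0.5, 0.9, 2.0

q_learning_target = r + lam * max(Q_s_next.values())  # uses the best a' (4.0)
sarsa_target = r + lam * Q_s_next["a3'"]               # uses the a' the choice rule picked (2.0 here)

print(Q_sa + alpha * (q_learning_target - Q_sa))       # 3.3
print(Q_sa + alpha * (sarsa_target - Q_sa))            # 2.4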