Hierarchical Reinforcement Learning
Temporal Abstraction in RL
Khimya Khetarpal
Reasoning and Learning Lab
Mila - McGill University
Consider an autonomous robot in a warehouse
Tasks: Pick up boxes, Navigate to destination, Stack boxes
Each task can be broken down into skills: Scan room, Identify objects, Find box, Reach box, Detect obstacle, Avoid obstacle, Move, Find location, Place box
Tasks at hand could be solved quickly and efficiently with skills
Each skill can take a different number of time steps
The ability to abstract knowledge temporally over many different time scales is seamlessly integrated into human decision making!
Image Source: Boston Dynamics Robot Handle
Reinforcement Learning
At each time step, the agent:
• Executes action At
• Receives observation Ot
• Receives reward Rt
At each time step, the environment:
• Receives action At
• Emits observation Ot+1
• Emits scalar reward Rt+1
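As a concrete illustration of this interaction loop, here is a minimal sketch in Python; the toy random-walk environment and its reset/step interface are illustrative assumptions, not part of the slides.

```python
import random

# A minimal sketch of the agent-environment loop described above.
# The environment is a toy 5-state random walk (illustrative assumption).

class RandomWalkEnv:
    """States 0..4; the episode ends at either end; reward +1 only at the right end."""
    def reset(self):
        self.state = 2                       # start in the middle
        return self.state                    # initial observation O_t

    def step(self, action):                  # action in {-1, +1}
        self.state += action
        done = self.state in (0, 4)
        reward = 1.0 if self.state == 4 else 0.0
        return self.state, reward, done      # O_{t+1}, R_{t+1}, termination flag

env = RandomWalkEnv()
obs = env.reset()
done = False
while not done:
    action = random.choice([-1, +1])         # agent executes action A_t
    obs, reward, done = env.step(action)     # agent receives O_{t+1} and R_{t+1}
```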
Predictions : Value Functions
Policy π(a | s)
Value Function Vπ(s) = Eπ[Rt+1 + γRt+2 + γ²Rt+3 + … | St = s]
where Rt+1 is the immediate reward, γ the discount factor, and the remaining terms the discounted future value
Learning Values
Temporal Difference Learning
V(St) ← V(St) + α(Rt+1 + γV(St+1) − V(St))
Temporal-difference error: δt = Rt+1 + γV(St+1) − V(St)
Learning rule for parameterized value functions: wt+1 = wt + α δt ∇w Vw(St)
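A minimal sketch of these updates in Python (not from the slides): with one-hot features, the semi-gradient TD(0) rule below reduces to the tabular update above; the function names and the use of NumPy are assumptions of this sketch.

```python
import numpy as np

# Semi-gradient TD(0) with a linear value function V_w(s) = w . x(s).
# With one-hot features x(s), this is exactly the tabular update shown above.

def one_hot(s, n_states):
    x = np.zeros(n_states)
    x[s] = 1.0
    return x

def td0_update(w, s, r, s_next, done, alpha=0.1, gamma=0.99):
    """One TD(0) step: w <- w + alpha * delta_t * grad_w V_w(S_t)."""
    n = len(w)
    v_next = 0.0 if done else w @ one_hot(s_next, n)
    delta = r + gamma * v_next - w @ one_hot(s, n)   # TD error delta_t
    return w + alpha * delta * one_hot(s, n)         # for linear V, grad_w V_w(s) = x(s)
```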
Why Temporal Abstraction
Planning
• Generate shorter plans
• Provides robustness to model errors
• Improves sample complexity
Learning
• Improve exploration by taking shortcuts in the environment
• Facilitates Off-Policy learning
• Improves efficiency/learning speed
• Helps in transfer learning
The Options Framework
The Options Framework
Options (Sutton, Precup, and Singh, 1999) formalize the idea of temporally extended actions, also known as skills.
Sutton, Precup & Singh 1999
Options Framework
• Definition
Let S, A be the set of states and actions. A Markov option ω ∈ Ω is a triple:
(Iω ⊆ S , πω : S × A → [0, 1] , βω : S → [0, 1])
(Initiation set, Intra-option policy, Termination condition)
• Iω : set of states in which the option can be initiated, aka preconditions
• πω(s, a) : probability of taking action a ∈ A in state s ∈ S when following option ω
• βω(s) : probability of terminating option ω upon entering state s
with a policy over options πΩ : S × Ω → [0,1]
• Example
• Robot navigating in a house: when you come across a closed door (Iω), open the door (πω) until the door has been opened (βω)
Sutton, Precup & Singh 1999
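To make the definition concrete, here is a minimal sketch of a Markov option and its call-and-return execution in Python; the class, the discrete state/action encoding, and the env.step interface (returning state, reward, done, as in the toy sketch earlier) are illustrative assumptions.

```python
import random
from dataclasses import dataclass
from typing import Callable, Set

@dataclass
class Option:
    """A Markov option (I_w, pi_w, beta_w) over a discrete state/action space."""
    initiation_set: Set[int]              # I_w: states where the option may be started
    policy: Callable[[int], int]          # pi_w: intra-option policy, state -> action
    termination: Callable[[int], float]   # beta_w: probability of terminating in a state

def run_option(env, state, option, gamma=0.99, max_steps=100):
    """Call-and-return execution: follow pi_w until beta_w says stop (or the episode ends).
    Returns the arrival state, the discounted reward accumulated, and gamma^k for SMDP updates."""
    assert state in option.initiation_set, "option not available in this state"
    total_reward, discount = 0.0, 1.0
    for _ in range(max_steps):
        action = option.policy(state)
        state, reward, done = env.step(action)
        total_reward += discount * reward
        discount *= gamma
        if done or random.random() < option.termination(state):
            break
    return state, total_reward, discount
```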
Planning with Options
[Figure: value iteration with primitive actions vs. hallway options, showing Initial Values, Iteration #1, and Iteration #2]
With hallway options, values propagate much further in each iteration, so planning requires far fewer iterations than with primitive actions alone.
Sutton, Precup & Singh 1999
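To illustrate how planning can treat options like actions, here is a minimal sketch of value iteration over option models; the tabular arrays R[w, s] and P[w, s, s'] (expected discounted reward and discounted termination-state distribution of each option) and the inclusion of primitive actions as one-step options are assumptions of this sketch.

```python
import numpy as np

def value_iteration_with_options(R, P, n_iters=10):
    """R[w, s]: expected discounted reward of running option w from state s.
    P[w, s, s']: discounted probability that option w started in s terminates in s'.
    Primitive actions can be included as one-step options."""
    n_options, n_states, _ = P.shape
    V = np.zeros(n_states)
    for _ in range(n_iters):
        # Bellman optimality backup over options:
        # V(s) = max_w [ R_w(s) + sum_s' P_w(s' | s) V(s') ]
        V = np.max(R + P @ V, axis=0)
    return V
```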
Planning with Options : Discussion
Potential Applications:
• Planning with stocks
• Planning with assets - asset management
• Clinical Domains [Y. Shahar: A framework for knowledge-based temporal abstraction]
Can we learn such temporal abstractions?
• Bacon, Harb, and Precup (2017) proposed the option-critic framework, which provides the ability to learn a set of options
• Optimize directly the discounted return, averaged over all the trajectories starting at a designated state and option:
J = EΩ,θ,ω[ ∑t=0^∞ γ^t rt+1 | s0, ω0 ]
Bacon, Harb & Precup 2017
Actor-Critic Architecture
• Actor: decides how the agent acts
• Critic: provides feedback to improve the actor
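A minimal sketch of one actor-critic step with linear function approximation and a softmax policy; the parameter shapes, step sizes, and the use of the TD error as the critic's feedback signal follow the standard textbook formulation rather than any specific architecture from the slides.

```python
import numpy as np

def softmax(prefs):
    e = np.exp(prefs - prefs.max())
    return e / e.sum()

def actor_critic_step(theta, w, x_s, x_s_next, a, r, done,
                      alpha_w=0.1, alpha_theta=0.01, gamma=0.99):
    """Critic: TD(0) on V_w. Actor: policy-gradient step using the TD error as feedback.
    theta has shape (n_actions, n_features); x_s is the feature vector of the state."""
    v_next = 0.0 if done else w @ x_s_next
    delta = r + gamma * v_next - w @ x_s             # critic's TD error
    w = w + alpha_w * delta * x_s                    # critic update
    probs = softmax(theta @ x_s)                     # actor: softmax policy over actions
    grad_log = -probs[:, None] * x_s[None, :]        # d log pi(a|s) / d theta ...
    grad_log[a] += x_s                               # ... = (1[a=b] - pi(b|s)) x(s)
    theta = theta + alpha_theta * delta * grad_log   # actor update
    return theta, w
```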
Option-Critic Architecture
• Parameterize the intra-option policies πω,θ
• Parameterize the termination conditions βω,ν
Bacon, Harb & Precup 2017
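A minimal sketch of how these parameterizations could look, assuming a tabular setting with a softmax intra-option policy per option and a sigmoid termination per option; the table shapes and names are illustrative, not the paper's implementation.

```python
import numpy as np

n_states, n_actions, n_options = 10, 4, 2

# Intra-option policies pi_{w,theta}: softmax over per-option action preferences.
theta = np.zeros((n_options, n_states, n_actions))
# Termination conditions beta_{w,nu}: sigmoid of per-option state preferences.
nu = np.zeros((n_options, n_states))

def intra_option_policy(option, state):
    prefs = theta[option, state]
    probs = np.exp(prefs - prefs.max())
    return probs / probs.sum()                        # pi_{w,theta}(. | s)

def termination_prob(option, state):
    return 1.0 / (1.0 + np.exp(-nu[option, state]))   # beta_{w,nu}(s)
```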
Option-Critic with Deep RL
Bacon, Harb & Precup 2017
Hierarchical Abstract Machines (HAMs)
• Key Idea:
• Non-deterministic finite state machines
• Transitions invoke lower-level machines
• A Machine:
• Is a partial policy
• Has four kinds of states: Call, Stop, Choice, and Action states
• Example: a robot navigating around obstacles
• Upon encountering an obstacle, the machine enters a Choice state
• It can call a Follow-wall machine or a Back-off machine
• A HAM learns a policy to decide which machine is optimal to call
Parr & Russell, 1998
Feudal Learning
• Reward Hiding:
• Managers provide subtasks g for sub-managers
• Managers reward actions only if the sub-manager achieves g, irrespective of what the overall goal of the task is
• Low-level managers learn how to achieve low-level goals even if these do not directly correspond to the highest-level goal
• Information Hiding:
• Managers only know the state of the system at the granularity of their own choices of tasks
• Information is hidden both ways, upwards and downwards, in terms of the choice of sub-tasks chosen to meet the main goal
Dayan & Hinton 1993
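A minimal sketch of the reward-hiding idea, assuming subgoals can be tested against states; the function and argument names are illustrative.

```python
def submanager_reward(state_reached, assigned_subgoal, reached):
    """Reward hiding: the sub-manager is rewarded only for achieving its assigned
    subgoal g, regardless of whether that serves the overall task goal."""
    return 1.0 if reached(state_reached, assigned_subgoal) else 0.0

# Illustrative usage: subgoals are target grid cells, reached = exact match.
r = submanager_reward((2, 3), (2, 3), reached=lambda s, g: s == g)   # -> 1.0
```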
FeUdal Networks (FuN) for HRL
• Key Insights:
• The Manager chooses a subgoal direction that maximizes reward
• The Worker selects actions that maximize cosine similarity with that direction
• FuN aims to represent sub-goals as directions in latent state space
• Subgoals correspond to meaningful behaviours; subgoals act as actions for the Manager
Vezhnevets et al., 2017
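A minimal sketch of the cosine-similarity signal the Worker maximizes, assuming latent states and goal directions are stored as vectors indexed by time and averaged over a horizon c; the names and the exact averaging are assumptions of this sketch.

```python
import numpy as np

def cosine_similarity(a, b, eps=1e-8):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

def worker_intrinsic_reward(latent_states, goals, t, c=10):
    """Average cosine similarity between recent latent-state changes and the
    Manager's goal directions over the last c steps."""
    if t == 0:
        return 0.0
    sims = [cosine_similarity(latent_states[t] - latent_states[t - i], goals[t - i])
            for i in range(1, min(c, t) + 1)]
    return sum(sims) / len(sims)
```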
Moving towards truly scalable RL
"Stop learning tasks, start learning skills." - Satinder Singh, NeurIPS 2018
Related Literature
• MAXQ
• HIRO
• h-DQN
• Meta Learning with Shared Hierarchies
• To be completed
Demo
Questions
Extra Slides
Option-Critic
Formulation
All options are available in all states.
The option-value function is defined as
QΩ(s, ω) = ∑a πω,θ(a | s) QU(s, ω, a)
where QU : S × Ω × A → ℝ is the value of executing an action in the context of a state-option pair, defined as:
QU(s, ω, a) = r(s, a) + γ ∑s′ P(s′ | s, a) U(ω, s′)
where U : S × Ω → ℝ is the option-value function upon arrival in a state:
U(ω, s′) = (1 − βω,ν(s′)) QΩ(s′, ω) + βω,ν(s′) VΩ(s′)
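A minimal sketch of these definitions for a small discrete MDP, assuming tabular arrays r[s, a], P[s, a, s'], Q_Omega[s, w], beta[w, s], and pi[w, s, a], plus a policy over options pi_over_options[s, w]; the array names and shapes are illustrative.

```python
import numpy as np

gamma = 0.99

def V_Omega(Q_Omega, pi_over_options, s):
    """Value of state s under the policy over options pi_Omega(. | s)."""
    return float(pi_over_options[s] @ Q_Omega[s])

def U(Q_Omega, pi_over_options, beta, w, s_next):
    """Option-value upon arrival: continue with w unless it terminates, then re-choose."""
    cont = (1.0 - beta[w, s_next]) * Q_Omega[s_next, w]
    stop = beta[w, s_next] * V_Omega(Q_Omega, pi_over_options, s_next)
    return cont + stop

def Q_U(r, P, Q_Omega, pi_over_options, beta, s, w, a):
    """Value of executing action a in the context of the state-option pair (s, w)."""
    expected_next = sum(P[s, a, s2] * U(Q_Omega, pi_over_options, beta, w, s2)
                        for s2 in range(P.shape[2]))
    return r[s, a] + gamma * expected_next

def Q_Omega_from_QU(r, P, Q_Omega, pi, pi_over_options, beta, s, w):
    """Q_Omega(s, w) = sum_a pi_{w,theta}(a | s) * Q_U(s, w, a)."""
    return sum(pi[w, s, a] * Q_U(r, P, Q_Omega, pi_over_options, beta, s, w, a)
               for a in range(pi.shape[2]))
```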