Assignment 10

Reinforcement Learning
Prof. B. Ravindran
1. Consider the update equation for SMDP Q-learning:
Q(s, a) = Q(s, a) + α [A + B max_{a′} Q(s′, a′) − Q(s, a)]

Which of the following are the correct values of A and B?

(r_k is the reward received at time step k, and γ is the discount factor)

(a) A = r_t; B = γ
(b) A = r_t + γ r_{t+1} + ... + γ^{τ−1} r_{t+τ}; B = γ^τ
(c) A = γ^t r_t + γ^{t+1} r_{t+1} + ... + γ^{t+τ−1} r_{t+τ}; B = γ^{t+τ}
(d) A = γ^{τ−1} r_{t+τ}; B = γ^τ

Sol. (b)
A is the total discounted reward accumulated between time step t and time step t + τ; discounting starts from step t + 1 in the recursive formulation. B is the discount factor after τ time steps, i.e. γ^τ.
Refer to the lecture on SMDP Q-learning.
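
As a concrete sketch of this update in Python (the function name, the tabular layout of Q, and the convention that rewards holds the τ rewards observed while the extended action executed are all assumptions made for this example):

    import numpy as np

    def smdp_q_update(Q, s, a, rewards, s_next, alpha, gamma):
        # rewards = [r_1, ..., r_tau]: rewards collected while the extended action ran
        tau = len(rewards)
        A = sum(gamma**k * r for k, r in enumerate(rewards))  # accumulated discounted reward
        B = gamma**tau                                        # discount for the state reached after tau steps
        Q[s, a] += alpha * (A + B * np.max(Q[s_next]) - Q[s, a])

Here Q is a |S| x |A| NumPy array, and s, a, s_next are integer indices.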

2. Consider an SMDP in which the next state and the reward depend only on the previous state and action, i.e. P(s′, τ | s, a) = P(s′ | s, a) P(τ | s, a) and R(s, a, τ, s′) = R(s, a, s′).
If we solve the above SMDP with conventional Q-learning, we will end up with the same policy as solving it with SMDP Q-learning.

(a) yes, because now τ won't change anything and we end up with the same state and action sequences
(b) no, because τ still depends on the state-action pair and discounting may have an effect on the final policies.
(c) no, because the next state will still depend on τ.
(d) yes, because the Bellman equation is the same for both methods in this case.
Sol. (b)
The Bellman equation for an SMDP has a γ^τ factor, which affects the returns and thus the policy.
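
To see where the two methods diverge, here is a small sketch (the dict-of-dicts Q table and the function names are assumptions for this example) contrasting the backup targets for the same observed transition; since γ^τ ≠ γ whenever τ > 1, the learned values, and hence the greedy policies, can differ:

    def conventional_target(r, s_next, Q, gamma):
        # one-step Q-learning ignores tau and always discounts by gamma
        return r + gamma * max(Q[s_next].values())

    def smdp_target(r, s_next, Q, gamma, tau):
        # SMDP Q-learning discounts the value of the next state by gamma**tau
        return r + gamma**tau * max(Q[s_next].values())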

3. In a HAM, what are the immediate rewards received between two choice states?
(a) Accumulation of immediate rewards of the core MDP obtained between these choice
points.
(b) The return of the next choice state.
(c) The reward of only the next primitive action taken.
(d) Immediate reward is always zero

Sol. (a)
The transition between two choice states may involve passing through multiple primitive states. The reward is the accumulation of all the rewards obtained from the primitive actions taken along the way.
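
As an illustrative sketch (the env.step, machine.at_choice_point and machine.next_primitive_action interfaces are hypothetical), the reward credited to a choice is accumulated over the primitive steps taken until the next choice point:

    def reward_between_choice_points(env, machine, state, gamma):
        # Accumulate the discounted core-MDP rewards obtained from primitive
        # actions until the HAM reaches its next choice point.
        total, discount = 0.0, 1.0
        while not machine.at_choice_point():
            action = machine.next_primitive_action(state)
            state, r = env.step(action)
            total += discount * r
            discount *= gamma
        return total, state
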
4. Which of the following is true about Markov and Semi-Markov options?

(a) In a Markov option, the option's policy depends only on the current state.
(b) In a Semi-Markov option, the option's policy can depend only on the current state.
(c) In a Semi-Markov option, the option's policy may depend on the history since the execution of the option began.
(d) A Semi-Markov option is always a Markov option but not vice versa.

Sol. (a), (b), (c)
In Semi-Markov options, the policy π may depend on all the states observed since the option started.
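
The distinction can be summarised by what each option policy is allowed to look at; a minimal sketch (the states, actions and the 3-step rule are made up for illustration):

    # Markov option: the policy is a function of the current state only.
    def pi_markov(state):
        return "left" if state < 5 else "right"

    # Semi-Markov option: the policy may be a function of the whole history
    # of (state, action, reward) tuples observed since the option began,
    # e.g. it can behave differently once a few steps have elapsed.
    def pi_semi_markov(history):
        steps_elapsed = len(history)
        last_state = history[-1][0]
        return "right" if steps_elapsed > 3 else pi_markov(last_state)
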
5. Consider the two statements below about the SMDP corresponding to a HAM:

Statement 1: The state of the SMDP is defined by the state of the base MDP, the call stack, and the state of the machine currently executing.

Statement 2: The actions of the SMDP can only be defined by the action states.

Which of the following are true?

(a) Statement 1 is true and Statement 2 is true.
(b) Statement 1 is true and Statement 2 is false.
(c) Statement 1 is false and Statement 2 is true.
(d) Statement 1 is false and Statement 2 is false.

Sol. (b)
The actions of the SMDP are the choices available at each choice state, i.e. the machine states that can be transitioned to from a choice state, not the action states.
6. Which of the following are possible advantages of formulating a given problem as a hierarchy
of sub-problems?
(a) A reduced state space.
(b) More meaningful state-abstraction.
(c) Temporal abstraction of behaviour.
(d) Re-usability of learnt sub-problems.
Sol. (a), (b), (c), (d)
(a) and (b) are true. By solving sub-problems independently of the overall problem, we can reduce the size of the state space and use more meaningful state representations that encapsulate only the information required to solve each sub-problem.
(c) is true. We abstract away the notion of time, as we deal with multiple sub-problems, each of which could take a varying amount of time.
(d) is true. We can reuse the policies learnt for repeated sub-problems in the hierarchy.

7. In an SMDP, consider the case where τ is fixed for all state-action pairs. Will we always get the same policy from conventional Q-learning and SMDP Q-learning? Provide the answer for the three cases τ = 3, τ = 2, τ = 1.
(a) yes, yes, no
(b) no, no, no
(c) yes, yes, yes
(d) no, no, yes
Sol. (d)
For τ ≠ 1, the discounting changes (γ^τ instead of γ) and hence the policy can change; for τ = 1, SMDP Q-learning reduces to conventional Q-learning.
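
A small numeric illustration (the numbers are made up) of why a fixed τ ≠ 1 can still change the policy: conventional Q-learning discounts by γ between decisions, while SMDP Q-learning discounts by γ^τ, and the two can rank the same pair of choices differently.

    gamma, tau = 0.9, 3
    # X earns reward 1.0 during its execution and nothing after;
    # Y earns nothing now but leads to a reward of 1.2 at the next decision point.
    q_conventional = {"X": 1.0, "Y": gamma * 1.2}       # {'X': 1.0, 'Y': 1.08}  -> greedy picks Y
    q_smdp         = {"X": 1.0, "Y": gamma**tau * 1.2}  # {'X': 1.0, 'Y': ~0.87} -> greedy picks X
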
8. State True or False:
In the classical options framework, each option has a non-zero probability of terminating in
any state of the environment.
(a) True
(b) False
Sol. (b)
In the classical options framework, each option assigns a termination probability to every state of the environment. However, this probability can be zero for any number of these states.
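
As a sketch, an option in the classical framework is a triple (I, π, β), where the termination function β assigns a probability to every state; nothing prevents that probability from being zero in many of them (the hallway states below are hypothetical):

    # "Walk to the end of the hallway" option: beta is zero in the states
    # where the option should keep running and one in the state where it stops.
    beta = {
        "hallway_start":  0.0,
        "hallway_middle": 0.0,
        "hallway_end":    1.0,
    }
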
9. Suppose that we model a robot in a room as an SMDP, such that the position of the robot in the room is the state of the SMDP. Which of the following scenarios satisfy the assumption that the next state and transition time are independent of each other given the current state and action, i.e. P(s′, τ | s, a) = P(s′ | s, a) P(τ | s, a)? (Assume that the primitive actions <left, right, up, down> take a single time step to execute.)
(a) The room has a single door. The actions available are: {exit the room, move left, move right, move up, move down}.
(b) The room has two doors. The actions available are: {exit the room, move left, move right, move up, move down}.
(c) The room has two doors. The actions available are: {move left, move right, move up, move down}.
(d) None of the above.
Sol. (a), (c)
The assumption holds for (a). There is only one way to exit the room, so the transition time taken to exit the room is independent of the next state, given the current state and action.
The assumption does not hold for (b). The transition time taken to exit the room depends on which door the robot uses (that is, it depends on the next state).
The assumption holds for (c). Only primitive actions are available, and each primitive action has a transition time of a single time step.
10. Which of the following is a correct Bellman equation for an SMDP?
Note: R(s, a, s′) means that the reward is a function of only s, a and s′.
(a) V*(s) = max_{a∈A(s)} [R(s, a, τ, s′) + γ^τ P(s′ | s, a) V*(s′)]

(b) V*(s) = max_{a∈A(s)} [Σ_{s′,τ} P(s′ | s, a, τ) (R(s, a, τ, s′) + γ V*(s′))]
(c) V*(s) = max_{a∈A(s)} [Σ_{s′,τ} P(s′, τ | s, a) (R(s, a, τ, s′) + γ^τ V*(s′))]
(d) V*(s) = max_{a∈A(s)} [Σ_{s′,τ} P(s′, τ | s, a) (R(s, a, s′) + γ V*(s′))]
Sol. (c)
The reward depends on s, a, s′ and τ. We reach the next state after τ time steps, so we discount by γ^τ. Refer to the paper "Recent Advances in Hierarchical Reinforcement Learning" for more information.
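
For completeness, a minimal tabular sketch of backup (c) in Python; the data structures are assumptions made for this example: P[(s, a)] maps (s′, τ) pairs to probabilities, R is keyed by (s, a, τ, s′), and actions[s] lists the admissible actions in s.

    def smdp_bellman_backup(V, P, R, gamma, states, actions):
        # One sweep of V(s) = max_a sum_{s', tau} P(s', tau | s, a) * (R(s, a, tau, s') + gamma**tau * V(s'))
        V_new = {}
        for s in states:
            V_new[s] = max(
                sum(prob * (R[(s, a, tau, s2)] + gamma**tau * V[s2])
                    for (s2, tau), prob in P[(s, a)].items())
                for a in actions[s]
            )
        return V_new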
