H.T. No:                                                  Course Code: 201AM7E04
ADITYA ENGINEERING COLLEGE (A)
REINFORCEMENT LEARNING
(Artificial Intelligence and Machine Learning)
Time: 3 hours Max. Marks: 70
Answer ONE question from each unit
All Questions Carry Equal Marks
All parts of a question must be answered in one place only
UNIT – I
1 a Define Reinforcement Learning. Explain it with various examples. L2 CO1 [7M]
b Explain the k-armed bandit problem with an example. L2 CO1 [7M]
OR
2 a Explain optimistic initial values and the gradient bandit algorithm L2 CO1 [7M]
with an example.
b Explain incremental implementation. Explain tracking a nonstationary L2 CO1 [7M]
problem.
UNIT – II
3 a Discuss the Agent–Environment Interface with examples. L2 CO2 [7M]
b Discuss Goals and Rewards with examples. L2 CO2 [7M]
OR
4 a Define Dynamic Programming. Explain Policy Evaluation. L2 CO2 [7M]
b Explain Value Iteration and Asynchronous Dynamic Programming. L2 CO2 [7M]
UNIT – III
5 a Define Monte Carlo Prediction. Explain Monte Carlo Estimation of L2 CO3 [7M]
Action Values with examples.
b Explain Monte Carlo Control and Monte Carlo Control without L2 CO3 [7M]
Exploring Starts with examples.
OR
6 a Explain the unifying algorithm n-step Q(σ) with an example. L4 CO3 [7M]
b Explain Discounting-aware Importance Sampling with L2 CO3 [7M]
examples.
UNIT – IV
7 a Explain Off-policy Divergence with examples. L2 CO4 [7M]
b Define Semi-gradient Methods and the Deadly Triad with examples. L2 CO4 [7M]
OR
8 a Explain why the Bellman Error is not learnable. L2 CO4 [7M]
b Explain Dutch Traces in i) Monte Carlo Learning ii) Variable λ and γ, L2 CO4 [7M]
with examples.
UNIT – V
9 a Explain Policy Approximation and its advantages. L2 CO5 [7M]
b Explain the Policy Gradient Theorem. L3 CO5 [7M]
OR
10 a Explain REINFORCE: Monte Carlo Policy Gradient with an example. L2 CO5 [7M]
b Discuss Watson's Daily-Double Wagering and Optimizing L2 CO5 [7M]
Memory Control with examples.
*****