Reinforcement Learning Model Paper

Uploaded by

nitla kolukula

H.T. No:          Course Code: 201AM7E04

ADITYA ENGINEERING COLLEGE (A)


REINFORCEMENT LEARNING
(Artificial Intelligence and Machine Learning)
Time: 3 hours Max. Marks: 70
Answer ONE question from each unit
All Questions Carry Equal Marks
All parts of the questions must be answered at one place only

UNIT – I
1 a Define Reinforcement Learning. Explain with various examples. L2 CO1 [7M]
b Explain the k-armed bandit problem with an example. L2 CO1 [7M]
OR
2 a Explain optimistic initial values and the gradient bandit algorithm with an example. L2 CO1 [7M]
b Explain incremental implementation. Explain tracking a nonstationary problem. L2 CO1 [7M]

UNIT – II
3 a Discuss the Agent-Environment Interface with examples. L2 CO2 [7M]
b Discuss various Goals and Rewards with examples. L2 CO2 [7M]
OR
4 a Define Dynamic Programming. Explain Policy Evaluation. L2 CO2 [7M]
b Explain Value Iteration and Asynchronous Dynamic Programming. L2 CO2 [7M]

UNIT – III
5 a Define Monte Carlo Prediction. Explain Monte Carlo Estimation of Action Values with examples. L2 CO3 [7M]
b Explain Monte Carlo Control and Monte Carlo Control without Exploring Starts with examples. L2 CO3 [7M]
OR
6 a Explain A Unifying Algorithm: n-step Q(σ) with an example. L4 CO3 [7M]
b Explain Discounting-aware Importance Sampling with examples. L2 CO3 [7M]

UNIT – IV
7 a Explain Off-policy Divergence with examples. L2 CO4 [7M]
b Define Semi-gradient Methods and the Deadly Triad with examples. L2 CO4 [7M]
OR
8 a Explain why the Bellman Error is not learnable. L2 CO4 [7M]
b Explain Dutch Traces in i) Monte Carlo Learning ii) Variables with examples. L2 CO4 [7M]

UNIT – V
9 a Explain Policy Approximation and its advantages. L2 CO5 [7M]
b Explain the Policy Gradient Theorem. L3 CO5 [7M]
OR
10 a Explain REINFORCE: Monte Carlo Policy Gradient with an example. L2 CO5 [7M]
b Discuss Watson's Daily-Double Wagering and Optimizing Memory Control with examples. L2 CO5 [7M]
*****
