Introduction To Machine Learning - Unit 15 - Week 12

The document outlines an assignment for the NPTEL course 'Introduction to Machine Learning,' detailing the submission status and scores for various questions related to probability, hypothesis functions, and reinforcement learning. It includes specific questions and accepted answers, along with feedback on correctness. The assignment was submitted on April 16, 2025, and the document also provides information about the course structure and learning objectives.



Week 12 : Assignment 12

The due date for submitting this assignment has passed.
Due on 2025-04-16, 23:59 IST.
Assignment submitted on 2025-04-16, 11:37 IST
1) Let P(A_i) = 2^(−i). Calculate the upper bound for P(⋃_{i=1}^{4} A_i) using the union bound (rounded to 3 decimal places). 2 points

0.875
0.937
0.984
1

Yes, the answer is correct.
Score: 2
Accepted Answers:
0.937
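As a quick check, the union bound just sums the individual probabilities: P(⋃ A_i) ≤ Σ_{i=1}^{4} 2^(−i) = 15/16 = 0.9375, which matches the accepted option.

```python
# Union bound: P(A_1 ∪ ... ∪ A_4) <= P(A_1) + ... + P(A_4)
probs = [2 ** -i for i in range(1, 5)]  # P(A_i) = 2^-i, for i = 1..4
print(sum(probs))  # 0.9375 -> the accepted option 0.937
```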

2) Given 50 hypothesis functions, each trained with 10^5 samples, what is the lower bound on the probability that there does not exist a hypothesis function with error greater than 0.1? 2 points

1 − 100e^(−2·10^3)
1 − 100e^(−10^3)
1 − 50e^(−2·10^3)
1 − 50e^(−10^3)

No, the answer is incorrect.
Score: 0
Accepted Answers:
1 − 100e^(−2·10^3)
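The accepted option has the shape of the standard Hoeffding-plus-union-bound guarantee: P(some hypothesis has error > ε) ≤ 2M·e^(−2ε²n), so the probability that no such hypothesis exists is at least 1 − 2M·e^(−2ε²n). With M = 50, n = 10^5, and ε = 0.1, the coefficient is 100 and the exponent is −2·10^3. A quick check of those two numbers:

```python
M, n, eps = 50, 10**5, 0.1
# Hoeffding + union bound: P(some h has error > eps) <= 2*M*exp(-2*eps^2*n),
# so the lower bound asked for is 1 - 2*M*exp(-2*eps^2*n).
coeff = 2 * M
exponent = -2 * eps**2 * n
print(coeff, round(exponent))  # 100 -2000  ->  1 - 100*e^(-2*10^3)
```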

3) The VC dimension of a pair of squares is: 1 point

3
4
5
6

No, the answer is incorrect.
Score: 0
Accepted Answers:
5
4) In games like Chess or Ludo, the transition function is known to us. But what about Counter-Strike, Mortal Kombat, or Super Mario? In games where we do not know T, we can only query the game simulator with the current state and action, and it returns the next state. 1 point
This means we cannot directly argmax or argmin over V(T(S, a)). Therefore, learning the value function V is not sufficient to construct a policy.
Which of these could we do to overcome this? (more than one may apply)
Assume there exists a method to do each option. You have to judge whether doing it solves the stated problem.

Directly learn the policy.
Learn a different function which stores value for state-action pairs (instead of only state like V does).
Learn T along with V.
Run a random agent repeatedly till it wins. Use this as the winning policy.

Yes, the answer is correct.
Score: 1
Accepted Answers:
Directly learn the policy.
Learn a different function which stores value for state-action pairs (instead of only state like V does).
Learn T along with V.
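The second accepted option is the idea behind action-value (Q) functions: once Q(s, a) is learnt, a greedy policy needs no transition model at all. A minimal sketch; the function name, states, and values here are hypothetical:

```python
# Greedy policy from a state-action value table: no transition model T needed.
# Q maps (state, action) -> estimated value; the entries below are made up.
def greedy_action(Q, state, actions):
    return max(actions, key=lambda a: Q[(state, a)])

Q = {("Start", "left"): -0.5, ("Start", "right"): 0.9}
print(greedy_action(Q, "Start", ["left", "right"]))  # right
```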

For the rest of the questions, we will follow a simplistic game and see how a Reinforcement Learning agent can learn to behave optimally.

This is our game:

[Figure: the game board, a row of cells with LE at the left end, RE at the right end, and the states X1 to X4 and Start in between.]
At the start of the game, the agent is on the Start state and can choose to move left or right at each turn.

If it reaches the right end RE, it wins and if it reaches the left end LE, it loses.

Because we love maths so much, instead of saying the agent wins or loses,

we will say that the agent gets a reward of +1 at RE and a reward of -1 at LE.

Then the objective of the agent is simply to maximize the reward it obtains!

5) For each state, we define a variable that will store its value. The value of the state will help the agent determine how to behave later. First we will learn this value. 1 point

Let V be the mapping from state to its value.


Initially,
V(LE) = -1
V(X1) = V(X2) = V(X3) = V(X4) = V(Start) = 0
V(RE) = +1

For each state S ∈ {X1, X2, X3, X4, Start}, with S_L being the state to its immediate left and S_R being the state to its immediate right, repeat:
V(S) = 0.9 × max(V(S_L), V(S_R))

Till V converges (does not change for any state).

What is V(X4) after one application of the given formula?

1
0.9
0.81
0

Yes, the answer is correct.


Score: 1
Accepted Answers:
0.9
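To make the update concrete, here is a minimal Python sketch of the procedure. The chain layout (LE, X1, X2, Start, X3, X4, RE) is an assumption, since the board figure is not reproduced here; with it, the first synchronous sweep already gives V(X4) = 0.9 and V(X1) = 0, matching Questions 5 and 6.

```python
# Value iteration for the chain game, using the quiz's update rule:
#   V(S) = 0.9 * max(V(S_L), V(S_R))   for every non-terminal state S.
# The board layout is an assumption; the original figure is not reproduced.
chain = ["LE", "X1", "X2", "Start", "X3", "X4", "RE"]
V = dict.fromkeys(chain, 0.0)
V["LE"], V["RE"] = -1.0, 1.0  # terminal values stay fixed

while True:
    # Synchronous sweep: every state reads the previous sweep's values.
    new_V = dict(V)
    for i in range(1, len(chain) - 1):  # skip the terminal ends
        new_V[chain[i]] = 0.9 * max(V[chain[i - 1]], V[chain[i + 1]])
    if new_V == V:  # converged: no state changed
        break
    V = new_V

print(V["X4"])  # 0.9 for this layout (X4 sits next to RE)
```

Note that the converged numbers depend on the assumed layout, so treat them as indicative rather than as the quiz's exact values.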

6) What is V(X1) after one application of the given formula? 1 point

-1
-0.9
-0.81
0

Yes, the answer is correct.


Score: 1
Accepted Answers:
0

7) What is V(X1) after V converges? 1 point

0.54
-0.9
0.63
0

No, the answer is incorrect.


Score: 0
Accepted Answers:
0.54

8) The behavior of an agent is called a policy. Formally, a policy is a mapping from states to actions. In our case, we have two actions: left and right. We will denote the action for our policy as A. 1 point
Clearly, the optimal policy would be to choose action right in every state. Which of the following can we use to mathematically describe our optimal policy using the learnt V?

For options (c) and (d), T is the transition function, defined as T(state, action) = next_state. (more than one option may apply)

(a) A = Left if V(S_L) > V(S_R), otherwise Right
(b) A = Left if V(S_R) > V(S_L), otherwise Right
(c) A = argmax_a V(T(S, a))
(d) A = argmin_a V(T(S, a))

No, the answer is incorrect.
Score: 0
Accepted Answers:
(a) A = Left if V(S_L) > V(S_R), otherwise Right
(c) A = argmax_a V(T(S, a))
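Option (c) can be turned into code directly. Below is a hedged sketch reusing the assumed chain from the earlier value-iteration example; the V entries are illustrative monotone values (increasing toward RE), not the quiz's exact converged numbers:

```python
# Option (c): greedy policy through the transition function T,
#   A(S) = argmax over a of V(T(S, a)).
chain = ["LE", "X1", "X2", "Start", "X3", "X4", "RE"]  # assumed layout
V = {"LE": -1.0, "X1": 0.2, "X2": 0.4, "Start": 0.6,
     "X3": 0.7, "X4": 0.9, "RE": 1.0}  # illustrative values only

def T(state, action):
    """Deterministic transition: move to the left or right neighbour."""
    i = chain.index(state)
    return chain[i - 1] if action == "left" else chain[i + 1]

def policy(state):
    return max(("left", "right"), key=lambda a: V[T(state, a)])

print({s: policy(s) for s in chain[1:-1]})  # 'right' everywhere, as expected
```

Because V increases toward RE, the argmax picks "right" in every non-terminal state, which is exactly the optimal policy described in the question.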
