Day 2: MDPs & Value Functions

• Understanding the building blocks of Reinforcement Learning
• Your Name
• Date
Agenda
• 1. Markov Decision Processes (MDPs)
• 2. Policies
• 3. Value Functions
• 4. Activity: Compute V(s)
• 5. Project: Set up Hand Simulation
What is an MDP?
• Markov Decision Process = (S, A, P, R, γ)
• S: Set of states
• A: Set of actions
• P(s' | s, a): Transition probabilities
• R(s, a): Reward function
• γ: Discount factor
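To make the tuple concrete, here is a minimal NumPy sketch of an MDP. The two-state, two-action setup and all numbers are invented for illustration only, not taken from the slides:

import numpy as np

# Hypothetical 2-state, 2-action MDP, for illustration only
states = [0, 1]          # S
actions = [0, 1]         # A
gamma = 0.9              # discount factor γ

# P[a, s, s'] = probability of moving from s to s' when taking action a in s
P = np.array([
    [[0.8, 0.2],         # action 0 from state 0
     [0.1, 0.9]],        # action 0 from state 1
    [[0.5, 0.5],         # action 1 from state 0
     [0.3, 0.7]],        # action 1 from state 1
])

# R[s, a] = immediate reward for taking action a in state s
R = np.array([
    [1.0, 0.0],
    [0.0, 2.0],
])

# Sanity check: each row of transition probabilities sums to 1
assert np.allclose(P.sum(axis=2), 1.0)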
Markov Property
• The future is independent of the past given the present
• Formally: P(s_{t+1} | s_t, a_t, ..., s_0, a_0) = P(s_{t+1} | s_t, a_t)
Policy (π)
• Policy: Mapping from states to actions
• Deterministic: π(s) = a
• Stochastic: π(a | s) = probability of choosing action a in state s
• Goal of RL: Find the optimal policy π* that maximizes expected cumulative discounted reward
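As a rough sketch (the numbers and state/action indices are made up), a deterministic policy can be a plain lookup from state to action, while a stochastic policy stores a probability distribution over actions for each state:

import numpy as np

# Deterministic policy: state index -> action index
pi_det = {0: 1, 1: 0}

# Stochastic policy: pi_stoch[s, a] = probability of action a in state s
pi_stoch = np.array([
    [0.7, 0.3],   # in state 0: action 0 with prob 0.7, action 1 with prob 0.3
    [0.2, 0.8],   # in state 1
])
assert np.allclose(pi_stoch.sum(axis=1), 1.0)

# Sampling an action in state s under the stochastic policy
s = 0
a = np.random.choice(len(pi_stoch[s]), p=pi_stoch[s])
print("sampled action:", a)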
Value Functions
• State-Value Function V^π(s):
• Expected return starting from state s and following π:
• V^π(s) = E_π [∑_t γ^t R(s_t, a_t) | s_0 = s]
• Action-Value Function Q^π(s, a):
• Expected return starting from s, taking action a, then following π:
• Q^π(s, a) = E_π [∑_t γ^t R(s_t, a_t) | s_0 = s, a_0 = a]
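One way to make "expected return" concrete: compute the discounted return of a single sampled trajectory of rewards; averaging such returns over many trajectories that start in s estimates V^π(s). The reward sequence below is invented for illustration:

# Discounted return G = r_0 + γ·r_1 + γ²·r_2 + ...
gamma = 0.9
rewards = [1.0, 0.0, 2.0, 1.0]   # hypothetical rewards along one trajectory

G = sum(gamma**t * r for t, r in enumerate(rewards))
print(G)   # 1.0 + 0.9*0.0 + 0.81*2.0 + 0.729*1.0 = 3.349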
Bellman Equations
• Value Function (Recursive Form):
• V^π(s) = ∑_a π(a | s) ∑_{s'} P(s' | s, a) [R(s, a) + γ V^π(s')]
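A single Bellman backup for one state, written directly from the recursive form above. This assumes arrays shaped like the earlier sketches (P[a, s, s'], R[s, a], a stochastic policy pi[s, a], and a current value estimate V):

import numpy as np

def bellman_backup(s, V, pi, P, R, gamma):
    """Return V^π(s) computed from the current estimate of V^π(s')."""
    value = 0.0
    for a in range(P.shape[0]):
        # expected immediate reward plus discounted value of successor states
        value += pi[s, a] * np.sum(P[a, s] * (R[s, a] + gamma * V))
    return value

# Example call, assuming P, R, pi_stoch, gamma from the earlier sketches are in scope:
# V = np.zeros(2)
# print(bellman_backup(0, V, pi_stoch, P, R, gamma))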
Visual: Simple MDP Example
• (Include a simple MDP diagram or describe the structure verbally)
Activity: Calculate V(s)
• Given:
• - 3 States: S1, S2, S3
• - Actions: A1, A2
• - Rewards and transitions shown in diagram/table
• Task: Compute V(s) for each state under a fixed policy
Python Activity Preview
• Implement a simple MDP:
• - Define states/actions
• - Define transition matrix and reward table
• - Evaluate a policy using the Bellman equation (see the sketch below)
• Tools: Python, NumPy
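A hedged sketch of iterative policy evaluation that the activity could build on. The actual 3-state transitions and rewards live in the (not included) diagram, so the arrays here are placeholders to be replaced with the real values:

import numpy as np

def evaluate_policy(pi, P, R, gamma, tol=1e-8):
    """Iterative policy evaluation: repeatedly apply the Bellman equation
    V(s) = sum_a pi(a|s) sum_s' P(s'|s,a) [R(s,a) + gamma V(s')]
    until the values stop changing."""
    n_states = P.shape[1]
    V = np.zeros(n_states)
    while True:
        V_new = np.zeros(n_states)
        for s in range(n_states):
            for a in range(P.shape[0]):
                V_new[s] += pi[s, a] * np.sum(P[a, s] * (R[s, a] + gamma * V))
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new

# Placeholder 3-state, 2-action MDP (replace with the values from the activity)
P = np.full((2, 3, 3), 1.0 / 3.0)        # uniform transitions, rows sum to 1
R = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.5, 0.5]])               # R[s, a]
pi = np.full((3, 2), 0.5)                # fixed uniform policy
print(evaluate_policy(pi, P, R, gamma=0.9))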
Project: Hand Simulation Setup
• Goal: Simulate a hand (e.g., robotic hand, card game)
• Today:
• - Define state space (e.g., hand positions, grip strength)
• - Define actions (e.g., open, close, flex)
• - Define reward (e.g., holding object = +1, drop = -1)
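A starter skeleton for the hand-simulation project. The states, actions, transition rules, and rewards below are entirely made up to show the shape of today's setup, and are meant to be replaced by each team's own design:

import random

# Hypothetical state/action/reward definitions for a simple gripper hand
STATES = ["open_empty", "closed_empty", "closed_holding", "dropped"]
ACTIONS = ["open", "close", "flex"]

def reward(state):
    # holding an object = +1, dropping it = -1, otherwise 0
    if state == "closed_holding":
        return 1.0
    if state == "dropped":
        return -1.0
    return 0.0

def step(state, action):
    """Toy, hand-written transition rules (replace with your team's design)."""
    if state == "open_empty" and action == "close":
        # closing near an object sometimes grabs it
        next_state = "closed_holding" if random.random() < 0.7 else "closed_empty"
    elif state == "closed_holding" and action == "open":
        next_state = "dropped"
    elif action == "open":
        next_state = "open_empty"
    else:
        next_state = state
    return next_state, reward(next_state)

# Roll out a few random actions to check the dynamics
s = "open_empty"
for _ in range(5):
    a = random.choice(ACTIONS)
    s, r = step(s, a)
    print(a, "->", s, "reward:", r)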
Group Work Prompt
• In teams, sketch out a hand-simulation scenario
• Identify:
• - State space
• - Action space
• - Rewards
• - Transition rules
• Present briefly at the end
What’s Next (Day 3 Preview)
• Solving MDPs:
• - Policy Evaluation
• - Policy Iteration
• - Value Iteration
• Implementing solutions in code
Q&A / Wrap-Up
• Questions?
• Recap today’s key points
• Check-in: Do students feel confident in MDPs and value functions?
