
Utilities and MDP:

A Lesson in Multiagent System


Henry Hexmoor
SIUC
Utility
• Preferences are recorded as a utility function
ui : S → R
where S is the set of observable states in the world,
ui is the utility function, and
R is the real numbers.
• States of the world become ordered.
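As a rough illustration (not from the slides), a utility function can be sketched in Python as a table from states to real numbers; the state names and values below are made up:

# A minimal sketch of a utility function ui : S -> R as a Python dict.
# The states and utility values are hypothetical placeholders.
utility = {"sunny": 8.0, "rainy": 3.5, "stormy": 1.0}

# Because utilities are real numbers, the states become totally ordered:
ordered_states = sorted(utility, key=utility.get, reverse=True)
print(ordered_states)  # ['sunny', 'rainy', 'stormy']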
Properties of Utilities
Reflexive: ui(s) ≥ ui(s)
Transitive: If ui(a) ≥ ui(b) and ui(b) ≥ ui(c)
then ui(a) ≥ ui(c).
Comparable: for all a, b, either ui(a) ≥ ui(b) or
ui(b) ≥ ui(a).
Selfish agents:
• A rational agent is one that wants to maximize
its utilities, but intends no harm.
Diagram of agent types: agents, rational agents, selfish agents, and rational, non‐selfish agents.
Utility is not money:
• While utility represents an agent's preferences,
it is not necessarily equated with money. In
fact, the utility of money has been found to be
roughly logarithmic.
Marginal Utility
• Marginal utility is the utility gained from the next
event.
Example:
getting an A, for an A student,
versus getting an A, for a B student.
Transition function
The transition function is represented as
T(s, a, s′)
The transition function is defined as the probability
of reaching s′ from s with action a.
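One possible way to represent T(s, a, s′) in code is a nested dictionary mapping a state and action to a distribution over successor states. This is a sketch; the states s1, s2 and actions a1, a2 are made-up placeholders:

# Sketch: T[s][a] is a probability distribution over successor states s'.
T = {
    "s1": {"a1": {"s1": 0.2, "s2": 0.8},
           "a2": {"s1": 1.0}},
    "s2": {"a1": {"s1": 0.5, "s2": 0.5},
           "a2": {"s2": 1.0}},
}

def transition_prob(T, s, a, s_next):
    """Probability of reaching s_next from s with action a."""
    return T[s][a].get(s_next, 0.0)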
Expected Utility
• Expected utility is defined as the sum, over s′, of the
product of the probability of reaching s′
from s with action a and the utility of the resulting
state:

E[ui, s, a] = Σ_{s′ ∈ S} T(s, a, s′) ui(s′)

where S is the set of all possible states.
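Using the hypothetical transition table T sketched above, the expected-utility sum can be written directly:

def expected_utility(T, u, s, a):
    """E[u, s, a] = sum over s' of T(s, a, s') * u(s')."""
    return sum(p * u[s2] for s2, p in T[s][a].items())

u = {"s1": 0.0, "s2": 1.0}                 # hypothetical utility table
print(expected_utility(T, u, "s1", "a1"))  # 0.2*0.0 + 0.8*1.0 = 0.8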
Value of Information
• Value of the information that the current state is t and
not s:

ΔE = E[ui, t, πi(t)] − E[ui, t, πi(s)]

where E[ui, t, πi(t)] represents the updated, new information and
E[ui, t, πi(s)] represents the old value.
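One way to read this in code (a sketch; pi is a hypothetical policy table mapping states to actions, and expected_utility is the helper sketched earlier):

def value_of_information(T, u, pi, t, s):
    """Gain from acting on the policy's action for t rather than its action for s."""
    return expected_utility(T, u, t, pi[t]) - expected_utility(T, u, t, pi[s])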
Markov Decision Processes: MDP
• (Figure) Graphical representation of a sample Markov decision process along with values for
the transition and reward functions. We let the start state be s1.
Reward Function: r(s)
• The reward function is represented as
r:S→R
Deterministic vs. Non‐Deterministic
• Deterministic world: predictable effects
Example: for each action, only one successor state has T = 1; all others have probability 0.

• Nondeterministic world: effects are not fully predictable, so values change.
Policy: π
• A policy is the behavior of an agent that maps states
to actions.
• A policy is represented by π.
Optimal Policy
• An optimal policy is a policy that maximizes
expected utility.
• The optimal policy is represented as π*:

πi*(s) = argmax_{a ∈ A} E[ui, s, a]
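A one-step greedy reading of this argmax in code, reusing the hypothetical model and the expected_utility helper from above:

def greedy_action(T, u, s, actions):
    """pi*(s) = argmax over a in A of E[u, s, a]."""
    return max(actions, key=lambda a: expected_utility(T, u, s, a))

print(greedy_action(T, u, "s1", ["a1", "a2"]))  # 'a1' (0.8 vs. 0.0)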
Discounted Rewards: γ (0 ≤ γ ≤ 1)
• Discounted rewards smoothly reduce the
impact of rewards that are farther off in the
future:

γ⁰ r(s1) + γ¹ r(s2) + γ² r(s3) + …

where γ (0 ≤ γ ≤ 1) represents the discount factor.

π*(s) = argmax_a Σ_{s′} T(s, a, s′) u(s′)
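A worked sketch of the discounted sum, with a made-up reward sequence:

def discounted_return(rewards, gamma):
    """gamma^0*r(s1) + gamma^1*r(s2) + gamma^2*r(s3) + ..."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))

print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))  # 1 + 0.5 + 0.25 = 1.75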
Bellman Equation

u(s) = r(s) + γ max_a Σ_{s′} T(s, a, s′) u(s′)

where
r(s) represents the immediate reward, and
γ max_a Σ_{s′} T(s, a, s′) u(s′) represents the
future, discounted rewards.
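A single Bellman backup for one state, in the same sketch style (r is a hypothetical reward table; expected_utility is the helper sketched earlier):

def bellman_backup(T, r, u, gamma, s, actions):
    """u(s) = r(s) + gamma * max_a sum_{s'} T(s, a, s') u(s')."""
    return r[s] + gamma * max(expected_utility(T, u, s, a) for a in actions)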
Brute Force Solution
• Write n Bellman equations, one for each of the n
states, and solve …
• This is a system of non‐linear equations, due to the max over a.
Value Iteration Solution
• Set the values of u(s) to random numbers.
• Use the Bellman update equation:

u_{t+1}(s) = r(s) + γ max_a Σ_{s′} T(s, a, s′) u_t(s′)

• Converge and stop using this equation when

δu < ε(1 − γ)/γ

where δu is the maximum utility change.
Value Iteration Algorithm
VALUE-ITERATION(T, r, γ, ε)
  do
    u ← u′
    δ ← 0
    for each s ∈ S
      do u′(s) ← r(s) + γ max_a Σ_{s′} T(s, a, s′) u(s′)
         if |u′(s) − u(s)| > δ
           then δ ← |u′(s) − u(s)|
  until δ < ε(1 − γ)/γ
  return u

Example: γ = 0.5 and ε = 0.15. The algorithm stops after t = 4.
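A minimal Python rendering of the pseudocode above (a sketch: it assumes the nested-dict transition table sketched earlier, and initializes values to zero rather than random numbers):

def value_iteration(T, r, gamma, epsilon):
    """Iterate Bellman updates until the largest change is below epsilon*(1-gamma)/gamma."""
    u = {s: 0.0 for s in T}                    # initial utilities (zeros, not random)
    while True:
        u_new, delta = {}, 0.0
        for s in T:
            u_new[s] = r[s] + gamma * max(
                sum(p * u[s2] for s2, p in dist.items())
                for dist in T[s].values()
            )
            delta = max(delta, abs(u_new[s] - u[s]))
        u = u_new
        if delta < epsilon * (1 - gamma) / gamma:
            return u

# Hypothetical example reusing the transition table T sketched earlier:
r = {"s1": 0.0, "s2": 1.0}
print(value_iteration(T, r, gamma=0.5, epsilon=0.15))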
MDP for one agent
• Multiagent: one agent changes, the others are
stationary.

• Better approach: T(s, a, s′)

where a is a vector of size n giving each agent's
action, and n represents the number of agents.
• Rewards:
– Dole out equally among agents
– Reward proportional to contribution
Observation model
• Noise + the agent cannot observe the world directly …

• Belief state b = ⟨P1, P2, P3, ..., Pn⟩

• Observation model O(s, o) = probability of
observing o while being in state s.

∀s′: b′(s′) = α O(s′, o) Σ_s T(s, a, s′) b(s)

where α is a normalization constant.
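A sketch of the belief-update equation above; the table names are assumptions: O[s][o] is a hypothetical observation-probability table following O(s, o), and b is a dict mapping states to probabilities:

def belief_update(T, O, b, a, o):
    """b'(s') = alpha * O(s', o) * sum_s T(s, a, s') * b(s), with alpha normalizing."""
    unnormalized = {
        s2: O[s2][o] * sum(T[s][a].get(s2, 0.0) * b[s] for s in b)
        for s2 in b
    }
    alpha = 1.0 / sum(unnormalized.values())   # assumes the observation is possible
    return {s2: alpha * p for s2, p in unnormalized.items()}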
Partially observable MDP

T(b, a, b′) = Σ_{s′} O(s′, o) Σ_s T(s, a, s′) b(s)   if (*) holds
            = 0                                       otherwise

(*): ∀s′: b′(s′) = α O(s′, o) Σ_s T(s, a, s′) b(s)  is true for b, a, b′

New reward function:

ρ(b) = Σ_s b(s) r(s)

• Solving a POMDP is hard.
