CS 5/7320
Artificial Intelligence
Making Simple Decisions
AIMA Chapter 16
Introduction slides by Michael Hahsler
Decision network slides by Dan Klein and
Pieter Abbeel
This work is licensed under a Creative Commons
Attribution-ShareAlike 4.0 International License.
What is a simple decision?
• The environment is typically stochastic with non-deterministic actions. It may also be only partially
observable. Otherwise, making a decision would be trivial.
• We make the same decision frequently, and making it once does not affect future decisions. This means we
have an episodic environment.
• Decision theory formalizes making optimal simple decisions under uncertainty.
Decision theory =
Probability theory (evidence & belief)
+
Utility theory (want)
Decision-theoretic Agents (=Utility-based Agent)
Logical agents
• Cannot deal with uncertainty or conflicting goals.
Goal-based agents
• Can only assign goal/not goal to states and find goal states.
Utility-based agents
• Assign a utility value to each state.
• A rational agent optimizes the expected utility (i.e., is utility-based).
• Utility is related to the external performance measure (see PEAS).
Utility
• A utility function U(s) expresses the desirability
of being in state s.
• Utility functions are derived from preferences:
U(A) > U(B) ⇔ A ≻ B
and
U(A) = U(B) ⇔ A ~ B
• It is often enough to know an ordinal utility
function representing a ranking of states to make
decisions.
Expected Utility of an
Action Under Uncertainty
We need:
• A cardinal utility function U(s), where the number
represents levels of absolute satisfaction.
• The probability P(s) that the current state is s.
• Transition probabilities P(s'|s, a).

The probability that action a will get us to state s' is

P(Result(a) = s') = Σ_s P(s) P(s'|s, a)

The expected utility of action a over all possible outcome states s' is

EU(a) = Σ_{s'} P(Result(a) = s') U(s')
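As a sketch of the two formulas above, the sums can be computed directly. All numbers and state names here are made-up illustration values, not from the slides:

```python
# Expected utility of a single action a under current-state uncertainty.
# Illustrative numbers only.

# Belief over the current state s
P_s = {"s1": 0.6, "s2": 0.4}

# Transition model P(s' | s, a) for one fixed action a
P_next = {
    ("s1", "a"): {"good": 0.8, "bad": 0.2},
    ("s2", "a"): {"good": 0.3, "bad": 0.7},
}

# Utility of each outcome state s'
U = {"good": 10.0, "bad": 0.0}

# P(Result(a) = s') = sum_s P(s) P(s' | s, a)
P_result = {}
for s, p in P_s.items():
    for s_next, p_t in P_next[(s, "a")].items():
        P_result[s_next] = P_result.get(s_next, 0.0) + p * p_t

# EU(a) = sum_{s'} P(Result(a) = s') U(s')
EU = sum(P_result[s_next] * U[s_next] for s_next in P_result)
print(P_result, EU)
```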
Principle of Maximum
Expected Utility (MEU)
Given the expected utility of an action

EU(a) = Σ_{s'} P(Result(a) = s') U(s')

choose the action that maximizes the expected utility:

a* = argmax_a EU(a)

Issues:
• P(Result(a) = s') = Σ_s P(s) P(s'|s, a) may be a very large
table.
• U(s) may be hard to estimate. It may depend on what states
we can get to from s.
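The MEU principle itself is just an argmax over actions. A minimal sketch, with a toy model and made-up numbers (one known current state for simplicity, so P(Result(a) = s') equals the transition probability):

```python
# Transition model P(s' | a) for each action (current state known)
P_next = {
    "stay": {"ok": 1.0},
    "move": {"win": 0.5, "lose": 0.5},
}
# Utility of each outcome state s'
U = {"ok": 5.0, "win": 12.0, "lose": 0.0}

def expected_utility(action):
    # EU(a) = sum_{s'} P(Result(a) = s') U(s')
    return sum(p * U[s_next] for s_next, p in P_next[action].items())

# a* = argmax_a EU(a):  EU(stay) = 5.0,  EU(move) = 0.5*12 + 0.5*0 = 6.0
best = max(P_next, key=expected_utility)
print(best, expected_utility(best))  # → move 6.0
```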
Decision Networks
Using Bayes Nets to calculate the
Expected Utility of Actions.
These slides were created by Dan Klein, Pieter Abbeel, Sergey Levine,
with some materials from A. Farhadi. All CS188 materials are at
http://ai.berkeley.edu
Decision Networks

[Diagram: decision network with action node Umbrella, chance nodes (random events) Weather and Forecast, and utility node U]
Decision Networks

▪ Bayes nets with additional nodes for
utility and actions.
▪ Allow specifying the joint probability in a
compact way using independence.
▪ Calculate the expected utility for each
possible action and choose the best.

Node types
▪ Chance nodes: random variables as in Bayes nets (e.g., Weather, Forecast).
▪ Action nodes: cannot have parents, act
as observed evidence (e.g., Umbrella).
▪ Utility node: depends on action and
chance nodes (e.g., U).
Decision Network without Forecast

[Diagram: action node Umbrella (A) and chance node Weather both feed into utility node U]

Possible actions: Umbrella = leave or Umbrella = take.

W     P(W)
sun   0.7
rain  0.3

A      W     U(A, W)
leave  sun   100
leave  rain  0
take   sun   20
take   rain  70

Optimal decision: a* = leave
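The leave/take decision can be checked with a short sketch (Python, using the probabilities and utilities from the tables on this slide):

```python
# Umbrella decision network without a forecast.
P_W = {"sun": 0.7, "rain": 0.3}          # prior P(W)
U = {("leave", "sun"): 100, ("leave", "rain"): 0,
     ("take",  "sun"): 20,  ("take",  "rain"): 70}

def EU(action, P_weather):
    # Expected utility of an action, averaging over the weather
    return sum(p * U[(action, w)] for w, p in P_weather.items())

# EU(leave) = 0.7*100 + 0.3*0 = 70,  EU(take) = 0.7*20 + 0.3*70 = 35
best = max(["leave", "take"], key=lambda a: EU(a, P_W))
print(best)  # → leave
```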
Decisions as Outcome Trees

{} … no evidence available

[Diagram: outcome tree. The root is the Umbrella action with branches take and leave; each branch leads to a random Weather event, Weather | {}, ending in the leaves U(t, s), U(t, r), U(l, s), U(l, r)]
Decision Network with Bad Forecast

[Diagram: action node Umbrella (A), chance nodes Weather and Forecast, utility node U; observed evidence Forecast = bad]

Possible actions: Umbrella = leave or Umbrella = take.

W     P(W)    P(W | F = bad)
sun   0.7     0.34
rain  0.3     0.66

A      W     U(A, W)
leave  sun   100
leave  rain  0
take   sun   20
take   rain  70

Optimal decision: a* = take

A bad forecast increases the probability of rain!
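The same computation, now conditioning on the observed evidence: once Forecast = bad is seen, the expected utilities are taken with respect to the posterior P(W | F = bad) from the table above:

```python
# Umbrella decision network with observed evidence Forecast = bad.
P_W_given_bad = {"sun": 0.34, "rain": 0.66}   # posterior P(W | F = bad)
U = {("leave", "sun"): 100, ("leave", "rain"): 0,
     ("take",  "sun"): 20,  ("take",  "rain"): 70}

def EU(action, P_weather):
    # Expected utility of an action under the given weather distribution
    return sum(p * U[(action, w)] for w, p in P_weather.items())

# EU(leave | bad) = 0.34*100 = 34,  EU(take | bad) = 0.34*20 + 0.66*70 = 53
best = max(["leave", "take"], key=lambda a: EU(a, P_W_given_bad))
print(best)  # → take
```

The bad forecast shifts probability mass toward rain, which flips the optimal decision from leave to take.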
Decisions as Outcome Trees

{b} … the evidence is a bad weather forecast, which increases the probability of rain.

[Diagram: outcome tree as before, but Forecast = bad is observed; the Weather branches are now distributed as W | {b}, with leaves U(t, s), U(t, r), U(l, s), U(l, r)]
Conclusion

Decision networks are an extension of Bayes nets that add actions and utility to compactly specify the joint probability. The network is used to calculate the expected utility of actions.

Decision networks can be used to make simple repeated decisions in a stochastic, partially observable, and episodic environment.

Sequential decision-making deals with decisions that influence each other and are made over time. This is a more complex decision problem and needs different methods like Markov Decision Processes.