Stochastic Models for Engineers

This document discusses nonlinear optimization and Markov decision processes (MDPs). It provides two examples of using maximum likelihood estimation to estimate parameters for queueing models based on observed data. It then introduces MDPs and Bellman's equation for solving sequential decision problems to maximize reward over time. Two solution methods are described: linear programming and value iteration. An example perpetual option problem is presented to illustrate an MDP.


MS&E 221: Stochastic Modeling

Session 7: Nonlinear Optimization, Markov Decision Processes

Lin Fan

February 20, 2019

Nonlinear Optimization: Finding Maximum Likelihood Estimates

Example 2 (from Estimation Slide Deck): Queueing model

$X_{n+1} = [X_n + Z_{n+1} - 2]^+$

where $(Z_n : n \ge 1)$ are i.i.d. Geometric($p^*$)

From the data, estimate $p^*$

Nonlinear Optimization: Finding Maximum Likelihood Estimates

Example 2 (from Estimation Slide Deck):

Observed data: $Z_1 = 0$, $Z_2 = 1$, $Z_3 = 3$, $Z_4 = 1$, $Z_5 = 2$

$L(p) = (1-p)^0 p \cdot (1-p)^1 p \cdot (1-p)^3 p \cdot (1-p)^1 p \cdot (1-p)^2 p = (1-p)^7 p^5$

The maximizing value of $p$ is $\hat{p} = 5/12$

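Find $\hat{p}$ using Matlab:

% Negative likelihood, because fminsearch minimizes
fun=@(p)(-(1-p)^7*p^5);
p0=0.1;                     % initial guess
p_hat=fminsearch(fun,p0)    % no semicolon, so the estimate is displayed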
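As a cross-check on the Matlab call, here is a minimal Python sketch (Python rather than the slides' Matlab, standard library only). It assumes the parametrization $P(Z = k) = (1-p)^k p$ on $\{0, 1, \ldots\}$ implied by $L(p)$, and it maximizes the log-likelihood with a golden-section search instead of `fminsearch`. The helper names `log_likelihood` and `maximize_unimodal` are illustrative and do not come from the slides.

```python
import math

# Observed data Z_1..Z_5 from the slide.
z = [0, 1, 3, 1, 2]

def log_likelihood(p, data):
    """log L(p) for i.i.d. samples with P(Z = k) = (1 - p)^k * p (assumed parametrization)."""
    return sum(k * math.log(1.0 - p) + math.log(p) for k in data)

def maximize_unimodal(f, lo, hi, tol=1e-9):
    """Golden-section search for the maximizer of a unimodal function on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) > f(d):
            b = d  # the maximizer lies in [a, d]
        else:
            a = c  # the maximizer lies in [c, b]
    return (a + b) / 2.0

p_hat = maximize_unimodal(lambda p: log_likelihood(p, z), 1e-6, 1.0 - 1e-6)

# Setting the derivative of log L(p) to zero gives p = n / (n + sum of z_i).
closed_form = len(z) / (len(z) + sum(z))
print(p_hat, closed_form)
```

Working with $\log L(p)$ instead of $L(p)$ leaves the maximizer unchanged and gives a concave objective, so a bracketing search of this kind is enough.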

Nonlinear Optimization: Finding Maximum Likelihood Estimates

Example 2 Variant (from Estimation Slide Deck): Queueing model

$X_{n+1} = [X_n + Z_{n+1} - 1]^+$

where $(Z_n : n \ge 1)$ are i.i.d. Poisson($\lambda^*$)

From the data, estimate $\lambda^*$

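Nonlinear Optimization: Finding Maximum Likelihood Estimates

Example 2 Variant (from Estimation Slide Deck): $X_{n+1} = [X_n + Z_{n+1} - 1]^+$

Observed data: $X_1 = 0$, $X_2 = 0$, $X_3 = 3$, $X_4 = 2$, $X_5 = 1$, $X_6 = 0$, $X_7 = 0$

$L(\lambda) = \left( e^{-\lambda} + e^{-\lambda} \frac{\lambda}{1!} \right) \cdot e^{-\lambda} \frac{\lambda^4}{4!} \cdot \left( e^{-\lambda} \frac{\lambda^0}{0!} \right)^3 \cdot \left( e^{-\lambda} + e^{-\lambda} \frac{\lambda}{1!} \right)$

The maximizing value of $\lambda$ is $\hat{\lambda} = 0.8165$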

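Find $\hat{\lambda}$ using Matlab:

% Negative likelihood; the constant factor 1/4! is dropped because it does not change the maximizer
fun=@(lambda)(-(exp(-lambda)+exp(-lambda)*lambda)*exp(-lambda)*lambda^4...
*exp(-3*lambda)*(exp(-lambda)+exp(-lambda)*lambda));
lambda0=0.1;                          % initial guess
lambda_hat=fminsearch(fun,lambda0)    % no semicolon, so the estimate is displayed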
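The likelihood above is a product of one-step transition probabilities of the queue, which is easy to get wrong by hand. The following Python sketch (Python rather than the slides' Matlab, standard library only) rebuilds it directly from the observed path and maximizes its logarithm. The helpers `transition_prob`, `log_likelihood` and `maximize_unimodal` are illustrative names, and the golden-section search stands in for `fminsearch`.

```python
import math

# Observed path X_1..X_7 from the slide.
x = [0, 0, 3, 2, 1, 0, 0]

def poisson_pmf(k, lam):
    """P(Z = k) for Z ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def transition_prob(x_now, x_next, lam):
    """P(X_{n+1} = x_next | X_n = x_now) for X_{n+1} = [X_n + Z_{n+1} - 1]^+."""
    if x_next > 0:
        k = x_next - x_now + 1  # the only value of Z that lands on x_next
        return poisson_pmf(k, lam) if k >= 0 else 0.0
    # x_next == 0 happens for every Z with x_now + Z - 1 <= 0, i.e. Z <= 1 - x_now
    return sum(poisson_pmf(k, lam) for k in range(0, 2 - x_now))

def log_likelihood(lam, path):
    """Sum of log one-step transition probabilities along the path."""
    return sum(math.log(transition_prob(a, b, lam)) for a, b in zip(path, path[1:]))

def maximize_unimodal(f, lo, hi, tol=1e-9):
    """Golden-section search for the maximizer of a unimodal function on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) > f(d):
            b = d
        else:
            a = c
    return (a + b) / 2.0

lam_hat = maximize_unimodal(lambda lam: log_likelihood(lam, x), 1e-3, 10.0)
print(lam_hat, math.sqrt(2.0 / 3.0))
```

Up to a constant, the log-likelihood here is $2\log(1+\lambda) + 4\log\lambda - 6\lambda$; setting its derivative to zero gives $\lambda^2 = 2/3$, which agrees with the value 0.8165 quoted on the slide.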

Sequential Decision Making

Goal: Maximize reward sequentially over time

Reward is a mathematical expression of a desirable state

Decisions made in stages
Current decision affects future outcomes, and therefore future decisions
Balance high present reward vs. potentially low future rewards

Markov Decision Processes

$S$: set of states, $A(x) \subseteq A$: set of actions permissible at state $x \in S$

$r : S \times A \to \mathbb{R}_+$: reward function
$X_n$: state of system at time $n$
$A_n : S \to A$: action mapping at time $n$, with $A_n(x) \in A(x)$

$P(X_{n+1} = x_{n+1} \mid X_0 = x_0, A_0 = a_0, \ldots, X_n = x_n, A_n = a_n) = P(X_{n+1} = x_{n+1} \mid X_n = x_n, A_n = a_n) =: P_{a_n}(x_n, x_{n+1})$

Goal: Denoting the policy $\Pi = (A_n)_{n \ge 0}$, solve

$\text{maximize}_{\Pi} \; E_\Pi \left[ \sum_{n=0}^{\infty} e^{-\alpha n} r(X_n, A_n(X_n)) \right]$

where $\alpha > 0$ is the discount rate.
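The requirement $\alpha > 0$ is what keeps this objective finite. As a small worked bound, assume the rewards are bounded above by some constant $r_{\max}$ (an assumption made here for illustration; it holds automatically when $S$ and $A$ are finite). Then, for every policy $\Pi$:

```latex
0 \;\le\; E_\Pi\!\left[ \sum_{n=0}^{\infty} e^{-\alpha n}\, r(X_n, A_n(X_n)) \right]
  \;\le\; \sum_{n=0}^{\infty} e^{-\alpha n}\, r_{\max}
  \;=\; \frac{r_{\max}}{1 - e^{-\alpha}} .
```

The geometric series converges because $e^{-\alpha} < 1$; with $\alpha = 0$ the same sum could diverge.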

Applications

Robotics, Control
    Rockets
    Autonomous Robots
Business Decisions
    Inventory management
    Scheduling, controlling queues
    Personalized marketing
Finance
    Portfolio management (e.g. pension funds)
    Option pricing
Education (edtech services)
And many others...

Example: Perpetual Option

Consider an option on a stock that you can exercise at any time

Let $X_n$ be the stock price at time $n$
If you exercise the option (action $a_0$) at time $n$, you get $r(X_n, a_0)$; otherwise (action $a_1$) you get 0
Once you exercise the option (state $E$), there is no reward from then on
$S = \{E, 0, 1, 2, \ldots\}$, $A = \{a_0, a_1\}$
Some transition matrix $P_a$ with $P_{a_0}(x, E) = 1$
For $a \in A$, $P_a(E, E) = 1$ and $r(E, a) = 0$

Bellman’s Equation

Recall $A_n : S \to A$, and $\Pi = (A_n)_{n \ge 0}$

Define optimal value

$V^*(x) = \max_{\Pi} E_\Pi \left[ \sum_{n=0}^{\infty} e^{-\alpha n} r(X_n, A_n(X_n)) \,\middle|\, X_0 = x \right]$

Theorem 1 (Bellman’s Equation)

Suppose $|S| < \infty$ and $|A| < \infty$. Then, $V^*$ satisfies for all $x \in S$

$V^*(x) = \max_{a \in A(x)} \left\{ r(x, a) + e^{-\alpha} \sum_{y \in S} P_a(x, y) V^*(y) \right\}.$

Further, $V^*$ is the unique finite solution to the above fixed point equation.

Sketch of Proof

Goal: Show $V^*$ satisfies $V^*(x) = \max_{a \in A(x)} \left\{ r(x, a) + e^{-\alpha} \sum_{y \in S} P_a(x, y) V^*(y) \right\}$

We postulate that the optimal policy is given by $A_n = A^*$ for some $A^* : S \to A$ (with $A^*(x) \in A(x)$). Then, $V^*(x) = E_{A^*} \left[ \sum_{n=0}^{\infty} e^{-\alpha n} r(X_n, A^*(X_n)) \,\middle|\, X_0 = x \right]$.

From first transition analysis, we have

$V^*(x) = r(x, A^*(x)) + e^{-\alpha} \sum_{y \in S} P_{A^*(x)}(x, y) V^*(y).$

Similarly, we can show for all $a \in A(x)$

$V^*(x) \ge r(x, a) + e^{-\alpha} \sum_{y \in S} P_a(x, y) V^*(y)$

Intuition: Playing optimally is at least as good as playing action $a$ at time 0, and then optimally onwards
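The first transition analysis can be written out as follows. This is a sketch under the slide's postulate that a stationary policy $A^*$ is optimal; the shifted index $m = n - 1$ is introduced here for the derivation.

```latex
\begin{align*}
V^*(x)
  &= E_{A^*}\!\left[ r(X_0, A^*(X_0))
     + \sum_{n=1}^{\infty} e^{-\alpha n} r(X_n, A^*(X_n)) \,\middle|\, X_0 = x \right] \\
  &= r(x, A^*(x)) + e^{-\alpha} \sum_{y \in S} P_{A^*(x)}(x, y)\,
     E_{A^*}\!\left[ \sum_{m=0}^{\infty} e^{-\alpha m}
     r(X_{m+1}, A^*(X_{m+1})) \,\middle|\, X_1 = y \right] \\
  &= r(x, A^*(x)) + e^{-\alpha} \sum_{y \in S} P_{A^*(x)}(x, y)\, V^*(y).
\end{align*}
```

The second line conditions on $X_1$ and uses the Markov property. The third line uses the fact that the transition probabilities and the policy do not depend on time, so the process restarted from $X_1 = y$ has the same law as the process started from $X_0 = y$.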

Solution Methods

Can we compute $V^*(x)$ for all $x \in S$?

Use Bellman’s equation

$V^*(x) = \max_{a \in A(x)} \left\{ r(x, a) + e^{-\alpha} \sum_{y \in S} P_a(x, y) V^*(y) \right\}.$

Once we have $V^*$, the optimal strategy $A^* : S \to A$ is given by

$A^*(x) = \operatorname{argmax}_{a \in A(x)} \left\{ r(x, a) + e^{-\alpha} \sum_{y \in S} P_a(x, y) V^*(y) \right\}$


Approach 1: Linear Programming

Since $V^*$ is the unique solution to

$V^*(x) = \max_{a \in A(x)} \left\{ r(x, a) + e^{-\alpha} \sum_{y \in S} P_a(x, y) V^*(y) \right\},$

it is given by the linear program

$\text{minimize}_{V} \; \sum_{x \in S} V(x)$

$\text{s.t.} \; V(x) \ge r(x, a) + e^{-\alpha} \sum_{y \in S} P_a(x, y) V(y)$ for all $x \in S$, $a \in A(x)$

Drawback: Computationally expensive when |S| and |A| are large!
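The slide does not spell out why the linear program recovers $V^*$. One standard argument, sketched here, writes $(TV)(x)$ for the right-hand side of Bellman's equation evaluated at a vector $V$, and relies on the convergence of value iteration (Theorem 2 in these slides):

```latex
\begin{align*}
&\text{Feasibility of } V \text{ means } V(x) \ge (TV)(x) \text{ for all } x \in S,
  \text{ i.e. } V \ge TV \text{ componentwise.} \\
&T \text{ is monotone: } V \ge W \implies TV \ge TW, \text{ so }
  V \ge TV \ge T^2 V \ge \cdots \ge \lim_{k \to \infty} T^k V = V^* . \\
&V^* = TV^* \text{ is itself feasible, hence }
  \sum_{x \in S} V(x) \ge \sum_{x \in S} V^*(x) \text{ for every feasible } V .
\end{align*}
```

So $V^*$ is the componentwise smallest feasible point, and any feasible $V \ne V^*$ has a strictly larger objective value.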

Approach 2: Value Iteration

Let $T : \mathbb{R}^{|S|} \to \mathbb{R}^{|S|}$ be the Bellman operator

$(TV)(x) = \max_{a \in A(x)} \left\{ r(x, a) + e^{-\alpha} \sum_{y \in S} P_a(x, y) V(y) \right\}.$

The value function $V^*$ is a fixed point of $T$: $V^* = T V^*$

Theorem 2 (Value Iteration)

For any vector $V_0$, we have $\lim_{k \to \infty} (T^k V_0)(x) = V^*(x)$ for all $x \in S$.

Starting with some $|S|$-dimensional vector $V_0$, iteratively apply $V_{k+1} = T V_k$!
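The reason behind Theorem 2, not shown on the slide, is that $T$ is a contraction in the sup norm with modulus $e^{-\alpha} < 1$. A short sketch, using $|\max_a f(a) - \max_a g(a)| \le \max_a |f(a) - g(a)|$ and $\sum_{y \in S} P_a(x, y) = 1$:

```latex
\begin{align*}
|(TV)(x) - (TW)(x)|
  &\le \max_{a \in A(x)} e^{-\alpha} \sum_{y \in S} P_a(x, y)\, |V(y) - W(y)|
   \le e^{-\alpha}\, \|V - W\|_\infty , \\
\|T^k V_0 - V^*\|_\infty
  &= \|T^k V_0 - T^k V^*\|_\infty
   \le e^{-\alpha k}\, \|V_0 - V^*\|_\infty \xrightarrow[k \to \infty]{} 0 .
\end{align*}
```

The error therefore shrinks by at least a factor $e^{-\alpha}$ per iteration, whatever the starting vector $V_0$.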

Example: Perpetual Option

Let $P_{a_0}(x, E) = 1$ for $x \in S$ and, with rows and columns ordered $1, 2, 3, E$,

$P_{a_1} = \begin{pmatrix} 1/2 & 1/2 & 0 & 0 \\ 1/3 & 0 & 2/3 & 0 \\ 1/3 & 1/3 & 1/3 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}.$

Let $r(1, a_0) = 0$, $r(2, a_0) = 1$, $r(3, a_0) = 2$, $r(x, a_1) = 0$ for all $x \in S$, and $r(E, a_0) = r(E, a_1) = 0$ (so $V(E) = 0$).

Example: Perpetual Option

Compute $(TV)(x) = \max_{a \in A(x)} \left\{ r(x, a) + e^{-\alpha} \sum_{y \in S} P_a(x, y) V(y) \right\}$

Suppose that $e^{-\alpha} = 1/2$, with $P_{a_0}$, $P_{a_1}$, and $r$ as defined above. Exercising ($a_0$) moves to $E$, where $V(E) = 0$, so that action contributes only $r(x, a_0)$.

$(TV)(1) = \max\left\{ 0, \; \tfrac{1}{2} \left( \tfrac{1}{2} V(1) + \tfrac{1}{2} V(2) \right) \right\} = \max\left\{ 0, \; \tfrac{1}{4} V(1) + \tfrac{1}{4} V(2) \right\}$

Example: Perpetual Option

With $e^{-\alpha} = 1/2$ and the same $P_{a_0}$, $P_{a_1}$, and $r$, apply the Bellman operator $T$ at state 2:

$(TV)(2) = \max\left\{ 1, \; \tfrac{1}{2} \left( \tfrac{1}{3} V(1) + \tfrac{2}{3} V(3) \right) \right\} = \max\left\{ 1, \; \tfrac{1}{6} V(1) + \tfrac{1}{3} V(3) \right\}$

Example: Perpetual Option

With $e^{-\alpha} = 1/2$ and the same $P_{a_0}$, $P_{a_1}$, and $r$, apply the Bellman operator $T$ at state 3:

$(TV)(3) = \max\left\{ 2, \; \tfrac{1}{2} \left( \tfrac{1}{3} V(1) + \tfrac{1}{3} V(2) + \tfrac{1}{3} V(3) \right) \right\} = \max\left\{ 2, \; \tfrac{1}{6} V(1) + \tfrac{1}{6} V(2) + \tfrac{1}{6} V(3) \right\}$
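The three updates for $(TV)(1)$, $(TV)(2)$ and $(TV)(3)$ can be iterated numerically. The following Python sketch (Python rather than the slides' Matlab, standard library only) runs value iteration on this example and then reads off a greedy policy from the result. The dictionary layout, the action labels `"exercise"` ($a_0$) and `"wait"` ($a_1$), the starting vector $V_0 = 0$ and the stopping tolerance are choices made for illustration, not part of the slides.

```python
# States ordered (1, 2, 3, E); actions: "exercise" is a_0, "wait" is a_1.
STATES = ["1", "2", "3", "E"]
DISCOUNT = 0.5  # e^{-alpha} = 1/2, as on the slide

# Transition probabilities P[a][x][y]; entries equal to zero are omitted.
P = {
    "exercise": {s: {"E": 1.0} for s in STATES},
    "wait": {
        "1": {"1": 1 / 2, "2": 1 / 2},
        "2": {"1": 1 / 3, "3": 2 / 3},
        "3": {"1": 1 / 3, "2": 1 / 3, "3": 1 / 3},
        "E": {"E": 1.0},
    },
}

# Rewards R[a][x].
R = {
    "exercise": {"1": 0.0, "2": 1.0, "3": 2.0, "E": 0.0},
    "wait": {s: 0.0 for s in STATES},
}

def q_value(V, x, a):
    """r(x, a) + e^{-alpha} * sum_y P_a(x, y) V(y)."""
    return R[a][x] + DISCOUNT * sum(p * V[y] for y, p in P[a][x].items())

def bellman(V):
    """One application of the Bellman operator T."""
    return {x: max(q_value(V, x, a) for a in P) for x in STATES}

def value_iteration(V0, tol=1e-12, max_iter=10_000):
    """Iterate V_{k+1} = T V_k until successive iterates differ by less than tol."""
    V = dict(V0)
    for _ in range(max_iter):
        V_next = bellman(V)
        if max(abs(V_next[x] - V[x]) for x in STATES) < tol:
            return V_next
        V = V_next
    return V

V_star = value_iteration({x: 0.0 for x in STATES})

# Greedy policy: the action attaining the max in Bellman's equation (ties go to "exercise").
policy = {x: max(P, key=lambda a: q_value(V_star, x, a)) for x in STATES}
print(V_star)
print(policy)
```

Under these assumptions the iterates should approach $V^*(1) = 1/3$, $V^*(2) = 1$, $V^*(3) = 2$, with waiting preferred only in state 1. Substituting these values into the three max expressions confirms that they form a fixed point: for example, $(TV^*)(1) = \max\{0, \tfrac{1}{4} \cdot \tfrac{1}{3} + \tfrac{1}{4} \cdot 1\} = \tfrac{1}{3}$.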
