MS&E 221: Stochastic Modeling
Session 7: Nonlinear Optimization, Markov Decision Processes
Lin Fan
February 20, 2019
Nonlinear Optimization: Finding Maximum Likelihood Estimates
Example 2 (from Estimation Slide Deck): Queueing model
$X_{n+1} = [X_n + Z_{n+1} - 2]^+$
where $(Z_n : n \ge 1)$ are i.i.d. Geometric($p^*$)
From the data, estimate $p^*$
Nonlinear Optimization: Finding Maximum Likelihood Estimates
Example 2 (from Estimation Slide Deck):
Observed data:
$Z_1 = 0,\ Z_2 = 1,\ Z_3 = 3,\ Z_4 = 1,\ Z_5 = 2$
\[ L(p) = (1-p)^0 p \cdot (1-p)^1 p \cdot (1-p)^3 p \cdot (1-p)^1 p \cdot (1-p)^2 p = (1-p)^7 p^5 \]
The maximizing value of p is p̂ = 5/12
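As a check on the value 5/12, here is a short derivation via the log-likelihood. It assumes the pmf $P(Z = k) = (1-p)^k p$ for $k = 0, 1, 2, \ldots$, which is the form used in $L(p)$ above.

```latex
\begin{align*}
\log L(p) &= 7 \log(1-p) + 5 \log p, \\
\frac{d}{dp} \log L(p) &= -\frac{7}{1-p} + \frac{5}{p} = 0
  \;\Longrightarrow\; 5(1-p) = 7p
  \;\Longrightarrow\; \hat{p} = \frac{5}{12}.
\end{align*}
```

The same steps with $n$ observations give $\hat{p} = n / (n + \sum_i Z_i)$ under this pmf.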
Find p̂ using Matlab:
% Negative likelihood, because fminsearch minimizes
fun=@(p)(-(1-p)^7*p^5);
p0=0.1;  % initial guess
p_hat=fminsearch(fun,p0)  % should be close to 5/12
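The same estimate can also be computed directly from the data by minimizing the negative log-likelihood, which avoids the very small likelihood values that arise with longer samples. The following is a minimal sketch, assuming the pmf $(1-p)^k p$. The names `Z`, `negloglik`, and `p_closed` are illustrative and not from the slides, and `fminbnd` is swapped in for `fminsearch` to keep the search inside (0, 1).

```matlab
% Observed data from the slide
Z = [0 1 3 1 2];
n = numel(Z);

% Negative log-likelihood of i.i.d. Geometric(p) data with pmf (1-p)^k * p
negloglik = @(p) -(sum(Z)*log(1-p) + n*log(p));

% Bounded one-dimensional minimization over (0,1)
p_hat = fminbnd(negloglik, 1e-6, 1-1e-6);

% Closed-form maximizer n/(n + sum(Z)) for comparison
p_closed = n/(n + sum(Z));
fprintf('numerical: %.4f, closed form: %.4f\n', p_hat, p_closed);
```

Both values should agree with 5/12 up to the tolerance of `fminbnd`.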
Nonlinear Optimization: Finding Maximum Likelihood Estimates
Example 2 Variant (from Estimation Slide Deck): Queueing model
$X_{n+1} = [X_n + Z_{n+1} - 1]^+$
where $(Z_n : n \ge 1)$ are i.i.d. Poisson($\lambda^*$)
From the data, estimate $\lambda^*$
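Nonlinear Optimization: Finding Maximum Likelihood Estimates
Example 2 Variant (from Estimation Slide Deck): $X_{n+1} = [X_n + Z_{n+1} - 1]^+$
Observed data:
$X_1 = 0,\ X_2 = 0,\ X_3 = 3,\ X_4 = 2,\ X_5 = 1,\ X_6 = 0,\ X_7 = 0$
\[ L(\lambda) = \left( e^{-\lambda} + e^{-\lambda} \frac{\lambda}{1!} \right) \cdot e^{-\lambda} \frac{\lambda^4}{4!} \cdot \left( e^{-\lambda} \frac{\lambda^0}{0!} \right)^3 \cdot \left( e^{-\lambda} + e^{-\lambda} \frac{\lambda}{1!} \right) \]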
The maximizing value of λ is λ̂ = 0.8165
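The value 0.8165 can be recovered in closed form. A short derivation, collecting the factors of $L(\lambda)$ above into a single expression:

```latex
\begin{align*}
L(\lambda) &= \frac{1}{4!} (1+\lambda)^2 \lambda^4 e^{-6\lambda}, \\
\frac{d}{d\lambda} \log L(\lambda) &= \frac{2}{1+\lambda} + \frac{4}{\lambda} - 6 = 0
  \;\Longrightarrow\; 2\lambda + 4(1+\lambda) = 6\lambda(1+\lambda)
  \;\Longrightarrow\; \lambda^2 = \frac{2}{3}, \\
\hat{\lambda} &= \sqrt{2/3} \approx 0.8165.
\end{align*}
```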
Find λ̂ using Matlab:
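% Negative likelihood; the constant factor 1/4! is dropped (it does not change the maximizer)
fun=@(lambda)(-(exp(-lambda)+exp(-lambda)*lambda)*exp(-lambda)*lambda^4...
*exp(-3*lambda)*(exp(-lambda)+exp(-lambda)*lambda));
lambda0=0.1;  % initial guess
lambda_hat=fminsearch(fun,lambda0)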
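The likelihood above was written out by hand from the observed path. As a sketch of how it could be built from the data instead, the code below works out, for each transition, which arrival counts $Z$ are consistent with $X_{n+1} = [X_n + Z_{n+1} - 1]^+$ and sums the Poisson probabilities over that range. The names `lo`, `hi`, `probRange`, and `negloglik` are illustrative, and `fminbnd` is used here instead of `fminsearch` to keep $\lambda$ positive.

```matlab
% Observed queue lengths from the slide
X = [0 0 3 2 1 0 0];
Xn = X(1:end-1);
Xnext = X(2:end);

% Range [lo, hi] of arrival counts Z consistent with each transition
lo = Xnext - Xn + 1;            % exact Z when X(n+1) > 0
hi = lo;
atZero = (Xnext == 0);
lo(atZero) = 0;                 % X(n+1) = 0 allows any Z with X(n) + Z - 1 <= 0
hi(atZero) = 1 - Xn(atZero);

% Poisson probability that Z lies in [a, b], and the negative log-likelihood
probRange = @(lam, a, b) sum(exp(-lam) .* lam.^(a:b) ./ factorial(a:b));
negloglik = @(lam) -sum(arrayfun(@(a, b) log(probRange(lam, a, b)), lo, hi));

% Bounded one-dimensional minimization; the upper bound 10 is an arbitrary choice
lambda_hat = fminbnd(negloglik, 1e-6, 10);
fprintf('lambda_hat = %.4f, sqrt(2/3) = %.4f\n', lambda_hat, sqrt(2/3));
```

For this path the ranges come out as {0,1}, {4}, {0}, {0}, {0}, {0,1}, matching the six factors of $L(\lambda)$.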
Sequential Decision Making
Goal: Maximize reward sequentially over time
Reward is a mathematical expression of a desirable state
Decisions made in stages
Current decision affects future outcomes, and therefore future decisions
Balance high present reward vs. potentially low future rewards
Markov Decision Processes
$S$: set of states, $A(x) \subseteq A$: set of actions permissible at state $x \in S$
$r : S \times A \to \mathbb{R}_+$: reward function
$X_n$: state of system at time $n$
$A_n : S \to A$: action mapping at time $n$, with $A_n(x) \in A(x)$
\[ P(X_{n+1} = x_{n+1} \mid X_0 = x_0, A_0 = a_0, \ldots, X_n = x_n, A_n = a_n) = P(X_{n+1} = x_{n+1} \mid X_n = x_n, A_n = a_n) =: P_{a_n}(x_n, x_{n+1}) \]
Goal: Denoting the policy $\Pi = (A_n)_{n \ge 0}$, solve
\[ \underset{\Pi}{\text{maximize}} \quad E_\Pi \left[ \sum_{n=0}^{\infty} e^{-\alpha n} r(X_n, A_n(X_n)) \right] \]
where $\alpha > 0$ is the discount rate.
Applications
Robotics, Control
  Rockets
  Autonomous Robots
Business Decisions
  Inventory management
  Scheduling, controlling queues
  Personalized marketing
Finance
  Portfolio management (e.g. pension funds)
  Option pricing
Education (edtech services)
And many others...
Example: Perpetual Option
Consider an option on a stock that you can exercise at any time
Let $X_n$ be the stock price at time $n$
If you exercise the option (action $a_0$) at time $n$, you get $r(X_n, a_0)$; otherwise (action $a_1$) you get 0
Once you exercise the option (state $E$), there is no reward from then on
$S = \{E, 0, 1, 2, \ldots\}$, $A = \{a_0, a_1\}$
Some transition matrix $P_a$ with $P_{a_0}(x, E) = 1$
For $a \in A$, $P_a(E, E) = 1$ and $r(E, a) = 0$
Bellman’s Equation
Recall $A_n : S \to A$, and $\Pi = (A_n)_{n \ge 0}$
Define the optimal value
\[ V^*(x) = \max_{\Pi} E_\Pi \left[ \sum_{n=0}^{\infty} e^{-\alpha n} r(X_n, A_n(X_n)) \,\middle|\, X_0 = x \right] \]
Theorem 1 (Bellman’s Equation)
Suppose $|S| < \infty$ and $|A| < \infty$. Then $V^*$ satisfies, for all $x \in S$,
\[ V^*(x) = \max_{a \in A(x)} \left\{ r(x, a) + e^{-\alpha} \sum_{y \in S} P_a(x, y) V^*(y) \right\}. \]
Further, $V^*$ is the unique finite solution to the above fixed point equation.
Sketch of Proof
Goal: Show $V^*$ satisfies
\[ V^*(x) = \max_{a \in A(x)} \left\{ r(x, a) + e^{-\alpha} \sum_{y \in S} P_a(x, y) V^*(y) \right\} \]
We postulate that the optimal policy is given by $A_n = A^*$ for some $A^* : S \to A$ (with $A^*(x) \in A(x)$). Then,
\[ V^*(x) = E_{A^*} \left[ \sum_{n=0}^{\infty} e^{-\alpha n} r(X_n, A^*(X_n)) \,\middle|\, X_0 = x \right]. \]
From first transition analysis, we have
\[ V^*(x) = r(x, A^*(x)) + e^{-\alpha} \sum_{y \in S} P_{A^*(x)}(x, y) V^*(y). \]
Similarly, we can show for all $a \in A(x)$
\[ V^*(x) \ge r(x, a) + e^{-\alpha} \sum_{y \in S} P_a(x, y) V^*(y) \]
Intuition: Playing optimally is better than playing action a at time 0, and then
optimally onwards
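The first-transition identity is not specific to the optimal policy. For any fixed stationary policy $A : S \to A$ on a finite state space it gives a linear system for that policy's value, written here in matrix form. The shorthand $V_A$, $r_A$, and $P_A$ is introduced for this note and does not appear on the slides.

```latex
\begin{align*}
r_A(x) &:= r(x, A(x)), \qquad P_A(x, y) := P_{A(x)}(x, y), \\
V_A &= r_A + e^{-\alpha} P_A V_A
  \;\Longleftrightarrow\;
  V_A = \left( I - e^{-\alpha} P_A \right)^{-1} r_A .
\end{align*}
```

The inverse exists because $P_A$ is a stochastic matrix and $e^{-\alpha} < 1$, so every eigenvalue of $e^{-\alpha} P_A$ has modulus strictly less than 1.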
Solution Methods
Can we compute $V^*(x)$ for all $x \in S$?
Use Bellman’s equation
\[ V^*(x) = \max_{a \in A(x)} \left\{ r(x, a) + e^{-\alpha} \sum_{y \in S} P_a(x, y) V^*(y) \right\}. \]
Once we have $V^*$, the optimal strategy $A^* : S \to A$ is given by
\[ A^*(x) = \arg\max_{a \in A(x)} \left\{ r(x, a) + e^{-\alpha} \sum_{y \in S} P_a(x, y) V^*(y) \right\} \]
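Approach 1: Linear Programming
Since $V^*$ is the unique solution to
\[ V^*(x) = \max_{a \in A(x)} \left\{ r(x, a) + e^{-\alpha} \sum_{y \in S} P_a(x, y) V^*(y) \right\}, \]
it is given by the linear program
\[ \begin{aligned}
\underset{V}{\text{minimize}} \quad & \sum_{x \in S} V(x) \\
\text{s.t.} \quad & V(x) \ge r(x, a) + e^{-\alpha} \sum_{y \in S} P_a(x, y) V(y) \quad \text{for all } x \in S,\ a \in A(x)
\end{aligned} \]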
Drawback: Computationally expensive when |S| and |A| are large!
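As an illustration, the linear program can be set up for the perpetual option example in this deck (states 1, 2, 3, E, with $e^{-\alpha} = 1/2$). This is a sketch that assumes the Optimization Toolbox function `linprog` is available. The variable names are illustrative, and each constraint $V \ge r_a + e^{-\alpha} P_a V$ is rewritten as $(e^{-\alpha} P_a - I) V \le -r_a$ to match the form `linprog` expects.

```matlab
% Perpetual option example, states ordered 1, 2, 3, E
gamma  = 1/2;                              % discount factor e^{-alpha}
P_wait = [1/2 1/2 0   0;
          1/3 0   2/3 0;
          1/3 1/3 1/3 0;
          0   0   0   1];                  % P_{a_1}: keep holding
P_ex   = [zeros(4,3) ones(4,1)];           % P_{a_0}: exercise, move to E
r_ex   = [0; 1; 2; 0];                     % r(x, a_0)
r_wait = zeros(4,1);                       % r(x, a_1)

% One block of constraints per action: (gamma*P_a - I)*V <= -r_a
Aineq = [gamma*P_ex - eye(4); gamma*P_wait - eye(4)];
bineq = [-r_ex; -r_wait];
f     = ones(4,1);                         % objective: sum over x of V(x)

% No bounds are passed, so the entries of V are free variables
V = linprog(f, Aineq, bineq);
disp(V');
```

The problem has one variable per state and one constraint per state-action pair, which is where the cost noted in the drawback comes from.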
Approach 2: Value Iteration
Let $T : \mathbb{R}^{|S|} \to \mathbb{R}^{|S|}$ be the Bellman operator
\[ (TV)(x) = \max_{a \in A(x)} \left\{ r(x, a) + e^{-\alpha} \sum_{y \in S} P_a(x, y) V(y) \right\}. \]
The value function $V^*$ is a fixed point of $T$: $V^* = TV^*$
Theorem 2 (Value Iteration)
For any vector $V_0$, we have $\lim_{k \to \infty} (T^k V_0)(x) = V^*(x)$ for all $x \in S$.
Starting with some $|S|$-dimensional vector $V_0$, iteratively apply $V_{k+1} = TV_k$!
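One standard way to see why the iteration converges, which is not spelled out on the slide, is that $T$ is a contraction in the sup norm with modulus $e^{-\alpha}$. This follows from $|\max_a f(a) - \max_a g(a)| \le \max_a |f(a) - g(a)|$ and from the rows of each $P_a$ summing to 1.

```latex
\begin{align*}
\| TV - TW \|_\infty &\le e^{-\alpha} \| V - W \|_\infty
  \quad \text{for all } V, W \in \mathbb{R}^{|S|}, \\
\| T^k V_0 - V^* \|_\infty &= \| T^k V_0 - T^k V^* \|_\infty
  \le e^{-\alpha k} \| V_0 - V^* \|_\infty \longrightarrow 0 .
\end{align*}
```

The error therefore shrinks by at least a factor $e^{-\alpha}$ per iteration.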
Example: Perpetual Option
Let $P_{a_0}(x, E) = 1$ for $x \in S$ and
\[ P_{a_1} = \begin{array}{c|cccc}
  & 1 & 2 & 3 & E \\ \hline
1 & 1/2 & 1/2 & 0 & 0 \\
2 & 1/3 & 0 & 2/3 & 0 \\
3 & 1/3 & 1/3 & 1/3 & 0 \\
E & 0 & 0 & 0 & 1
\end{array} \]
Let $r(1, a_0) = 0$, $r(2, a_0) = 1$, $r(3, a_0) = 2$, $r(x, a_1) = 0$ for all $x \in S$,
and $r(E, a_0) = r(E, a_1) = 0$ (so $V(E) = 0$).
Example: Perpetual Option
Compute
\[ (TV)(x) = \max_{a \in A(x)} \left\{ r(x, a) + e^{-\alpha} \sum_{y \in S} P_a(x, y) V(y) \right\} \]
Suppose that $e^{-\alpha} = 1/2$, with $P_{a_0}$, $P_{a_1}$, and $r$ as in the example setup above (so $V(E) = 0$).
\[ (TV)(1) = \max \left\{ 0,\ \tfrac{1}{2} \left( \tfrac{1}{2} V(1) + \tfrac{1}{2} V(2) \right) \right\} = \max \left\{ 0,\ \tfrac{1}{4} V(1) + \tfrac{1}{4} V(2) \right\} \]
\[ (TV)(2) = \max \left\{ 1,\ \tfrac{1}{2} \left( \tfrac{1}{3} V(1) + \tfrac{2}{3} V(3) \right) \right\} = \max \left\{ 1,\ \tfrac{1}{6} V(1) + \tfrac{1}{3} V(3) \right\} \]
\[ (TV)(3) = \max \left\{ 2,\ \tfrac{1}{2} \left( \tfrac{1}{3} V(1) + \tfrac{1}{3} V(2) + \tfrac{1}{3} V(3) \right) \right\} = \max \left\{ 2,\ \tfrac{1}{6} V(1) + \tfrac{1}{6} V(2) + \tfrac{1}{6} V(3) \right\} \]