Commit e2645fb

Added Example and Applet for Value Iteration
1 parent edac048 commit e2645fb

1 file changed

+179
-11
lines changed

mdp.ipynb

Lines changed: 179 additions & 11 deletions
@@ -11,7 +11,7 @@
1111
},
1212
{
1313
"cell_type": "code",
14-
"execution_count": 1,
14+
"execution_count": 172,
1515
"metadata": {
1616
"collapsed": true
1717
},
@@ -50,7 +50,7 @@
5050
},
5151
{
5252
"cell_type": "code",
53-
"execution_count": 2,
53+
"execution_count": 173,
5454
"metadata": {
5555
"collapsed": false
5656
},
@@ -87,7 +87,7 @@
8787
},
8888
{
8989
"cell_type": "code",
90-
"execution_count": 3,
90+
"execution_count": 174,
9191
"metadata": {
9292
"collapsed": true
9393
},
@@ -119,7 +119,7 @@
119119
},
120120
{
121121
"cell_type": "code",
122-
"execution_count": 4,
122+
"execution_count": 175,
123123
"metadata": {
124124
"collapsed": false
125125
},
@@ -153,7 +153,7 @@
153153
},
154154
{
155155
"cell_type": "code",
156-
"execution_count": 5,
156+
"execution_count": 176,
157157
"metadata": {
158158
"collapsed": false
159159
},
@@ -181,7 +181,7 @@
181181
},
182182
{
183183
"cell_type": "code",
184-
"execution_count": 6,
184+
"execution_count": 177,
185185
"metadata": {
186186
"collapsed": true
187187
},
@@ -221,18 +221,18 @@
221221
},
222222
{
223223
"cell_type": "code",
224-
"execution_count": 7,
224+
"execution_count": 178,
225225
"metadata": {
226226
"collapsed": false
227227
},
228228
"outputs": [
229229
{
230230
"data": {
231231
"text/plain": [
232-
"<mdp.GridMDP at 0x7fcb2826ba58>"
232+
"<mdp.GridMDP at 0x7fbecc40ebe0>"
233233
]
234234
},
235-
"execution_count": 7,
235+
"execution_count": 178,
236236
"metadata": {},
237237
"output_type": "execute_result"
238238
}
@@ -241,14 +241,182 @@
241241
"sequential_decision_environment"
242242
]
243243
},
244+
{
245+
"cell_type": "markdown",
246+
"metadata": {
247+
"collapsed": true
248+
},
249+
"source": [
250+
"# Value Iteration\n",
251+
"\n",
252+
"Now that we have looked at how to represent MDPs, let's aim at solving them. Our ultimate goal is to obtain an optimal policy. We start by looking at Value Iteration and a visualisation that should help us understand it better.\n",
253+
"\n",
254+
"We start by calculating the Value/Utility of each state. The Value of a state is the expected sum of discounted future rewards, given that we start in that state and follow a particular policy pi. The Value Iteration algorithm (**Fig. 17.4** in the book) relies on finding solutions of the Bellman equation. The intuition for why Value Iteration works is that values propagate. This point will be clearer after we encounter the visualisation. For more information you can refer to **Section 17.2** of the book.\n"
255+
]
256+
},
257+
{
258+
"cell_type": "code",
259+
"execution_count": 179,
260+
"metadata": {
261+
"collapsed": false
262+
},
263+
"outputs": [],
264+
"source": [
265+
"%psource value_iteration"
266+
]
267+
},
268+
{
269+
"cell_type": "markdown",
270+
"metadata": {},
271+
"source": [
272+
"It takes two parameters as input: an MDP to solve, and epsilon, the maximum error allowed in the utility of any state. It returns a dictionary of utilities, where the keys are the states and the values are their utilities. Let us solve the **sequential_decision_environment** GridMDP.\n"
273+
]
274+
},
244275
{
245276
"cell_type": "code",
246-
"execution_count": null,
277+
"execution_count": 180,
278+
"metadata": {
279+
"collapsed": false
280+
},
281+
"outputs": [
282+
{
283+
"data": {
284+
"text/plain": [
285+
"{(0, 0): 0.2962883154554812,\n",
286+
" (0, 1): 0.3984432178350045,\n",
287+
" (0, 2): 0.5093943765842497,\n",
288+
" (1, 0): 0.25386699846479516,\n",
289+
" (1, 2): 0.649585681261095,\n",
290+
" (2, 0): 0.3447542300124158,\n",
291+
" (2, 1): 0.48644001739269643,\n",
292+
" (2, 2): 0.7953620878466678,\n",
293+
" (3, 0): 0.12987274656746342,\n",
294+
" (3, 1): -1.0,\n",
295+
" (3, 2): 1.0}"
296+
]
297+
},
298+
"execution_count": 180,
299+
"metadata": {},
300+
"output_type": "execute_result"
301+
}
302+
],
303+
"source": [
304+
"value_iteration(sequential_decision_environment)"
305+
]
306+
},
307+
{
308+
"cell_type": "markdown",
309+
"metadata": {},
310+
"source": [
311+
"To illustrate that values propagate out of states, let us create a simple visualisation. We will use a modified version of the value_iteration function that stores U at each iteration. We will also remove the epsilon parameter and instead pass the number of iterations we want."
312+
]
313+
},
314+
{
315+
"cell_type": "code",
316+
"execution_count": 181,
247317
"metadata": {
248318
"collapsed": true
249319
},
250320
"outputs": [],
251-
"source": []
321+
"source": [
322+
"def value_iteration_instru(mdp, iterations=20):\n",
323+
"    U_over_time = []\n",
324+
"    U1 = {s: 0 for s in mdp.states}\n",
325+
"    R, T, gamma = mdp.R, mdp.T, mdp.gamma\n",
326+
"    for _ in range(iterations):\n",
327+
"        U = U1.copy()\n",
328+
"        for s in mdp.states:\n",
329+
"            U1[s] = R(s) + gamma * max([sum([p * U[s1] for (p, s1) in T(s, a)])\n",
330+
"                                        for a in mdp.actions(s)])\n",
331+
"        U_over_time.append(U)\n",
332+
"    return U_over_time"
333+
]
334+
},
335+
{
336+
"cell_type": "markdown",
337+
"metadata": {},
338+
"source": [
339+
"Next, we define a function to create the visualisation from the utilities returned by **value_iteration_instru**. Readers need not concern themselves with the code that immediately follows, as it simply uses Matplotlib with IPython Widgets. If you are interested in reading more about these, visit [ipywidgets.readthedocs.io](http://ipywidgets.readthedocs.io)."
340+
]
341+
},
342+
{
343+
"cell_type": "code",
344+
"execution_count": 182,
345+
"metadata": {
346+
"collapsed": true
347+
},
348+
"outputs": [],
349+
"source": [
350+
"columns = 4\n",
351+
"rows = 3\n",
352+
"U_over_time = value_iteration_instru(sequential_decision_environment)\n",
353+
" "
354+
]
355+
},
356+
{
357+
"cell_type": "code",
358+
"execution_count": 183,
359+
"metadata": {
360+
"collapsed": false
361+
},
362+
"outputs": [],
363+
"source": [
364+
"%matplotlib inline\n",
365+
"import matplotlib.pyplot as plt\n",
366+
"\n",
367+
"def plot_grid(iteration):\n",
368+
"    data = U_over_time[iteration]\n",
369+
"    grid = []\n",
370+
"    for row in range(rows):\n",
371+
"        current_row = []\n",
372+
"        for column in range(columns):\n",
373+
"            try:\n",
374+
"                current_row.append(data[(column, row)])\n",
375+
"            except KeyError:\n",
376+
"                current_row.append(0)\n",
377+
"        grid.append(current_row)\n",
378+
"    grid.reverse() # output like book\n",
379+
"    fig = plt.matshow(grid, cmap=plt.cm.bwr);\n",
380+
"    plt.axis('off')\n",
381+
"    fig.axes.get_xaxis().set_visible(False)\n",
382+
"    fig.axes.get_yaxis().set_visible(False)"
383+
]
384+
},
385+
{
386+
"cell_type": "code",
387+
"execution_count": 184,
388+
"metadata": {
389+
"collapsed": false,
390+
"scrolled": true
391+
},
392+
"outputs": [
393+
{
394+
"data": {
395+
"image/png": "iVBORw0KGgoAAAANSUhEUgAAATgAAADtCAYAAAAr+2lCAAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAAzZJREFUeJzt2rENwzAMAEExyP4r0wsE6Qwbj7uSalg9WGh29wAUfZ5e\nAOAuAgdkCRyQJXBAlsABWQIHZH3/Pc4cf0iA19s982vuggOyBA7IEjggS+CALIEDsgQOyBI4IEvg\ngCyBA7IEDsgSOCBL4IAsgQOyBA7IEjggS+CALIEDsgQOyBI4IEvggCyBA7IEDsgSOCBL4IAsgQOy\nBA7IEjggS+CALIEDsgQOyBI4IEvggCyBA7IEDsgSOCBL4IAsgQOyBA7IEjggS+CALIEDsgQOyBI4\nIEvggCyBA7IEDsgSOCBL4IAsgQOyBA7IEjggS+CALIEDsgQOyBI4IEvggCyBA7IEDsgSOCBL4IAs\ngQOyBA7IEjggS+CALIEDsgQOyBI4IEvggCyBA7IEDsgSOCBL4IAsgQOyBA7IEjggS+CALIEDsgQO\nyBI4IEvggCyBA7IEDsgSOCBL4IAsgQOyBA7IEjggS+CALIEDsgQOyBI4IEvggCyBA7IEDsgSOCBL\n4IAsgQOyBA7IEjggS+CALIEDsgQOyBI4IEvggCyBA7IEDsgSOCBL4IAsgQOyBA7IEjggS+CALIED\nsgQOyBI4IEvggCyBA7IEDsgSOCBL4IAsgQOyBA7IEjggS+CALIEDsgQOyBI4IEvggCyBA7IEDsgS\nOCBL4IAsgQOyBA7IEjggS+CALIEDsgQOyBI4IEvggCyBA7IEDsgSOCBL4IAsgQOyBA7IEjggS+CA\nLIEDsgQOyBI4IEvggCyBA7IEDsgSOCBL4IAsgQOyBA7IEjggS+CALIEDsgQOyBI4IEvggCyBA7IE\nDsgSOCBL4IAsgQOyBA7IEjggS+CALIEDsgQOyBI4IEvggCyBA7IEDsgSOCBL4IAsgQOyBA7IEjgg\nS+CALIEDsgQOyBI4IEvggCyBA7IEDsgSOCBL4IAsgQOyBA7IEjggS+CALIEDsgQOyBI4IEvggCyB\nA7IEDsgSOCBL4IAsgQOyBA7IEjggS+CALIEDsgQOyBI4IEvggCyBA7IEDsgSOCBL4IAsgQOyBA7I\nEjggS+CALIEDsgQOyJrdfXoHgFu44IAsgQOyBA7IEjggS+CALIEDsi6WyArVfE1QKgAAAABJRU5E\nrkJggg==\n",
396+
"text/plain": [
397+
"<matplotlib.figure.Figure at 0x7fbea96037f0>"
398+
]
399+
},
400+
"metadata": {},
401+
"output_type": "display_data"
402+
}
403+
],
404+
"source": [
405+
"import ipywidgets as widgets\n",
406+
"from IPython.display import display\n",
407+
"\n",
408+
"iteration_slider = widgets.IntSlider(min=0, max=15, step=1, value=0)\n",
409+
"w=widgets.interactive(plot_grid,iteration=iteration_slider)\n",
410+
"display(w)\n",
411+
" "
412+
]
413+
},
414+
{
415+
"cell_type": "markdown",
416+
"metadata": {},
417+
"source": [
418+
"Move the slider above to observe how the utility changes across iterations."
419+
]
252420
}
253421
],
254422
"metadata": {

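A note on the Bellman update that the new Value Iteration cell describes: the sketch below is a minimal, standalone illustration and not the repository's `value_iteration`. It assumes the stopping rule from the textbook's Fig. 17.4 (stop once the largest change falls below `epsilon * (1 - gamma) / gamma`, which requires `gamma < 1`). The function name `value_iteration_sketch` and the three-state chain (`chain`, `rewards`, `next_state`) are hypothetical, chosen only to keep the example small.

```python
def value_iteration_sketch(states, actions, R, T, gamma, epsilon=0.001):
    """Repeat the Bellman update until the largest change is small enough.

    Assumed stopping rule (textbook Fig. 17.4): delta < epsilon*(1-gamma)/gamma.
    """
    U1 = {s: 0.0 for s in states}
    while True:
        U = U1.copy()
        delta = 0.0
        for s in states:
            # Bellman update: reward plus discounted best expected utility.
            U1[s] = R(s) + gamma * max(
                sum(p * U[s1] for (p, s1) in T(s, a)) for a in actions(s))
            delta = max(delta, abs(U1[s] - U[s]))
        if delta < epsilon * (1 - gamma) / gamma:
            return U


# Hypothetical three-state chain: 'a' -> 'b' -> 'goal' (terminal).
chain = ['a', 'b', 'goal']
rewards = {'a': -0.04, 'b': -0.04, 'goal': 1.0}
next_state = {'a': 'b', 'b': 'goal'}


def R(s):
    return rewards[s]


def actions(s):
    # The terminal state gets a single no-op action, represented as None.
    return [None] if s == 'goal' else ['forward']


def T(s, a):
    # Deterministic moves; the no-op contributes no future utility.
    if a is None:
        return [(0.0, s)]
    return [(1.0, next_state[s])]


utilities = value_iteration_sketch(chain, actions, R, T, gamma=0.9)
print(utilities)
```

On this chain the reward of the terminal state reaches `b` after one further update and `a` after two, which is the same propagation effect the slider in the notebook is meant to show on the grid.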