
Problems in MDP class from mdp.py #917

Closed
@thundergolfer

Description

Today I picked up the mdp.py code to support some work with the Value Iteration algorithm, and I found what I think are deficiencies in the MDP class and the value_iteration() function, particularly in how they handle terminal states.

  • MDP.actions(self, state) returns [None] if state is a terminal state. This is wrong: the returned action is used as a lookup key into the transition model, and None is not a key in that dictionary, so the lookup fails. The return value should probably be [], the empty list (see the sketch after this list).
  • In value_iteration(), if the state being updated is terminal (and the above problem is fixed), then max() is called on an empty sequence, which raises a ValueError. Adding default=0 to the max() call handles updates of terminal states (shown in the combined value_iteration() sketch further below).
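
A minimal sketch of the actions() fix, assuming the class keeps its action list in self.actlist and its terminal states in self.terminals, as aima-python's MDP does:

def actions(self, state):
    """Return the actions available in this state; none if it is terminal."""
    if state in self.terminals:
        return []  # was [None]; None is not a key in the transition model
    return self.actlist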

There is also this problem:

  • When gamma=1, the loop-breaking condition delta < epsilon*(1 - gamma)/gamma reduces to delta < 0, which can never be true because delta is non-negative, so the loop never exits even once the utilities have converged and delta has reached 0. gamma=1 is a valid discount factor, so this behaviour is a bug. It can at least be fixed by adding delta == 0 or ... to the condition, as in the sketch below.
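
Putting the three fixes together, a sketch of a patched value_iteration(), assuming the structure of the existing function in mdp.py (same U1/R/T/gamma names):

def value_iteration(mdp, epsilon=0.001):
    """Solve an MDP by value iteration, handling terminal states and gamma = 1."""
    U1 = {s: 0 for s in mdp.states}
    R, T, gamma = mdp.R, mdp.T, mdp.gamma
    while True:
        U = U1.copy()
        delta = 0
        for s in mdp.states:
            # default=0 covers terminal states, where actions(s) is now []
            U1[s] = R(s) + gamma * max(
                (sum(p * U[s1] for (p, s1) in T(s, a)) for a in mdp.actions(s)),
                default=0)
            delta = max(delta, abs(U1[s] - U[s]))
        # delta == 0 lets the loop terminate when gamma == 1, where the
        # threshold epsilon * (1 - gamma) / gamma collapses to 0
        if delta == 0 or delta < epsilon * (1 - gamma) / gamma:
            return U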

Appendix:

The code setup that surfaced these problems:

act_list = ['N', 'S', 'E', 'W']    # available actions in every non-terminal state
terminals = [(1,1), (1,3), (3,1)]  # terminal states of the grid

# transitions[state][action] is a list of (probability, next_state) pairs
transitions = {
    (1, 1): {
        'N': [(0.7, (1,1)), (0.3, (1,1))],
        'S': [(0.8, (2,1)), (0.2, (1,2))],
        'E': [(0.6, (1,2)), (0.4, (2,1))],
        'W': [(0.3, (1,1)), (0.7, (1,1))]
    },
    (1, 2): {
        'N': [(0.7, (1,2)), (0.3, (1,1))],
        'S': [(0.8, (2,2)), (0.2, (1,3))],
        'E': [(0.6, (1,3)), (0.4, (2,2))],
        'W': [(0.7, (1,1)), (0.3, (1,2))]
    },
    (1, 3): {
        'N': [(0.7, (1,3)), (0.3, (1,2))],
        'S': [(0.8, (2,3)), (0.2, (1,3))],
        'E': [(0.6, (1,3)), (0.4, (2,3))],
        'W': [(0.7, (1,2)), (0.3, (1,3))]
    },
    (2, 1): {
        'N': [(0.7, (1,1)), (0.3, (2,1))],
        'S': [(0.8, (3,1)), (0.2, (2,2))],
        'E': [(0.6, (2,2)), (0.4, (3,1))],
        'W': [(0.7, (2,1)), (0.3, (1,1))]
    },
    (2, 2): {
        'N': [(0.7, (1,2)), (0.3, (2,1))],
        'S': [(0.8, (3,2)), (0.2, (2,3))],
        'E': [(0.6, (2,3)), (0.4, (3,2))],
        'W': [(0.7, (2,1)), (0.3, (1,2))]
    },
    (2, 3): {
        'N': [(0.7, (1,3)), (0.3, (2,2))],
        'S': [(0.8, (3,3)), (0.2, (2,3))],
        'E': [(0.6, (2,3)), (0.4, (3,3))],
        'W': [(0.7, (2,2)), (0.3, (1,3))]
    },
    (3, 1): {
        'N': [(0.7, (2,1)), (0.3, (3,1))],
        'S': [(0.8, (3,1)), (0.2, (3,2))],
        'E': [(0.6, (3,2)), (0.4, (3,1))],
        'W': [(0.7, (3,1)), (0.3, (2,1))]
    },
    (3, 2): {
        'N': [(0.7, (2,2)), (0.3, (3,1))],
        'S': [(0.8, (3,2)), (0.2, (3,3))],
        'E': [(0.6, (3,3)), (0.4, (3,2))],
        'W': [(0.7, (3,1)), (0.3, (2,2))]
    },
    (3, 3): {
        'N': [(0.7, (2,3)), (0.3, (3,2))],
        'S': [(0.8, (3,3)), (0.2, (3,3))],
        'E': [(0.6, (3,3)), (0.4, (3,3))],
        'W': [(0.7, (3,2)), (0.3, (2,3))]
    }
}

# immediate reward R(s) for each state
rewards = {
    (1, 1): 20,
    (1, 2): -1,
    (1, 3): 5,
    (2, 1): -1,
    (2, 2): -1,
    (2, 3): -1,
    (3, 1): -20,
    (3, 2): -1,
    (3, 3): -1
}
states = list(rewards.keys())
gamma = 1  # note: gamma = 1 triggers the non-termination described above
init = (1, 1)

from mdp import MDP, value_iteration

problem = MDP(
    init,
    act_list,
    terminals,
    transitions,
    rewards,
    states,
    gamma
)

u = value_iteration(problem)

print(u)
