diff --git a/README.md b/README.md index 8bac287b6..a7b5d1667 100644 --- a/README.md +++ b/README.md @@ -112,26 +112,26 @@ Here is a table of algorithms, the figure, name of the algorithm in the book and | 10.3 | Three-Block-Tower | `three_block_tower` | [`planning.py`][planning] | Done | Included | | 10.7 | Cake-Problem | `have_cake_and_eat_cake_too` | [`planning.py`][planning] | Done | Included | | 10.9 | Graphplan | `GraphPlan` | [`planning.py`][planning] | Done | Included | -| 10.13 | Partial-Order-Planner | | | | | -| 11.1 | Job-Shop-Problem-With-Resources | `job_shop_problem` | [`planning.py`][planning] | Done | | +| 10.13 | Partial-Order-Planner | `PartialOrderPlanner` | [`planning.py`][planning] | Done | Included | +| 11.1 | Job-Shop-Problem-With-Resources | `job_shop_problem` | [`planning.py`][planning] | Done | Included | | 11.5 | Hierarchical-Search | `hierarchical_search` | [`planning.py`][planning] | | | | 11.8 | Angelic-Search | | | | | -| 11.10 | Doubles-tennis | `double_tennis_problem` | [`planning.py`][planning] | | | +| 11.10 | Doubles-tennis | `double_tennis_problem` | [`planning.py`][planning] | Done | Included | | 13 | Discrete Probability Distribution | `ProbDist` | [`probability.py`][probability] | Done | Included | | 13.1 | DT-Agent | `DTAgent` | [`probability.py`][probability] | | | | 14.9 | Enumeration-Ask | `enumeration_ask` | [`probability.py`][probability] | Done | Included | | 14.11 | Elimination-Ask | `elimination_ask` | [`probability.py`][probability] | Done | Included | -| 14.13 | Prior-Sample | `prior_sample` | [`probability.py`][probability] | | Included | +| 14.13 | Prior-Sample | `prior_sample` | [`probability.py`][probability] | Done | Included | | 14.14 | Rejection-Sampling | `rejection_sampling` | [`probability.py`][probability] | Done | Included | | 14.15 | Likelihood-Weighting | `likelihood_weighting` | [`probability.py`][probability] | Done | Included | | 14.16 | Gibbs-Ask | `gibbs_ask` | [`probability.py`][probability] | Done | Included | -| 15.4 | Forward-Backward | `forward_backward` | [`probability.py`][probability] | Done | | -| 15.6 | Fixed-Lag-Smoothing | `fixed_lag_smoothing` | [`probability.py`][probability] | Done | | -| 15.17 | Particle-Filtering | `particle_filtering` | [`probability.py`][probability] | Done | | -| 16.9 | Information-Gathering-Agent | | | | | +| 15.4 | Forward-Backward | `forward_backward` | [`probability.py`][probability] | Done | Included | +| 15.6 | Fixed-Lag-Smoothing | `fixed_lag_smoothing` | [`probability.py`][probability] | Done | Included | +| 15.17 | Particle-Filtering | `particle_filtering` | [`probability.py`][probability] | Done | Included | +| 16.9 | Information-Gathering-Agent | `InformationGatheringAgent` | [`probability.py`][probability] | Done | Included | | 17.4 | Value-Iteration | `value_iteration` | [`mdp.py`][mdp] | Done | Included | | 17.7 | Policy-Iteration | `policy_iteration` | [`mdp.py`][mdp] | Done | Included | -| 17.9 | POMDP-Value-Iteration | | | | | +| 17.9 | POMDP-Value-Iteration | `pomdp_value_iteration` | [`mdp.py`][mdp] | Done | Included | | 18.5 | Decision-Tree-Learning | `DecisionTreeLearner` | [`learning.py`][learning] | Done | Included | | 18.8 | Cross-Validation | `cross_validation` | [`learning.py`][learning] | | | | 18.11 | Decision-List-Learning | `DecisionListLearner` | [`learning.py`][learning]\* | | | @@ -147,7 +147,7 @@ Here is a table of algorithms, the figure, name of the algorithm in the book and | 22.1 | HITS | `HITS` | [`nlp.py`][nlp] | Done | Included | | 23 | 
Chart-Parse | `Chart` | [`nlp.py`][nlp] | Done | Included | | 23.5 | CYK-Parse | `CYK_parse` | [`nlp.py`][nlp] | Done | Included | -| 25.9 | Monte-Carlo-Localization | `monte_carlo_localization` | [`probability.py`][probability] | Done | | +| 25.9 | Monte-Carlo-Localization | `monte_carlo_localization` | [`probability.py`][probability] | Done | Included | # Index of data structures diff --git a/images/pop.jpg b/images/pop.jpg new file mode 100644 index 000000000..52b3e3756 Binary files /dev/null and b/images/pop.jpg differ diff --git a/mdp.ipynb b/mdp.ipynb index aa74514e0..b9952f528 100644 --- a/mdp.ipynb +++ b/mdp.ipynb @@ -4,9 +4,10 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Markov decision processes (MDPs)\n", + "# Making Complex Decisions\n", + "---\n", "\n", - "This IPy notebook acts as supporting material for topics covered in **Chapter 17 Making Complex Decisions** of the book* Artificial Intelligence: A Modern Approach*. We makes use of the implementations in mdp.py module. This notebook also includes a brief summary of the main topics as a review. Let us import everything from the mdp module to get started." + "This Jupyter notebook acts as supporting material for topics covered in **Chapter 17 Making Complex Decisions** of the book* Artificial Intelligence: A Modern Approach*. We make use of the implementations in mdp.py module. This notebook also includes a brief summary of the main topics as a review. Let us import everything from the mdp module to get started." ] }, { @@ -16,7 +17,7 @@ "outputs": [], "source": [ "from mdp import *\n", - "from notebook import psource, pseudocode" + "from notebook import psource, pseudocode, plot_pomdp_utility" ] }, { @@ -30,7 +31,10 @@ "* Grid MDP\n", "* Value Iteration\n", " * Value Iteration Visualization\n", - "* Policy Iteration" + "* Policy Iteration\n", + "* POMDPs\n", + "* POMDP Value Iteration\n", + " - Value Iteration Visualization" ] }, { @@ -2170,6 +2174,769 @@ "For in-depth knowledge about sequential decision problems, refer **Section 17.1** in the AIMA book." ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## POMDP\n", + "---\n", + "Partially Observable Markov Decision Problems\n", + "\n", + "In retrospect, a Markov decision process or MDP is defined as:\n", + "- a sequential decision problem for a fully observable, stochastic environment with a Markovian transition model and additive rewards.\n", + "\n", + "An MDP consists of a set of states (with an initial state $s_0$); a set $A(s)$ of actions\n", + "in each state; a transition model $P(s' | s, a)$; and a reward function $R(s)$.\n", + "\n", + "The MDP seeks to make sequential decisions to occupy states so as to maximise some combination of the reward function $R(s)$.\n", + "\n", + "The characteristic problem of the MDP is hence to identify the optimal policy function $\\pi^*(s)$ that provides the _utility-maximising_ action $a$ to be taken when the current state is $s$.\n", + "\n", + "### Belief vector\n", + "\n", + "**Note**: The book refers to the _belief vector_ as the _belief state_. We use the latter terminology here to retain our ability to refer to the belief vector as a _probability distribution over states_.\n", + "\n", + "The solution of an MDP is subject to certain properties of the problem which are assumed and justified in [Section 17.1]. 
One critical assumption is that the agent is **fully aware of its current state at all times**.\n",
+ "\n",
+ "A tedious (but rewarding, as we will see) way of expressing this is in terms of the **belief vector** $b$ of the agent. The belief vector is a function mapping states to probabilities or certainties of being in those states.\n",
+ "\n",
+ "Consider an agent that is fully aware that it is in state $s_i$ in the state space $(s_1, s_2, ... s_n)$ at the current time.\n",
+ "\n",
+ "Its belief vector is the vector $(b(s_1), b(s_2), ... b(s_n))$ given by the function $b(s)$:\n",
+ "\\begin{align*}\n",
+ "b(s) &= 0 \\quad \\text{if }s \\neq s_i \\\\ &= 1 \\quad \\text{if } s = s_i\n",
+ "\\end{align*}\n",
+ "\n",
+ "Note that $b(s)$ is a probability distribution that necessarily sums to $1$ over all $s$.\n",
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "collapsed": true
+ },
+ "source": [
+ "### POMDPs - a conceptual outline\n",
+ "\n",
+ "The POMDP really has only two modifications to the **problem formulation** compared to the MDP.\n",
+ "\n",
+ "- **Belief state** - In the real world, the current state of an agent is often not known with complete certainty. This makes the concept of a belief vector extremely relevant. It allows the agent to represent different degrees of certainty with which it _believes_ it is in each state.\n",
+ "\n",
+ "- **Evidence percepts** - In the real world, agents often have certain kinds of evidence, collected from sensors. They can use the probability distribution of observed evidence, conditional on state, to consolidate their information. This is a known distribution $P(e\\ |\\ s)$ - $e$ being an observed percept, and $s$ being the state it is conditioned on.\n",
+ "\n",
+ "Consider the world we used for the MDP. \n",
+ "\n",
+ "\n",
+ "#### Using the belief vector\n",
+ "An agent beginning at $(1, 1)$ may not be certain that it is indeed in $(1, 1)$. Consider a belief vector $b$ such that:\n",
+ "\\begin{align*}\n",
+ " b((1,1)) &= 0.8 \\\\\n",
+ " b((2,1)) &= 0.1 \\\\\n",
+ " b((1,2)) &= 0.1 \\\\\n",
+ " b(s) &= 0 \\quad \\quad \\forall \\text{ other } s\n",
+ "\\end{align*}\n",
+ "\n",
+ "By horizontally concatenating each row, we can represent this as an 11-dimensional vector (omitting the wall at $(2, 2)$).\n",
+ "\n",
+ "Thus, taking $s_1 = (1, 1)$, $s_2 = (2, 1)$, ... $s_{11} = (4,3)$, we have $b$:\n",
+ "\n",
+ "$b = (0.8, 0.1, 0, 0, 0.1, 0, 0, 0, 0, 0, 0)$ \n",
+ "\n",
+ "This fully represents how certain the agent is about its state.\n",
+ "\n",
+ "#### Using evidence\n",
+ "The evidence observed here could be the number of adjacent 'walls' or 'dead ends' observed by the agent. We assume that the agent cannot 'orient' the walls - only count them.\n",
+ "\n",
+ "In this case, $e$ can take only two values, 1 and 2. Since the agent counts walls without error, the sensor model $P(e\\ |\\ s)$ is deterministic:\n",
+ "\\begin{align*}\n",
+ " P(e=2\\ |\\ s) &= 1 \\quad \\forall \\quad s \\in \\{s_1, s_2, s_4, s_5, s_8, s_9, s_{11}\\}\\\\\n",
+ " P(e=1\\ |\\ s) &= 1 \\quad \\forall \\quad s \\in \\{s_3, s_6, s_7, s_{10}\\} \\\\\n",
+ " P(e\\ |\\ s) &= 0 \\quad \\forall \\quad \\text{ other } s, e\n",
+ "\\end{align*}\n",
+ "\n",
+ "Note that the implications of the evidence on the state must be known **a priori** to the agent. Ways of reliably learning this distribution from percepts are beyond the scope of this notebook.\n",
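+ "\n",
+ "The following small snippet (an illustration for this notebook, not code from `mdp.py`) shows one way to write down this belief vector and the deterministic wall-count sensor model in plain Python, with the states ordered by concatenating the grid rows as above:\n",
+ "\n",
+ "```python\n",
+ "# the 11 reachable squares of the 4x3 grid world, omitting the wall at (2, 2)\n",
+ "states = [(1, 1), (2, 1), (3, 1), (4, 1),\n",
+ "          (1, 2), (3, 2), (4, 2),\n",
+ "          (1, 3), (2, 3), (3, 3), (4, 3)]\n",
+ "\n",
+ "# belief vector from the example above: probably in (1, 1), maybe in (2, 1) or (1, 2)\n",
+ "b = {s: 0.0 for s in states}\n",
+ "b[(1, 1)], b[(2, 1)], b[(1, 2)] = 0.8, 0.1, 0.1\n",
+ "assert abs(sum(b.values()) - 1.0) < 1e-9   # a belief vector sums to 1\n",
+ "\n",
+ "# sensor[s][e] = P(e | s): 1 for the true wall count of square s, 0 otherwise\n",
+ "two_walls = {(1, 1), (2, 1), (4, 1), (1, 2), (1, 3), (2, 3), (4, 3)}\n",
+ "sensor = {s: ({1: 0.0, 2: 1.0} if s in two_walls else {1: 1.0, 2: 0.0}) for s in states}\n",
+ "```"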
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### POMDPs - a rigorous outline\n",
+ "\n",
+ "A POMDP is thus a sequential decision problem for a *partially* observable, stochastic environment with a Markovian transition model, a known 'sensor model' for inferring state from observation, and additive rewards. \n",
+ "\n",
+ "Practically, a POMDP has the following, which an MDP also has:\n",
+ "- a set of states, each denoted by $s$\n",
+ "- a set of actions available in each state, $A(s)$\n",
+ "- a reward accrued on attaining some state, $R(s)$\n",
+ "- a transition probability $P(s'\\ |\\ s, a)$ of action $a$ changing the state from $s$ to $s'$\n",
+ "\n",
+ "And the following, which an MDP does not:\n",
+ "- a sensor model $P(e\\ |\\ s)$ on evidence conditional on states\n",
+ "\n",
+ "Additionally, since the agent is no longer certain of its current state, the POMDP also has:\n",
+ "- a belief vector $b$ representing the certainty of being in each state (as a probability distribution)\n",
+ "\n",
+ "\n",
+ "#### New uncertainties\n",
+ "\n",
+ "It is useful to intuitively appreciate the new uncertainties that have arisen in the agent's awareness of its own state.\n",
+ "\n",
+ "- At any point, the agent has belief vector $b$, the distribution of its believed likelihood of being in each state $s$.\n",
+ "- For each of these states $s$ that the agent may **actually** be in, it has some set of actions given by $A(s)$.\n",
+ "- Each of these actions may transport it to some other state $s'$, assuming an initial state $s$, with probability $P(s'\\ |\\ s, a)$.\n",
+ "- Once the action is performed, the agent receives a percept $e$. $P(e\\ |\\ s)$ now tells it the chances of having perceived $e$ for each state $s$. The agent must use this information to update its belief vector appropriately.\n",
+ "\n",
+ "#### Evolution of the belief vector - the `FORWARD` function\n",
+ "\n",
+ "The new belief vector $b'(s')$ after an action $a$ on the belief vector $b(s)$ and the noting of evidence $e$ is:\n",
+ "$$ b'(s') = \\alpha P(e\\ |\\ s') \\sum_s P(s'\\ |\\ s, a) b(s)$$ \n",
+ "\n",
+ "where $\\alpha$ is a normalising constant (to retain the interpretation of $b$ as a probability distribution).\n",
+ "\n",
+ "This equation simply sums, over every possible previous state $s$, the likelihood of moving from $s$ to $s'$, weighted by the initial likelihood of being in $s$; the result is then multiplied by the likelihood of observing the evidence $e$ in the new state $s'$. \n",
+ "\n",
+ "This function is represented as `b' = FORWARD(b, a, e)`\n",
+ "\n",
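+ "Below is a minimal, self-contained sketch of the `FORWARD` update (together with the expected reward $\\rho(b) = \\sum_s b(s) R(s)$ discussed below), using plain Python dictionaries rather than the matrix representation used in `mdp.py`; the argument shapes here are illustrative assumptions only.\n",
+ "\n",
+ "```python\n",
+ "def forward(b, a, e, T, O):\n",
+ "    # b: {state: probability}, T[s][a][s1] = P(s1 | s, a), O[s1][e] = P(e | s1)\n",
+ "    # assumes the observed evidence e is possible in at least one state\n",
+ "    unnormalized = {s1: O[s1][e] * sum(T[s][a][s1] * b[s] for s in b) for s1 in b}\n",
+ "    alpha = 1.0 / sum(unnormalized.values())   # normalising constant\n",
+ "    return {s1: alpha * p for s1, p in unnormalized.items()}\n",
+ "\n",
+ "def expected_reward(b, R):\n",
+ "    # rho(b): the reward for holding belief b, with R a {state: reward} mapping\n",
+ "    return sum(b[s] * R[s] for s in b)\n",
+ "```\n",
+ "\n",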
+ "#### Probability distribution of the evolving belief vector\n",
+ "\n",
+ "The goal here is to find $P(b'\\ |\\ b, a)$ - the probability that action $a$ transforms belief vector $b$ into belief vector $b'$. The following steps illustrate this -\n",
+ "\n",
+ "The probability of observing evidence $e$ when action $a$ is enacted on belief vector $b$ can be distributed over each possible new state $s'$ resulting from it:\n",
+ "\\begin{align*}\n",
+ " P(e\\ |\\ b, a) &= \\sum_{s'} P(e\\ |\\ b, a, s') P(s'\\ |\\ b, a) \\\\\n",
+ " &= \\sum_{s'} P(e\\ |\\ s') P(s'\\ |\\ b, a) \\\\\n",
+ " &= \\sum_{s'} P(e\\ |\\ s') \\sum_s P(s'\\ |\\ s, a) b(s)\n",
+ "\\end{align*}\n",
+ "\n",
+ "The probability of getting belief vector $b'$ from $b$ by applying action $a$ can thus be obtained by summing over all possible evidence values $e$:\n",
+ "\\begin{align*}\n",
+ " P(b'\\ |\\ b, a) &= \\sum_{e} P(b'\\ |\\ b, a, e) P(e\\ |\\ b, a) \\\\\n",
+ " &= \\sum_{e} P(b'\\ |\\ b, a, e) \\sum_{s'} P(e\\ |\\ s') \\sum_s P(s'\\ |\\ s, a) b(s)\n",
+ "\\end{align*}\n",
+ "\n",
+ "where $P(b'\\ |\\ b, a, e) = 1$ if $b' = $ `FORWARD(b, a, e)` and $= 0$ otherwise.\n",
+ "\n",
+ "Given initial and final belief states $b$ and $b'$, the transition probabilities still depend on the action $a$ and observed evidence $e$. Some belief states may be achievable by certain actions, yet assign non-zero probability to states prohibited by the evidence $e$. The above condition thus ensures that only valid combinations of $(b', b, a, e)$ are considered.\n",
+ "\n",
+ "#### A modified reward space\n",
+ "\n",
+ "For MDPs, the reward space was simple - one reward per available state. However, for a belief vector $b(s)$, the expected reward is now:\n",
+ "$$\\rho(b) = \\sum_s b(s) R(s)$$\n",
+ "\n",
+ "Since the belief vector can take infinitely many values as a distribution over states, the reward for each belief vector varies over a hyperplane in the belief space (planes in an $N$-dimensional space are formed by a linear combination of the axes)."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Now that we know the basics, let's have a look at the `POMDP` class."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 32,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "\n",
+ "\n",
+ "
\n", + "class POMDP(MDP):\n",
+ "\n",
+ " """A Partially Observable Markov Decision Process, defined by\n",
+ " a transition model P(s'|s,a), actions A(s), a reward function R(s),\n",
+ " and a sensor model P(e|s). We also keep track of a gamma value,\n",
+ " for use by algorithms. The transition and the sensor models\n",
+ " are defined as matrices. We also keep track of the possible states\n",
+ " and actions for each state. [page 659]."""\n",
+ "\n",
+ " def __init__(self, actions, transitions=None, evidences=None, rewards=None, states=None, gamma=0.95):\n",
+ " """Initialize variables of the pomdp"""\n",
+ "\n",
+ " if not (0 < gamma <= 1):\n",
+ " raise ValueError('A POMDP must have 0 < gamma <= 1')\n",
+ "\n",
+ " self.states = states\n",
+ " self.actions = actions\n",
+ "\n",
+ " # transition model cannot be undefined\n",
+ " self.t_prob = transitions or {}\n",
+ " if not self.t_prob:\n",
+ " print('Warning: Transition model is undefined')\n",
+ " \n",
+ " # sensor model cannot be undefined\n",
+ " self.e_prob = evidences or {}\n",
+ " if not self.e_prob:\n",
+ " print('Warning: Sensor model is undefined')\n",
+ " \n",
+ " self.gamma = gamma\n",
+ " self.rewards = rewards\n",
+ "\n",
+ " def remove_dominated_plans(self, input_values):\n",
+ " """\n",
+ " Remove dominated plans.\n",
+ " This method finds all the lines contributing to the\n",
+ " upper surface and removes those which don't.\n",
+ " """\n",
+ "\n",
+ " values = [val for action in input_values for val in input_values[action]]\n",
+ " values.sort(key=lambda x: x[0], reverse=True)\n",
+ "\n",
+ " best = [values[0]]\n",
+ " y1_max = max(val[1] for val in values)\n",
+ " tgt = values[0]\n",
+ " prev_b = 0\n",
+ " prev_ix = 0\n",
+ " while tgt[1] != y1_max:\n",
+ " min_b = 1\n",
+ " min_ix = 0\n",
+ " for i in range(prev_ix + 1, len(values)):\n",
+ " if values[i][0] - tgt[0] + tgt[1] - values[i][1] != 0:\n",
+ " trans_b = (values[i][0] - tgt[0]) / (values[i][0] - tgt[0] + tgt[1] - values[i][1])\n",
+ " if 0 <= trans_b <= 1 and trans_b > prev_b and trans_b < min_b:\n",
+ " min_b = trans_b\n",
+ " min_ix = i\n",
+ " prev_b = min_b\n",
+ " prev_ix = min_ix\n",
+ " tgt = values[min_ix]\n",
+ " best.append(tgt)\n",
+ "\n",
+ " return self.generate_mapping(best, input_values)\n",
+ "\n",
+ " def remove_dominated_plans_fast(self, input_values):\n",
+ " """\n",
+ " Remove dominated plans using approximations.\n",
+ " Resamples the upper boundary at intervals of 100 and\n",
+ " finds the maximum values at these points.\n",
+ " """\n",
+ "\n",
+ " values = [val for action in input_values for val in input_values[action]]\n",
+ " values.sort(key=lambda x: x[0], reverse=True)\n",
+ "\n",
+ " best = []\n",
+ " sr = 100\n",
+ " for i in range(sr + 1):\n",
+ " x = i / float(sr)\n",
+ " maximum = (values[0][1] - values[0][0]) * x + values[0][0]\n",
+ " tgt = values[0]\n",
+ " for value in values:\n",
+ " val = (value[1] - value[0]) * x + value[0]\n",
+ " if val > maximum:\n",
+ " maximum = val\n",
+ " tgt = value\n",
+ "\n",
+ " if all(any(tgt != v) for v in best):\n",
+ " best.append(tgt)\n",
+ "\n",
+ " return self.generate_mapping(best, input_values)\n",
+ "\n",
+ " def generate_mapping(self, best, input_values):\n",
+ " """Generate mappings after removing dominated plans"""\n",
+ "\n",
+ " mapping = defaultdict(list)\n",
+ " for value in best:\n",
+ " for action in input_values:\n",
+ " if any(all(value == v) for v in input_values[action]):\n",
+ " mapping[action].append(value)\n",
+ "\n",
+ " return mapping\n",
+ "\n",
+ " def max_difference(self, U1, U2):\n",
+ " """Find maximum difference between two utility mappings"""\n",
+ "\n",
+ " for k, v in U1.items():\n",
+ " sum1 = 0\n",
+ " for element in U1[k]:\n",
+ " sum1 += sum(element)\n",
+ " sum2 = 0\n",
+ " for element in U2[k]:\n",
+ " sum2 += sum(element)\n",
+ " return abs(sum1 - sum2)\n",
+ "
def pomdp_value_iteration(pomdp, epsilon=0.1):\n",
+ " """Solving a POMDP by value iteration."""\n",
+ "\n",
+ " U = {'':[[0]* len(pomdp.states)]}\n",
+ " count = 0\n",
+ " while True:\n",
+ " count += 1\n",
+ " prev_U = U\n",
+ " values = [val for action in U for val in U[action]]\n",
+ " value_matxs = []\n",
+ " for i in values:\n",
+ " for j in values:\n",
+ " value_matxs.append([i, j])\n",
+ "\n",
+ " U1 = defaultdict(list)\n",
+ " for action in pomdp.actions:\n",
+ " for u in value_matxs:\n",
+ " u1 = Matrix.matmul(Matrix.matmul(pomdp.t_prob[int(action)], Matrix.multiply(pomdp.e_prob[int(action)], Matrix.transpose(u))), [[1], [1]])\n",
+ " u1 = Matrix.add(Matrix.scalar_multiply(pomdp.gamma, Matrix.transpose(u1)), [pomdp.rewards[int(action)]])\n",
+ " U1[action].append(u1[0])\n",
+ "\n",
+ " U = pomdp.remove_dominated_plans_fast(U1)\n",
+ " # replace with U = pomdp.remove_dominated_plans(U1) for accurate calculations\n",
+ " \n",
+ " if count > 10:\n",
+ " if pomdp.max_difference(U, prev_U) < epsilon * (1 - pomdp.gamma) / pomdp.gamma:\n",
+ " return U\n",
+ "
class PDDL:\n",
+ "class PlanningProblem:\n",
" """\n",
- " Planning Domain Definition Language (PDDL) used to define a search problem.\n",
+ " Planning Domain Definition Language (PlanningProblem) used to define a search problem.\n",
" It stores states in a knowledge base consisting of first order logic statements.\n",
" The conjunction of these logical statements completely defines a state.\n",
" """\n",
"\n",
" def __init__(self, init, goals, actions):\n",
" self.init = self.convert(init)\n",
- " self.goals = expr(goals)\n",
+ " self.goals = self.convert(goals)\n",
" self.actions = actions\n",
"\n",
- " def convert(self, init):\n",
+ " def convert(self, clauses):\n",
" """Converts strings into exprs"""\n",
+ " if not isinstance(clauses, Expr):\n",
+ " if len(clauses) > 0:\n",
+ " clauses = expr(clauses)\n",
+ " else:\n",
+ " clauses = []\n",
" try:\n",
- " init = conjuncts(expr(init))\n",
+ " clauses = conjuncts(clauses)\n",
" except AttributeError:\n",
- " init = expr(init)\n",
- " return init\n",
+ " clauses = clauses\n",
+ "\n",
+ " new_clauses = []\n",
+ " for clause in clauses:\n",
+ " if clause.op == '~':\n",
+ " new_clauses.append(expr('Not' + str(clause.args[0])))\n",
+ " else:\n",
+ " new_clauses.append(clause)\n",
+ " return new_clauses\n",
"\n",
" def goal_test(self):\n",
" """Checks if the goals have been reached"""\n",
- " return all(goal in self.init for goal in conjuncts(self.goals))\n",
+ " return all(goal in self.init for goal in self.goals)\n",
"\n",
" def act(self, action):\n",
" """\n",
@@ -215,7 +242,7 @@
}
],
"source": [
- "psource(PDDL)"
+ "psource(PlanningProblem)"
]
},
{
@@ -350,7 +377,7 @@
"class Action:\n",
" """\n",
" Defines an action schema using preconditions and effects.\n",
- " Use this to describe actions in PDDL.\n",
+ " Use this to describe actions in PlanningProblem.\n",
" action is an Expr where variables are given as arguments(args).\n",
" Precondition and effect are both lists with positive and negative literals.\n",
" Negative preconditions and effects are defined by adding a 'Not' before the name of the clause\n",
@@ -361,34 +388,38 @@
" """\n",
"\n",
" def __init__(self, action, precond, effect):\n",
- " action = expr(action)\n",
+ " if isinstance(action, str):\n",
+ " action = expr(action)\n",
" self.name = action.op\n",
" self.args = action.args\n",
- " self.precond, self.effect = self.convert(precond, effect)\n",
+ " self.precond = self.convert(precond)\n",
+ " self.effect = self.convert(effect)\n",
"\n",
" def __call__(self, kb, args):\n",
" return self.act(kb, args)\n",
"\n",
- " def convert(self, precond, effect):\n",
+ " def __repr__(self):\n",
+ " return '{}({})'.format(self.__class__.__name__, Expr(self.name, *self.args))\n",
+ "\n",
+ " def convert(self, clauses):\n",
" """Converts strings into Exprs"""\n",
+ " if isinstance(clauses, Expr):\n",
+ " clauses = conjuncts(clauses)\n",
+ " for i in range(len(clauses)):\n",
+ " if clauses[i].op == '~':\n",
+ " clauses[i] = expr('Not' + str(clauses[i].args[0]))\n",
"\n",
- " precond = precond.replace('~', 'Not')\n",
- " if len(precond) > 0:\n",
- " precond = expr(precond)\n",
- " effect = effect.replace('~', 'Not')\n",
- " if len(effect) > 0:\n",
- " effect = expr(effect)\n",
+ " elif isinstance(clauses, str):\n",
+ " clauses = clauses.replace('~', 'Not')\n",
+ " if len(clauses) > 0:\n",
+ " clauses = expr(clauses)\n",
"\n",
- " try:\n",
- " precond = conjuncts(precond)\n",
- " except AttributeError:\n",
- " pass\n",
- " try:\n",
- " effect = conjuncts(effect)\n",
- " except AttributeError:\n",
- " pass\n",
+ " try:\n",
+ " clauses = conjuncts(clauses)\n",
+ " except AttributeError:\n",
+ " pass\n",
"\n",
- " return precond, effect\n",
+ " return clauses\n",
"\n",
" def substitute(self, e, args):\n",
" """Replaces variables in expression with their respective Propositional symbol"""\n",
@@ -405,7 +436,6 @@
"\n",
" if isinstance(kb, list):\n",
" kb = FolKB(kb)\n",
- "\n",
" for clause in self.precond:\n",
" if self.substitute(clause, args) not in kb.clauses:\n",
" return False\n",
@@ -676,7 +706,7 @@
},
"outputs": [],
"source": [
- "prob = PDDL(knowledge_base, goals, [fly_s_b, fly_b_s, fly_s_c, fly_c_s, fly_b_c, fly_c_b, drive])"
+ "prob = PlanningProblem(knowledge_base, goals, [fly_s_b, fly_b_s, fly_s_c, fly_c_s, fly_b_c, fly_c_b, drive])"
]
},
{
@@ -793,12 +823,34 @@
"\n",
"\n",
"def air_cargo():\n",
- " """Air cargo problem"""\n",
+ " """\n",
+ " [Figure 10.1] AIR-CARGO-PROBLEM\n",
+ "\n",
+ " An air-cargo shipment problem for delivering cargo to different locations,\n",
+ " given the starting location and airplanes.\n",
+ "\n",
+ " Example:\n",
+ " >>> from planning import *\n",
+ " >>> ac = air_cargo()\n",
+ " >>> ac.goal_test()\n",
+ " False\n",
+ " >>> ac.act(expr('Load(C2, P2, JFK)'))\n",
+ " >>> ac.act(expr('Load(C1, P1, SFO)'))\n",
+ " >>> ac.act(expr('Fly(P1, SFO, JFK)'))\n",
+ " >>> ac.act(expr('Fly(P2, JFK, SFO)'))\n",
+ " >>> ac.act(expr('Unload(C2, P2, SFO)'))\n",
+ " >>> ac.goal_test()\n",
+ " False\n",
+ " >>> ac.act(expr('Unload(C1, P1, JFK)'))\n",
+ " >>> ac.goal_test()\n",
+ " True\n",
+ " >>>\n",
+ " """\n",
"\n",
- " return PDDL(init='At(C1, SFO) & At(C2, JFK) & At(P1, SFO) & At(P2, JFK) & Cargo(C1) & Cargo(C2) & Plane(P1) & Plane(P2) & Airport(SFO) & Airport(JFK)',\n",
- " goals='At(C1, JFK) & At(C2, SFO)', \n",
+ " return PlanningProblem(init='At(C1, SFO) & At(C2, JFK) & At(P1, SFO) & At(P2, JFK) & Cargo(C1) & Cargo(C2) & Plane(P1) & Plane(P2) & Airport(SFO) & Airport(JFK)', \n",
+ " goals='At(C1, JFK) & At(C2, SFO)',\n",
" actions=[Action('Load(c, p, a)', \n",
- " precond='At(c, a) & At(p, a) & Cargo(c) & Plane(p) & Airport(a)', \n",
+ " precond='At(c, a) & At(p, a) & Cargo(c) & Plane(p) & Airport(a)',\n",
" effect='In(c, p) & ~At(c, a)'),\n",
" Action('Unload(c, p, a)',\n",
" precond='In(c, p) & At(p, a) & Cargo(c) & Plane(p) & Airport(a)',\n",
@@ -886,7 +938,7 @@
"metadata": {},
"source": [
"It returns False because the goal state is not yet reached. Now, we define the sequence of actions that it should take in order to achieve the goal.\n",
- "The actions are then carried out on the `airCargo` PDDL.\n",
+ "The actions are then carried out on the `airCargo` PlanningProblem.\n",
"\n",
"The actions available to us are the following: Load, Unload, Fly\n",
"\n",
@@ -1060,9 +1112,27 @@
"\n",
"\n",
"def spare_tire():\n",
- " """Spare tire problem"""\n",
+ " """[Figure 10.2] SPARE-TIRE-PROBLEM\n",
+ "\n",
+ " A problem involving changing the flat tire of a car\n",
+ " with a spare tire from the trunk.\n",
+ "\n",
+ " Example:\n",
+ " >>> from planning import *\n",
+ " >>> st = spare_tire()\n",
+ " >>> st.goal_test()\n",
+ " False\n",
+ " >>> st.act(expr('Remove(Spare, Trunk)'))\n",
+ " >>> st.act(expr('Remove(Flat, Axle)'))\n",
+ " >>> st.goal_test()\n",
+ " False\n",
+ " >>> st.act(expr('PutOn(Spare, Axle)'))\n",
+ " >>> st.goal_test()\n",
+ " True\n",
+ " >>>\n",
+ " """\n",
"\n",
- " return PDDL(init='Tire(Flat) & Tire(Spare) & At(Flat, Axle) & At(Spare, Trunk)',\n",
+ " return PlanningProblem(init='Tire(Flat) & Tire(Spare) & At(Flat, Axle) & At(Spare, Trunk)',\n",
" goals='At(Spare, Axle) & At(Flat, Ground)',\n",
" actions=[Action('Remove(obj, loc)',\n",
" precond='At(obj, loc)',\n",
@@ -1144,7 +1214,7 @@
"source": [
"As we can see, it hasn't completed the goal. \n",
"We now define a possible solution that can help us reach the goal of having a spare tire mounted onto the car's axle. \n",
- "The actions are then carried out on the `spareTire` PDDL.\n",
+ "The actions are then carried out on the `spareTire` PlanningProblem.\n",
"\n",
"The actions available to us are the following: Remove, PutOn\n",
"\n",
@@ -1369,9 +1439,28 @@
"\n",
"\n",
"def three_block_tower():\n",
- " """Sussman Anomaly problem"""\n",
+ " """\n",
+ " [Figure 10.3] THREE-BLOCK-TOWER\n",
+ "\n",
+ " A blocks-world problem of stacking three blocks in a certain configuration,\n",
+ " also known as the Sussman Anomaly.\n",
"\n",
- " return PDDL(init='On(A, Table) & On(B, Table) & On(C, A) & Block(A) & Block(B) & Block(C) & Clear(B) & Clear(C)',\n",
+ " Example:\n",
+ " >>> from planning import *\n",
+ " >>> tbt = three_block_tower()\n",
+ " >>> tbt.goal_test()\n",
+ " False\n",
+ " >>> tbt.act(expr('MoveToTable(C, A)'))\n",
+ " >>> tbt.act(expr('Move(B, Table, C)'))\n",
+ " >>> tbt.goal_test()\n",
+ " False\n",
+ " >>> tbt.act(expr('Move(A, Table, B)'))\n",
+ " >>> tbt.goal_test()\n",
+ " True\n",
+ " >>>\n",
+ " """\n",
+ "\n",
+ " return PlanningProblem(init='On(A, Table) & On(B, Table) & On(C, A) & Block(A) & Block(B) & Block(C) & Clear(B) & Clear(C)',\n",
" goals='On(A, B) & On(B, C)',\n",
" actions=[Action('Move(b, x, y)',\n",
" precond='On(b, x) & Clear(b) & Clear(y) & Block(b) & Block(y)',\n",
@@ -1453,7 +1542,7 @@
"source": [
"As we can see, it hasn't completed the goal. \n",
"We now define a sequence of actions that can stack three blocks in the required order. \n",
- "The actions are then carried out on the `threeBlockTower` PDDL.\n",
+ "The actions are then carried out on the `threeBlockTower` PlanningProblem.\n",
"\n",
"The actions available to us are the following: MoveToTable, Move\n",
"\n",
@@ -1513,16 +1602,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "## Shopping Problem"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "This problem requires us to acquire a carton of milk, a banana and a drill.\n",
- "Initially, we start from home and it is known to us that milk and bananas are available in the supermarket and the hardware store sells drills.\n",
- "Let's take a look at the definition of the `shopping_problem` in the module."
+ "The `three_block_tower` problem can also be defined in simpler terms using just two actions `ToTable(x, y)` and `FromTable(x, y)`.\n",
+ "The underlying problem remains the same however, stacking up three blocks in a certain configuration given a particular starting state.\n",
+ "Let's have a look at the alternative definition."
]
},
{
@@ -1619,17 +1701,35 @@
"\n",
"\n",
"\n",
- "def shopping_problem():\n",
- " """Shopping problem"""\n",
+ "def simple_blocks_world():\n",
+ " """\n",
+ " SIMPLE-BLOCKS-WORLD\n",
"\n",
- " return PDDL(init='At(Home) & Sells(SM, Milk) & Sells(SM, Banana) & Sells(HW, Drill)',\n",
- " goals='Have(Milk) & Have(Banana) & Have(Drill)', \n",
- " actions=[Action('Buy(x, store)',\n",
- " precond='At(store) & Sells(store, x)',\n",
- " effect='Have(x)'),\n",
- " Action('Go(x, y)',\n",
- " precond='At(x)',\n",
- " effect='At(y) & ~At(x)')])\n",
+ " A simplified definition of the Sussman Anomaly problem.\n",
+ "\n",
+ " Example:\n",
+ " >>> from planning import *\n",
+ " >>> sbw = simple_blocks_world()\n",
+ " >>> sbw.goal_test()\n",
+ " False\n",
+ " >>> sbw.act(expr('ToTable(A, B)'))\n",
+ " >>> sbw.act(expr('FromTable(B, A)'))\n",
+ " >>> sbw.goal_test()\n",
+ " False\n",
+ " >>> sbw.act(expr('FromTable(C, B)'))\n",
+ " >>> sbw.goal_test()\n",
+ " True\n",
+ " >>>\n",
+ " """\n",
+ "\n",
+ " return PlanningProblem(init='On(A, B) & Clear(A) & OnTable(B) & OnTable(C) & Clear(C)',\n",
+ " goals='On(B, A) & On(C, B)',\n",
+ " actions=[Action('ToTable(x, y)',\n",
+ " precond='On(x, y) & Clear(x)',\n",
+ " effect='~On(x, y) & Clear(y) & OnTable(x)'),\n",
+ " Action('FromTable(y, x)',\n",
+ " precond='OnTable(y) & Clear(y) & Clear(x)',\n",
+ " effect='~OnTable(y) & ~Clear(x) & On(y, x)')])\n",
"
\n",
"\n",
"\n"
@@ -1643,20 +1743,26 @@
}
],
"source": [
- "psource(shopping_problem)"
+ "psource(simple_blocks_world)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
- "**At(x):** Indicates that we are currently at **'x'** where **'x'** can be Home, SM (supermarket) or HW (Hardware store).\n",
+ "**On(x, y):** The block **'x'** is on **'y'**. Both **'x'** and **'y'** have to be blocks.\n",
"\n",
- "**~At(x):** Indicates that we are currently _not_ at **'x'**.\n",
+ "**~On(x, y):** The block **'x'** is _not_ on **'y'**. Both **'x'** and **'y'** have to be blocks.\n",
"\n",
- "**Sells(s, x):** Indicates that item **'x'** can be bought from store **'s'**.\n",
+ "**OnTable(x):** The block **'x'** is on the table.\n",
"\n",
- "**Have(x):** Indicates that we possess the item **'x'**."
+ "**~OnTable(x):** The block **'x'** is _not_ on the table.\n",
+ "\n",
+ "**Clear(x):** To indicate that there is nothing on **'x'** and it is free to be moved around.\n",
+ "\n",
+ "**~Clear(x):** To indicate that there is something on **'x'** and it cannot be moved.\n",
+ "\n",
+ "Let's now define a `simple_blocks_world` prolem."
]
},
{
@@ -1667,14 +1773,14 @@
},
"outputs": [],
"source": [
- "shoppingProblem = shopping_problem()"
+ "simpleBlocksWorld = simple_blocks_world()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
- "Let's first check whether the goal state Have(Milk), Have(Banana), Have(Drill) is reached or not."
+ "Before taking any actions, we will see if `simple_bw` has reached its goal."
]
},
{
@@ -1683,34 +1789,33 @@
"metadata": {},
"outputs": [
{
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "False\n"
- ]
+ "data": {
+ "text/plain": [
+ "False"
+ ]
+ },
+ "execution_count": 31,
+ "metadata": {},
+ "output_type": "execute_result"
}
],
"source": [
- "print(shoppingProblem.goal_test())"
+ "simpleBlocksWorld.goal_test()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
- "Let's look at the possible actions\n",
+ "As we can see, it hasn't completed the goal. \n",
+ "We now define a sequence of actions that can stack three blocks in the required order. \n",
+ "The actions are then carried out on the `simple_bw` PlanningProblem.\n",
"\n",
- "**Buy(x, store):** Buy an item **'x'** from a **'store'** given that the **'store'** sells **'x'**.\n",
+ "The actions available to us are the following: MoveToTable, Move\n",
"\n",
- "**Go(x, y):** Go to destination **'y'** starting from source **'x'**."
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {},
- "source": [
- "We now define a valid solution that will help us reach the goal.\n",
- "The sequence of actions will then be carried out onto the `shoppingProblem` PDDL."
+ "**ToTable(x, y): ** Move box **'x'** stacked on **'y'** to the table, given that box **'y'** is clear.\n",
+ "\n",
+ "**FromTable(x, y): ** Move box **'x'** from wherever it is, to the top of **'y'**, given that both **'x'** and **'y'** are clear.\n"
]
},
{
@@ -1721,22 +1826,19 @@
},
"outputs": [],
"source": [
- "solution = [expr('Go(Home, SM)'),\n",
- " expr('Buy(Milk, SM)'),\n",
- " expr('Buy(Banana, SM)'),\n",
- " expr('Go(SM, HW)'),\n",
- " expr('Buy(Drill, HW)')]\n",
+ "solution = [expr('ToTable(A, B)'),\n",
+ " expr('FromTable(B, A)'),\n",
+ " expr('FromTable(C, B)')]\n",
"\n",
"for action in solution:\n",
- " shoppingProblem.act(action)"
+ " simpleBlocksWorld.act(action)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
- "We have taken the steps required to acquire all the stuff we need. \n",
- "Let's see if we have reached our goal."
+ "As the `three_block_tower` has taken all the steps it needed in order to achieve the goal, we can now check if it has acheived its goal."
]
},
{
@@ -1745,40 +1847,38 @@
"metadata": {},
"outputs": [
{
- "data": {
- "text/plain": [
- "True"
- ]
- },
- "execution_count": 33,
- "metadata": {},
- "output_type": "execute_result"
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "True\n"
+ ]
}
],
"source": [
- "shoppingProblem.goal_test()"
+ "print(simpleBlocksWorld.goal_test())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
- "It has now successfully achieved the goal."
+ "It has now successfully achieved its goal i.e, to build a stack of three blocks in the specified order."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
- "## Have Cake and Eat Cake Too"
+ "## Shopping Problem"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
- "This problem requires us to reach the state of having a cake and having eaten a cake simlutaneously, given a single cake.\n",
- "Let's first take a look at the definition of the `have_cake_and_eat_cake_too` problem in the module."
+ "This problem requires us to acquire a carton of milk, a banana and a drill.\n",
+ "Initially, we start from home and it is known to us that milk and bananas are available in the supermarket and the hardware store sells drills.\n",
+ "Let's take a look at the definition of the `shopping_problem` in the module."
]
},
{
@@ -1875,17 +1975,37 @@
"\n",
"\n",
"\n",
- "def have_cake_and_eat_cake_too():\n",
- " """Cake problem"""\n",
+ "def shopping_problem():\n",
+ " """\n",
+ " SHOPPING-PROBLEM\n",
"\n",
- " return PDDL(init='Have(Cake)',\n",
- " goals='Have(Cake) & Eaten(Cake)',\n",
- " actions=[Action('Eat(Cake)',\n",
- " precond='Have(Cake)',\n",
- " effect='Eaten(Cake) & ~Have(Cake)'),\n",
- " Action('Bake(Cake)',\n",
- " precond='~Have(Cake)',\n",
- " effect='Have(Cake)')])\n",
+ " A problem of acquiring some items given their availability at certain stores.\n",
+ "\n",
+ " Example:\n",
+ " >>> from planning import *\n",
+ " >>> sp = shopping_problem()\n",
+ " >>> sp.goal_test()\n",
+ " False\n",
+ " >>> sp.act(expr('Go(Home, HW)'))\n",
+ " >>> sp.act(expr('Buy(Drill, HW)'))\n",
+ " >>> sp.act(expr('Go(HW, SM)'))\n",
+ " >>> sp.act(expr('Buy(Banana, SM)'))\n",
+ " >>> sp.goal_test()\n",
+ " False\n",
+ " >>> sp.act(expr('Buy(Milk, SM)'))\n",
+ " >>> sp.goal_test()\n",
+ " True\n",
+ " >>>\n",
+ " """\n",
+ "\n",
+ " return PlanningProblem(init='At(Home) & Sells(SM, Milk) & Sells(SM, Banana) & Sells(HW, Drill)',\n",
+ " goals='Have(Milk) & Have(Banana) & Have(Drill)', \n",
+ " actions=[Action('Buy(x, store)',\n",
+ " precond='At(store) & Sells(store, x)',\n",
+ " effect='Have(x)'),\n",
+ " Action('Go(x, y)',\n",
+ " precond='At(x)',\n",
+ " effect='At(y) & ~At(x)')])\n",
"
\n",
"\n",
"