AI Notes
1.1 Introduction
Artificial: Made or produced by humans rather than occurring naturally.
Intelligence: The ability to acquire and apply knowledge and skills.
Definition: Artificial Intelligence
Artificial intelligence is the branch of computer science concerned with making computers
behave like humans. The term was coined in 1956 by John McCarthy at Dartmouth College.
AI includes the following areas of specialization.
Game Playing: Programming computers to play games against human opponents.
Expert Systems: Programming computers to make decisions in real-life situations (for
example, some expert systems help doctors diagnose diseases based on symptoms).
Natural Language: Programming computers to understand natural human languages.
Neural Networks: Systems that simulate intelligence by attempting to reproduce the types of
physical connections that occur in animal brains.
Robotics: Programming computers to see and hear and react to other sensory stimuli.
1.1.1 What is AI?
Views of AI fall into four categories:
Table 1.1: Some definitions of artificial intelligence, organized into four categories
Thinking humanly: "The automation of activities that we associate with human thinking, activities such as decision-making, problem solving, learning…"
Thinking rationally: "The study of mental faculties through the use of computational models"
Acting humanly: "The study of how to make computers do things at which, at the moment, people are better"
Acting rationally: "Computational intelligence is the study of the design of intelligent agents"
1.1.2 Acting Humanly: The Turing Test
The Turing Test was proposed by Alan Turing (1950).
A computer passes the test if a human interrogator, after posing some written questions,
cannot tell whether the written responses come from a person or from a computer.
Three rooms contain a person, a computer, and an interrogator. The interrogator can
communicate with the other two by teleprinter.
The interrogator tries to determine which is the person and which is the machine. The
machine tries to fool the interrogator into believing that it is the person. If the machine
succeeds, then we conclude that the machine can think.
Some examples of what AI systems can do today:
Autonomous planning and scheduling
o Route planning
o Automated scheduling of actions in spacecrafts
Game playing
o IBM's Deep Blue defeated G. Kasparov (the human world champion) (1997)
o The program FRITZ, running on an ordinary PC, drew with V. Kramnik (the human world
champion) (2002)
Autonomous control
o Automated car steering and the Mars mission.
Diagnosis
o Medical diagnosis programs based on probabilistic analysis have been able to perform at
the level of an expert physician in several areas of medicine.
Logistics planning
o The Defense Advanced Research Projects Agency (DARPA) stated that this single application more than
paid back DARPA's 30-year investment in AI
Robotics
o Microsurgery and RoboCup, whose stated goal is, by the year 2050, to develop a team of fully autonomous
humanoid robots that can win against the human world champion soccer team.
1.2 Intelligent Agents
Agents and Environments
An agent is anything that can be viewed as perceiving its environment through sensors and
acting upon that environment through actuators.
Human agent: eyes, ears, and other organs for sensors; hands, legs, mouth, and other body
parts for actuators.
Robotic agent: cameras and infrared range finders for sensors; various motors for actuators.
A software agent receives keystrokes, file inputs and acts on the environment by displaying
on the screen, writing files, and sending network packets.
An agent function is an abstract mathematical description; the agent program is a concrete
implementation, running on the agent architecture.
Fig. 1.2: Agents interact with environments through sensors and actuators.
Example: Vacuum-cleaner
The vacuum agent perceives which square it is in and whether there is dirt in the square.
It can choose to move left, move right, suck up the dirt, or do nothing.
One simple agent function is the following: if the current square is dirty, then suck;
otherwise move to the other square.
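A minimal Python sketch of this vacuum agent function is shown below; the square names ('A', 'B') and the (location, status) percept format are assumptions made here for illustration.

```python
# A minimal sketch of the reflex vacuum agent described above.
# The two-square world ('A', 'B') and the percept format are assumptions
# made here for illustration; they are not fixed by the notes.

def reflex_vacuum_agent(percept):
    """percept is a (location, status) pair, e.g. ('A', 'Dirty')."""
    location, status = percept
    if status == 'Dirty':
        return 'Suck'
    elif location == 'A':
        return 'Right'
    else:
        return 'Left'

# Example: the agent is in square A and the square is dirty.
print(reflex_vacuum_agent(('A', 'Dirty')))   # -> Suck
print(reflex_vacuum_agent(('A', 'Clean')))   # -> Right
```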
Properties of task environment
Fully observable (vs. partially observable): An agent's sensors give it access to the complete
state of the environment at each point in time.
Deterministic (vs. stochastic): The next state of the environment is completely determined by
the current state and the action executed by the agent. (If the environment is deterministic
except for the actions of other agents, then the environment is strategic)
Episodic (vs. sequential): The agent's experience is divided into atomic "episodes" (each
episode consists of the agent perceiving and then performing a single action), and the choice
of action in each episode depends only on the episode itself.
Static (vs. dynamic): The environment is unchanged while an agent is deliberating. (The
environment is semi dynamic if the environment itself does not change with the passage of
time but the agent's performance score does)
Discrete (vs. continuous): A limited number of distinct, clearly defined percepts and actions.
Single agent (vs. multiagent): An agent operating by itself in an environment.
The environment type largely determines the agent design
The real world is (of course) partially observable, stochastic, sequential, dynamic, continuous,
multi-agent
1.3 The Structure of Agents
The job of AI is to design the agent program that implements the agent function mapping
percepts to actions.
We assume this program will run on some sort of computing device with physical sensors and
actuators.
Agent = Architecture + Program
An agent is completely specified by the agent function mapping percept sequences to actions.
One agent function (or a small equivalence class) is rational.
Aim: find a way to implement the rational agent function concisely.
Table-lookup agent
Drawbacks
Huge table
Take a long time to build the table
No autonomy
Even with learning, need a long time to learn the table entries
Fig. 1.5: The TABLE-DRIVEN-AGENT program is invoked for each new percept and returns an
action each time. It keeps track of the percept sequence using its own private data structure.
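As a sketch of the table-driven idea in Fig. 1.5, the following Python fragment keeps the percept sequence in a private list and looks it up in a table; the toy table for the two-square vacuum world is an illustrative assumption.

```python
# A minimal sketch of the TABLE-DRIVEN-AGENT idea from Fig. 1.5: the agent
# appends each percept to its percept sequence and looks the whole sequence
# up in a (huge) table. The example table below is a toy assumption.

def make_table_driven_agent(table):
    percepts = []                      # private percept-sequence memory
    def agent(percept):
        percepts.append(percept)
        return table.get(tuple(percepts), 'NoOp')
    return agent

# Toy table for the two-square vacuum world (an illustrative assumption).
table = {
    (('A', 'Dirty'),): 'Suck',
    (('A', 'Clean'),): 'Right',
    (('A', 'Clean'), ('B', 'Dirty')): 'Suck',
}
agent = make_table_driven_agent(table)
print(agent(('A', 'Clean')))   # -> Right
print(agent(('B', 'Dirty')))   # -> Suck
```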
Fig. 1.6: The agent program for a simple reflex agent in the two-state vacuum environment.
This program implements the agent function.
1.4 Agent Types
Four basic types in order of increasing generality:
1. Simple reflex agents
2. Model-based reflex agents
3. Goal-based agents
4. Utility-based agents
1.4.1 Simple Reflex Agents
These agents select actions on the basis of the current percept, ignoring the rest of the
percept history.
The vacuum agent program is very small, but in general some processing must be done on the
sensory input to establish the condition-action rule.
For example: if car-in-front-is-braking, then initiate braking.
The following figure shows how the condition-action rules allow the agent to make the
connection from percept to action.
Fig. 1.8: A simple reflex agent. It acts according to a rule whose condition matches the current
state, as defined by the percept.
Function
Interpret-Input: generates an abstracted description of the current state from the percept.
Rule-Match: returns the first rule in the set of rules that matches the given state description.
This agent will work only if the correct decision can be made on the basis of only the current
percept. i.e. only if the environment is fully observable.
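A minimal sketch of the simple reflex agent of Fig. 1.8 in Python; the rule format (condition predicate, action) and the INTERPRET-INPUT function used in the example are assumptions for illustration.

```python
# A minimal sketch of the generic simple reflex agent of Fig. 1.8.
# interpret_input and the rule set below are illustrative assumptions.

def simple_reflex_agent(percept, rules, interpret_input):
    state = interpret_input(percept)        # abstracted state description
    for condition, action in rules:         # RULE-MATCH: first matching rule
        if condition(state):
            return action
    return 'NoOp'

# Example rule set for a driving agent (assumed for illustration).
rules = [(lambda s: s.get('car_in_front_is_braking'), 'initiate_braking')]
action = simple_reflex_agent({'video': '...'}, rules,
                             lambda p: {'car_in_front_is_braking': True})
print(action)   # -> initiate_braking
```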
1.4.2 Model-Based Reflex Agents
To handle partial observability, the agent should maintain some sort of internal state that
depends on the percept history and thereby reflects at least some of the unobserved aspects
of the current state.
Updating this internal state information requires two kinds of knowledge to be encoded in the
agent program.
o How the world evolves independently of the agent?
o How the agent’s actions affect the world?
This knowledge about how the world works, whether implemented in simple Boolean circuits or in
complete scientific theories, is called a model of the world. An agent that uses such a model is called a model-based agent.
The following figure shows the structure of the reflex agent with internal state, showing how
the current percept is combined with the old internal state to generate the updated description
of the current state.
Fig 1.9: A model-based reflex agent.
The agent program is shown below:
Fig. 1.10: A model-based reflex agent. It keeps track of the current state of the world using an
internal model. It then chooses an action in the same way as the reflex agent.
UPDATE-STATE is responsible for creating the new internal state description.
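A minimal sketch of the model-based reflex agent of Fig. 1.10; the update_state function, the world model, and the rule format are assumptions supplied by the caller.

```python
# A minimal sketch of the model-based reflex agent of Fig. 1.10.
# update_state combines the old internal state, the last action and the new
# percept using the world model; all interfaces here are illustrative assumptions.

def make_model_based_agent(rules, update_state, model):
    state, last_action = {}, None
    def agent(percept):
        nonlocal state, last_action
        state = update_state(state, last_action, percept, model)  # UPDATE-STATE
        for condition, action in rules:                           # RULE-MATCH
            if condition(state):
                last_action = action
                return action
        last_action = 'NoOp'
        return 'NoOp'
    return agent
```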
1.4.3 Goal-Based Agents
Here, along with current-state description, the agent needs some sort of goal information that
describes situations that are desirable – for e.g., being at the passenger’s destination. Goal –
based agents structure is shown below:
Fig. 1.11: A model-based, goal-based agent. It keeps track of the world state as well as a set of
goals it is trying to achieve, and chooses an action that will (eventually) lead to the achievement of its goals.
1.4.4 Utility-Based Agents
Goals alone are not enough to generate high-quality behavior in most environments.
A more general performance measure should allow a comparison of different world states
according to exactly how happy they would make the agent if they could be achieved.
A utility function maps a state onto a real number, which describes the associated degree of
happiness. The utility-based agent structure appears in the following figure.
In a learning agent, the critic tells the learning element how well the agent is doing with respect to a fixed
performance standard.
Fig 1.14: Problem solving agent.
Example: Romania
On holiday in Romania; currently in Arad.
Flight leaves tomorrow from Bucharest
Formulate goal: be in Bucharest
Formulate problem:
States: various cities
Actions: drive between cities
Find solution: sequence of cities,
e.g., Arad, Sibiu, Fagaras, Bucharest
Fig 1.16: 8-puzzle problem.
Toy Problems
Example: the first problem we will examine is the vacuum world. This can be formulated as a problem as
follows.
States: The agent is in one of two locations, each of which might or might not contain dirt.
Thus there are 2 × 2^2 = 8 possible world states.
Initial State: Any state can be designated as the initial state.
Successor function: This generates the legal states that result from trying the three actions
(Left, Right, and suck).
Goal test: This checks whether all the squares are clean.
Assign a variable Ri (i = 1 to N) to the queen in the ith column, indicating the row position of the queen
in that column.
Apply "no-threatening" constraints between each pair Ri and Rj of queens and evaluate
the algorithm.
Fig. 1.19: Partial search trees for finding a route from Arad to Bucharest. Nodes that have been
expanded are shaded; nodes that have been generated but not yet expanded are outlined in
bold; nodes that have not been generated are shown in faint dashed lines.
The choice of which state to expand is determined by the search strategy. The general tree-
search algorithm is given below:
Fig. 1.20: An informal description of the general tree-search algorithm
Assume that a node is a data structure with five components:
STATE: the state in the state space to which the node corresponds
PARENT-NODE: the node in the search tree that generated this node
ACTION: the action that was applied to the parent to generate the node
PATH-COST: the cost, traditionally denoted by g(n), of the path from the initial state to the
node, as indicated by the parent pointers; and
DEPTH: the number of steps along the path from the initial state.
Implementation: states vs. nodes
A state is a (representation of) a physical configuration
A node is a data structure constituting part of a search tree; it includes the state, parent node,
action, path cost g(n), and depth
The node data structure is depicted in the following figure:
Fig. 1.21: Nodes are the data structures from which the search tree is constructed. Each has a
parent, a state, and various bookkeeping fields. Arrows point from child to parent.
The collection of nodes is implemented as a queue. The operations on a queue are as follows:
MAKE-QUEUE(element….) creates a queue with the given element(s)
EMPTY?(queue) returns true only if there are no more elements in the queue
FIRST(queue) returns the first element of the queue
REMOVE-FIRST (queue) returns FIRST (queue) and removes it from the queue.
INSERT (element, queue) inserts an element into the queue and returns the resulting queue.
INSERT-ALL (elements, queue) inserts a set of elements into the queue and returns the
resulting queue.
With these definitions, the more formal version of the general tree search algorithm is shown below:
Fig. 1.22: The general tree search algorithm. (Note that the fringe argument must be an empty
queue, and the type of the queue will affect the order of the search.) The SOLUTION function
returns the sequence of actions obtained by following parent pointers back to the root.
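A minimal Python sketch of the node data structure of Fig. 1.21 and the queue operations listed above, using a plain list as the queue; the SOLUTION helper follows parent pointers back to the root as described in Fig. 1.22.

```python
# A minimal sketch of the node data structure (Fig. 1.21), the SOLUTION
# function (Fig. 1.22) and the queue operations listed above.

class Node:
    def __init__(self, state, parent=None, action=None, path_cost=0):
        self.state = state            # STATE
        self.parent = parent          # PARENT-NODE
        self.action = action          # ACTION
        self.path_cost = path_cost    # PATH-COST g(n)
        self.depth = 0 if parent is None else parent.depth + 1   # DEPTH

def solution(node):
    """Follow parent pointers back to the root and return the action sequence."""
    actions = []
    while node.parent is not None:
        actions.append(node.action)
        node = node.parent
    return list(reversed(actions))

# Queue operations (FIFO behaviour assumed here; other orderings give other searches).
def make_queue(*elements): return list(elements)
def empty(queue): return len(queue) == 0
def first(queue): return queue[0]
def remove_first(queue): return queue.pop(0)
def insert(element, queue): queue.append(element); return queue
def insert_all(elements, queue): queue.extend(elements); return queue
```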
Measuring problem-solving performance
The output of a problem-solving algorithm is either failure or a solution. (Some algorithms
might get stuck in an infinite loop and never return an output.) We will evaluate an algorithm's
performance in four ways.
Completeness: Is the algorithm guaranteed to find a solution when there is one?
Optimality: Does the strategy find the optimal solution?
Time complexity: How long does it take to find a solution?
Space complexity: How much memory is needed to perform the search?
Time and space complexity are measured in terms of
o b: maximum branching factor of the search tree
o d: depth of the least-cost solution
o m: maximum depth of the state space
Fig. 1.23: Breadth-first search on a simple binary tree. At each stage, the node to be expanded
next is indicated by a marker.
Properties of breadth-first search
Completeness? Yes (if b is finite)
o If the shallowest goal node is at some finite depth d, BFS will eventually find it after expanding
all shallower nodes (b is the branching factor)
Time complexity? 1 + b + b^2 + b^3 + … + b^d + b(b^d − 1) = O(b^(d+1))
Space complexity? O(b^(d+1)) (keeps every node in memory)
o We consider a hypothetical state space where every state has b successors. The root of the
search tree generates b nodes at the first level, each of which generates b more nodes, for a
total of b2 at the second level, and so on. Now suppose that the solution is at depth d.
Optimality? Yes (if cost = 1 per step)
o BFS is optimal if the path cost is a non decreasing function of the depth of the node.
o Space is the bigger problem (more than time).
Advantages of BFS
BFS will not get trapped exploring a blind alley.
If there is a solution, BFS is guaranteed to find it; furthermore, if there are multiple solutions,
the minimal (shallowest) solution is found.
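A minimal sketch of breadth-first graph search in Python, reusing the Node class and solution() helper sketched earlier; the problem interface (initial_state, goal_test, successors yielding (action, next_state) pairs) is an assumption.

```python
# A minimal sketch of breadth-first graph search. Node and solution() are
# reused from the earlier sketch; the problem interface is an assumption.
from collections import deque

def breadth_first_search(problem):
    node = Node(problem.initial_state)
    if problem.goal_test(node.state):
        return solution(node)
    frontier, explored = deque([node]), {node.state}
    while frontier:
        node = frontier.popleft()                     # expand shallowest node
        for action, next_state in problem.successors(node.state):
            if next_state not in explored:
                child = Node(next_state, node, action, node.path_cost + 1)
                if problem.goal_test(child.state):
                    return solution(child)
                explored.add(next_state)
                frontier.append(child)
    return None                                       # failure
```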
1.7.2 Uniform-cost search
BFS is optimal when all step costs are equal, because it always expands the shallowest
unexpanded node. Instead of expanding the shallowest node, Uniform-cost search expands the node
n with the lowest path cost.
Implementation
Fringe = queue ordered by path cost
Equivalent to breadth-first if step costs all equal
Completeness? Yes, if every step cost ≥ ε (some small positive constant)
Time complexity? Number of nodes with g ≤ cost of the optimal solution, O(b^⌈C*/ε⌉), where
C* is the cost of the optimal solution
Space complexity? Number of nodes with g ≤ cost of the optimal solution, O(b^⌈C*/ε⌉)
Optimality? Yes – nodes are expanded in increasing order of g(n)
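A minimal sketch of uniform-cost search; the fringe is a priority queue ordered by g(n). Here successors() is assumed to yield (action, next_state, step_cost) triples, and Node/solution are reused from the earlier sketch.

```python
# A minimal sketch of uniform-cost search: the fringe is a priority queue
# ordered by path cost g(n). successors(state) is assumed to yield
# (action, next_state, step_cost) triples.
import heapq, itertools

def uniform_cost_search(problem):
    counter = itertools.count()               # tie-breaker for equal costs
    start = Node(problem.initial_state)
    frontier = [(0, next(counter), start)]
    best_g = {start.state: 0}
    while frontier:
        g, _, node = heapq.heappop(frontier)  # node with lowest path cost
        if problem.goal_test(node.state):
            return solution(node)             # goal test on expansion => optimal
        for action, next_state, cost in problem.successors(node.state):
            new_g = g + cost
            if new_g < best_g.get(next_state, float('inf')):
                best_g[next_state] = new_g
                child = Node(next_state, node, action, new_g)
                heapq.heappush(frontier, (new_g, next(counter), child))
    return None
```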
1.7.3 Depth-first search
Depth first search always expands the deepest node in the current fringe of the search tree.
The search proceeds immediately to the deepest level of the search tree, where the nodes
have no successors.
As those nodes are expanded, they are dropped from the fringe, so then the search “backs
up” to the next shallowest node that still has unexplored successors.
This strategy can be implemented by TREE-SEARCH with a last-in-first-out (LIFO) queue,
also known as stack.
The progress of the search is illustrated in the following figure:
Fig. 1.24: DFS on a binary tree. Nodes that have been expanded and have no descendants in
the fringe can be removed from memory; these are shown in black. Nodes at depth 3 are
assumed to have no successors and M is the only goal node.
Properties of depth-first search
Completeness? No: fails in infinite-depth spaces, spaces with loops
Modify to avoid repeated states along the path; then it is complete in finite spaces
Time complexity? O(b^m): terrible if m is much larger than d,
but if solutions are dense, may be much faster than breadth-first
Space complexity? O(bm), i.e., linear space!
Optimality? No
Advantages of DFS
DFS requires less memory since only the nodes on the current path are stored.
By chance the DFS may find a solution without examining much of the search space at all.
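A minimal sketch of depth-first tree search with an explicit LIFO stack, reusing Node and solution() from above. No repeated-state check is made, so it can loop forever in state spaces with cycles, matching the completeness caveat above.

```python
# A minimal sketch of depth-first tree search using an explicit LIFO stack,
# as described in the notes. Without repeated-state checking it can loop
# forever in state spaces with cycles.

def depth_first_search(problem):
    stack = [Node(problem.initial_state)]     # LIFO fringe
    while stack:
        node = stack.pop()                    # expand deepest node first
        if problem.goal_test(node.state):
            return solution(node)
        for action, next_state in problem.successors(node.state):
            stack.append(Node(next_state, node, action, node.path_cost + 1))
    return None
```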
1.7.4 Depth-Limited Search
The problem of unbounded trees can be alleviated by supplying DFS with a pre-determined
depth limit.
Depth-first search with depth limit l, i.e., nodes at depth l have no successors
Depth-limited search will also be nonoptimal if we choose l < d. Its time complexity is O(b^l)
and its space complexity is O(bl).
Depth-limited search can terminate with two kinds of failure: the standard failure value
indicates no solution; the cutoff value indicates no solution within the depth limit.
Recursive implementation:
Fig.1.26: The iterative deepening search algorithm, which repeatedly applies depth-limited
search with increasing limits. It terminates when a solution is found or if the depth limited
search returns failure, meaning that no solution exists.
Iterative deepening combines the benefits of DFS and BFS. Like DFS, its memory
requirements are very modest:O(bd). Like BFS, it is complete when the branching factor is
finite and optimal when the path cost is a nondecreasing function of the depth of the node.
The following figure shows four iterations of ITERATIVE-DEEPENING SEARCH on a binary
search tree, where the solution is found on the fourth iteration.
Fig. 1.27: Four iterations of iterative deepening search on binary tree.
Properties of iterative deepening search
Completeness? Yes
Time Complexity? (d+1)b^0 + d·b^1 + (d−1)·b^2 + … + b^d = O(b^d)
Space Complexity? O(bd)
Optimality? Yes, if step cost = 1
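A minimal sketch of recursive depth-limited search and iterative deepening along the lines of Fig. 1.26; the string 'cutoff' stands for "no solution within the limit", and Node/solution are reused from the earlier sketch.

```python
# A minimal sketch of recursive depth-limited search and iterative deepening
# (Fig. 1.26). 'cutoff' signals "no solution within the depth limit".

def depth_limited_search(problem, limit):
    def recurse(node, limit):
        if problem.goal_test(node.state):
            return solution(node)
        if limit == 0:
            return 'cutoff'
        cutoff_occurred = False
        for action, next_state in problem.successors(node.state):
            result = recurse(Node(next_state, node, action), limit - 1)
            if result == 'cutoff':
                cutoff_occurred = True
            elif result is not None:
                return result
        return 'cutoff' if cutoff_occurred else None
    return recurse(Node(problem.initial_state), limit)

def iterative_deepening_search(problem, max_depth=50):
    for depth in range(max_depth):             # repeatedly increase the limit
        result = depth_limited_search(problem, depth)
        if result != 'cutoff':
            return result                      # solution or definite failure
    return None
```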
Summary of algorithms
Figure 1.28: Evaluation of search strategies. b is the branching factor; d is the depth of the
shallowest solution; m is the maximum depth of the search tree; l is the depth limit.
Superscript caveats are as follows: (a) complete if step costs ≥ ε for positive ε; (c) optimal if step
costs are all identical; (d) if both directions use breadth-first search.
1.7.6 Bidirectional Search
The idea behind bi-directional search is to run two simultaneous searches – one forward from
the initial state and the other backward from the goal, stopping when the two searches meet
in the middle.
Bidirectional search is implemented by having one or both of the searches check each node
before it is expanded to see if it is in the fringe of the other search tree; if so, a solution has
been found.
Checking a node for membership in the other search tree can be done in constant time with a
hash table, so the time complexity of bidirectional search is O(b^(d/2)).
At least one of the search trees must be kept in memory so that the membership check can
be done; hence the space complexity is O(b^(d/2)), which is the weakness of the algorithm. The
algorithm is complete and optimal if both searches are breadth-first.
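A minimal sketch of bidirectional breadth-first search for a single explicit goal state. It assumes the successor relation is symmetric, so the same neighbours() function can be used for the backward search; this is an assumption, not something guaranteed by every problem.

```python
# A minimal sketch of bidirectional breadth-first search. It assumes a
# symmetric successor relation so neighbours() works for both directions.
from collections import deque

def bidirectional_search(start, goal, neighbours):
    if start == goal:
        return [start]
    parents_f, parents_b = {start: None}, {goal: None}
    frontier_f, frontier_b = deque([start]), deque([goal])

    def path_through(meet):
        forward, s = [], meet
        while s is not None:
            forward.append(s); s = parents_f[s]
        backward, s = [], parents_b[meet]
        while s is not None:
            backward.append(s); s = parents_b[s]
        return list(reversed(forward)) + backward

    while frontier_f and frontier_b:
        for frontier, parents, others in ((frontier_f, parents_f, parents_b),
                                          (frontier_b, parents_b, parents_f)):
            state = frontier.popleft()
            for nxt in neighbours(state):
                if nxt not in parents:
                    parents[nxt] = state
                    if nxt in others:          # the two searches meet
                        return path_through(nxt)
                    frontier.append(nxt)
    return None
```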
Fig. 1.29: Values of hSLD: straight-line distances to Bucharest.
1.8.2 Greedy Best-First Search
Evaluation function f(n) = h(n) (heuristic)
F(n) = estimate of cost from n to goal
e.g., hSLD(n) = straight-line distance from n to Bucharest
Greedy best-first search expands the node that appears to be closest to goal
The first node to be expanded from Arad will be Sibiu, because it is closer to Bucharest than
either Zerind or Timisoara.
The next node to be expanded will be Fagaras, because it is closest. Fagaras in turn
generates Bucharest, which is the goal.
Greedy best-first search using hSLD finds a solution without ever expanding a node that is not
on the solution path; hence its search cost is minimal.
The progress of a greedy best-first search using h SLD to find a path from Arad to Bucharest is
shown in the following figure:
Properties of greedy best-first search
Completeness? No – can get stuck in loops, e.g., Iasi → Neamt → Iasi → Neamt → …
Time Complexity? O(b^m), but a good heuristic can give dramatic improvement
Space Complexity? O(b^m) – keeps all nodes in memory
Optimality? No
Fig. 1.30: Stages in a greedy best-first search for Bucharest using the straight-line distance
heuristic hSLD. Nodes are labeled with their h-values.
1.8.3 A* Search: Minimizing the Total Estimated Solution Cost
Idea: avoid expanding paths that are already expensive
Evaluation function f(n) = g(n) + h(n)
g(n) = cost so far to reach n
h(n) = estimated cost from n to goal
f(n) = estimated total cost of path through n to goal
Algorithm
1. Place the starting node s on OPEN.
2. If OPEN is empty, stop and return failure.
3. Remove from OPEN the node n that has the smallest value of f*(n). If the node is a goal node,
return success and stop.
4. Otherwise, expand n, generating all of its successors n', and place n on CLOSED. For every successor n', if
n' is not already on OPEN or CLOSED, attach a back-pointer to n, compute f*(n'), and place it on
OPEN.
5. Each n' that is already on OPEN or CLOSED should be attached to back-pointers which reflect the
lowest g*(n') path. If n' was on CLOSED and its pointer was changed, remove it and place it on
OPEN.
6. Return to step 2.
The following figure shows an A* tree search for Bucharest.
Fig. 1.31: Stages in an A* search for Bucharest. Nodes are labeled with f = g+h. The h values
are the straight-line distances to Bucharest.
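A minimal sketch of A* graph search using f(n) = g(n) + h(n), reusing the Node/solution helpers from earlier; the heuristic h and the successors() interface (yielding (action, next_state, step_cost)) are assumptions. Replacing f(n) with h(n) alone would give greedy best-first search.

```python
# A minimal sketch of A* search using f(n) = g(n) + h(n). The heuristic h and
# the problem interface are assumptions made here for illustration.
import heapq, itertools

def a_star_search(problem, h):
    counter = itertools.count()
    start = Node(problem.initial_state)
    open_list = [(h(start.state), next(counter), start)]    # ordered by f = g + h
    best_g = {start.state: 0}
    while open_list:
        f, _, node = heapq.heappop(open_list)
        if problem.goal_test(node.state):
            return solution(node)
        for action, next_state, cost in problem.successors(node.state):
            g = node.path_cost + cost
            if g < best_g.get(next_state, float('inf')):    # keep lowest-g path
                best_g[next_state] = g
                child = Node(next_state, node, action, g)
                heapq.heappush(open_list, (g + h(next_state), next(counter), child))
    return None
```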
The optimality of A* is straightforward to analyze if it is used with TREE-SEARCH.
In this case, A* is optimal if h(n) is an admissible heuristic, that is, provided that h(n) never
overestimates the cost to reach the goal.
Suppose a suboptimal goal node G2 appears on the fringe, and let the cost of the optimal
solution be C*.
Then, because G2 is suboptimal and because h(G2) = 0 (true for any goal node), we know
o f(G2) = g(G2) + h(G2) = g(G2) > C*.
Now consider a fringe node n that is on an optimal solution path, for example, Pitesti in the
example of the preceding paragraph. (There must always be such a node if a solution exists.)
If h(n) does not overestimate the cost of completing the solution path, then we know that
o f(n) = g(n) + h(n) ≤ C*.
Now we have shown that f(n) ≤ C* < f(G2), so G2 will not be expanded and A* must return
an optimal solution.
Best first search is a simplified A*.
Start with OPEN holding the initial nodes.
Pick the BEST node on OPEN such that f = g + h' is minimal.
If BEST is a goal node, quit and return the path from the initial node to BEST. Otherwise,
remove BEST from OPEN and add all of BEST's children to OPEN, labeling each with its path from the initial
node.
1.8.4 Heuristic Functions
E.g., for the 8-puzzle:
h1(n) = number of misplaced tiles
h2(n) = the sum of the distances of the tiles from their goal positions. This is sometimes called
the city block distance or Manhattan distance
(i.e., no. of squares from desired location of each tile)
Fig. 1.32: A typical instance of the 8-puzzle. The solution is 26 steps long.
h1(S) = 8: all 8 tiles are out of position, so the start state has h1 = 8. h1 is an admissible
heuristic, because it is clear that any tile that is out of place must be moved at least once.
h2(S) = 3+1+2+2+2+3+3+2 = 18. h2 is also admissible, because all any move can do is
move one tile one step closer to the goal.
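A minimal sketch of the two heuristics; states are assumed to be 9-tuples listing the tile in each square (0 for the blank), read row by row, which is a representation chosen here for illustration.

```python
# A minimal sketch of the two 8-puzzle heuristics. States are assumed to be
# 9-tuples listing the tile in each square (0 for the blank), read row by row.

GOAL = (0, 1, 2, 3, 4, 5, 6, 7, 8)      # assumed goal configuration

def h1(state, goal=GOAL):
    """Number of misplaced tiles (the blank is not counted)."""
    return sum(1 for tile, g in zip(state, goal) if tile != 0 and tile != g)

def h2(state, goal=GOAL):
    """Sum of Manhattan (city-block) distances of tiles from their goal squares."""
    total = 0
    for index, tile in enumerate(state):
        if tile == 0:
            continue
        goal_index = goal.index(tile)
        total += abs(index // 3 - goal_index // 3) + abs(index % 3 - goal_index % 3)
    return total
```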
Relaxed problems
A problem with fewer restrictions on the actions is called a relaxed problem.
The cost of an optimal solution to a relaxed problem is an admissible heuristic for the original
problem.
If the rules of the 8-puzzle are relaxed so that a tile can move anywhere, then h1(n) gives the
shortest solution.
If the rules are relaxed so that a tile can move to any adjacent square, then h2(n) gives the
shortest solution.
1.8.5 Local Search Algorithms
In many optimization problems, the path to the goal is irrelevant; the goal state itself is the
solution.
State space = set of "complete" configurations.
Find configuration satisfying constraints, e.g., n-queens.
In such cases, we can use local search algorithms.
Keep a single "current" state, try to improve it.
Example: n-queens
Put n queens on an n × n board with no two queens on the same row, column, or diagonal
Fig. 1.34: A one dimensional state space landscape.
1.8.6 Hill-Climbing Search
"Like climbing Everest in thick fog with amnesia"
The hill-climbing search algorithm is shown in the following function. It is simply a loop that
continually moves in the direction of increasing value- that is, uphill. It terminates when it
reaches a “peak” where no neighbor has a higher value.
Fig. 1.35: The Hill-Climbing search algorithm (steepest ascent version), which is the most
basic local search technique. At each stage the current node is replaced by the best neighbor;
in this version, that means the neighbor with the highest VALUE, but if a heuristic cost
estimate h is used, we would find the neighbor with the lowest h.
Problem: depending on initial state, can get stuck in local maxima
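A minimal sketch of steepest-ascent hill climbing along the lines of Fig. 1.35; the problem interface (neighbours() and value(), the objective to be maximized) is an assumption. To minimize a heuristic cost h, use value = -h.

```python
# A minimal sketch of steepest-ascent hill climbing (Fig. 1.35). The problem
# interface (neighbours() and value()) is an illustrative assumption.

def hill_climbing(problem, initial_state):
    current = initial_state
    while True:
        neighbours = problem.neighbours(current)
        if not neighbours:
            return current
        best = max(neighbours, key=problem.value)          # best neighbour
        if problem.value(best) <= problem.value(current):
            return current                                 # a peak (possibly only local)
        current = best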
1.8.7 Hill-climbing search: 8-queens problem
h = number of pairs of queens that are attacking each other, either directly or indirectly
h = 17 for the state shown in Fig. 1.36(a) below
A local minimum in the 8-queens state space; the state has h=1 but every successor has a
higher cost.
Hill climbing is sometimes called greedy local search because it grabs a good neighbor state
without thinking ahead about where to go next.
Hill climbing often gets stuck for the following reasons:
Local Maxima: a local maximum is a peak that is higher than each of its neighboring states,
but lower than the global maximum.
Ridges: Ridges result in a sequence of local maxima that is very difficult for greedy algorithms
to navigate.
Plateaux: a plateau is an area of the state-space landscape where the evaluation function is
flat. It can be a flat local maximum, from which no uphill exit exists, or a shoulder, from which
it is possible to make progress.
Fig. 1.36: (a) An 8-queens state with heuristic cost estimate h = 17, showing the value of h for
each possible successor obtained by moving a queen within its column. The best moves are
marked. (b) A local minimum in the 8-queens state space; the state has h = 1 but every
successor has a higher cost.
1.8.8 Simulated Annealing Search
Idea: escape local maxima by allowing some "bad" moves but gradually decrease their
frequency.
A hill climbing algorithm that never makes “downhill” moves towards states with lower value is
guaranteed to be incomplete, because it can get stuck on a local maximum.
In contrast, a purely random walk – that is, moving to a successor chosen uniformly at
random from the set of successors – is complete, but extremely inefficient. Simulated
annealing is the combination of hill climbing with a random walk. The innermost loop of the
simulated-annealing algorithm shown below is quite similar to hill climbing.
Instead of picking the best move, however, it picks a random move.
Properties of simulated annealing search
One can prove: If T decreases slowly enough, then simulated annealing search will find a
global optimum with probability approaching 1
Widely used in VLSI layout, airline scheduling, etc
Fig. 1.37: The simulated annealing search algorithm, a version of stochastic hill climbing
where some downhill moves are allowed. Downhill moves are accepted readily early in the
annealing schedule and then less often as time goes on. The schedule input determines the
value of T as a function of time.
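A minimal sketch of simulated annealing along the lines of Fig. 1.37: a random move is accepted if it improves the value, otherwise with probability e^(delta/T). The exponential cooling schedule and iteration cap used here are illustrative assumptions.

```python
# A minimal sketch of simulated annealing (Fig. 1.37). A random move is
# accepted if it improves the value, otherwise with probability e^(delta/T).
# The cooling schedule and iteration cap are illustrative assumptions.
import math, random

def simulated_annealing(problem, initial_state,
                        schedule=lambda t: 1.0 * (0.99 ** t)):
    current = initial_state
    for t in range(10_000):
        T = schedule(t)
        if T < 1e-6:
            return current
        successor = random.choice(problem.neighbours(current))
        delta = problem.value(successor) - problem.value(current)
        if delta > 0 or random.random() < math.exp(delta / T):
            current = successor            # accept the (possibly downhill) move
    return current
```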
1.8.9 Problem Reduction
When a problem can be divided into a set of sub problems, where each sub problem can be
solved separately and a combination of these will be a solution, AND-OR graphs or AND - OR
trees are used for representing the solution.
The decomposition of the problem, or problem reduction, generates AND arcs. One AND arc
may point to any number of successor nodes, all of which must be solved for the arc to point to a
solution. Several such arcs may emerge from a single node, indicating several possible solutions.
Hence the graph is known as an AND-OR graph rather than simply an AND graph. The figure shows an AND-OR graph.
Fig. 1.38: An AND-OR graph – an example.
An algorithm to find a solution in an AND-OR graph must handle AND arcs appropriately. The A*
algorithm cannot search AND-OR graphs efficiently. This can be understood from the given
figure.
Fig.1.40: The Working of AO* algorithm.
The cost of getting from the start node to the current node, g, is not stored as in the A*
algorithm, because it is not possible to compute a single such value: there may be
many paths to the same state. In the AO* algorithm, h' serves as the estimate of goodness of a node.
A threshold value called FUTILITY is also used: if the estimated cost of a solution becomes greater
than FUTILITY, the search is abandoned as too expensive to be practical.
1.9 AO* algorithm
Let G consist only of the node representing the initial state; call this node INIT. Compute h'(INIT).
Until INIT is labeled SOLVED or h'(INIT) becomes greater than FUTILITY, repeat the following
procedure.
o Trace the marked arcs from INIT and select an unexpanded node NODE.
o Generate the successors of NODE. If there are no successors, then assign FUTILITY as h'(NODE).
This means that NODE is not solvable. If there are successors, then for each one,
called SUCCESSOR, that is not also an ancestor of NODE, do the following:
(a) Add SUCCESSOR to graph G.
(b) If SUCCESSOR is a terminal node, mark it SOLVED and assign zero to its h' value.
(c) If SUCCESSOR is not a terminal node, compute its h' value.
o Propagate the newly discovered information up the graph by doing the following. Let S be a
set of nodes that have been marked SOLVED or whose h' values have changed. Initialize S to NODE. Until S is empty, repeat
the following procedure:
(a) Select a node from S, call it CURRENT, and remove it from S.
(b) Compute the cost of each of the arcs emerging from CURRENT; assign the minimum of these as the new h' of
CURRENT.
(c) Mark the minimum-cost path as the best path out of CURRENT.
(d) Mark CURRENT SOLVED if all of the nodes connected to it through the newly marked
arc have been labeled SOLVED.
(e) If CURRENT has been marked SOLVED or its h' value has just changed, its new status must
be propagated backwards up the graph. Hence all the ancestors of CURRENT are added
to S.
In constraint satisfaction problems, heuristics are used not to estimate the distance to the goal but to decide what node to expand
next.
Examples of this technique are design problems, labelling graphs, robot path planning and
cryptarithmetic puzzles.
Algorithm
Propagate available constraints:
Open all objects that must be assigned values in a complete solution.
Repeat until inconsistency or all objects assigned valid values:
Select an object and strengthen as much as possible the set of constraints that apply to
object.
If set of constraints different from previous set then open all objects that share any of these
constraints.
Remove selected object.
If union of constraints discovered above defines a solution return solution.
If union of constraints discovered above defines a contradiction return failure
Make a guess in order to proceed. Repeat until a solution is found or all possible solutions
exhausted:
Select an object with no assigned value and try to strengthen its constraints.
Recursively invoke constraint satisfaction with the current set of constraints plus the selected
strengthening constraint.
2.1 Knowledge Representation
Knowledge representation refers to the data structure techniques and organizing notations
that are used in artificial intelligence (AI).
These include semantic networks, frames, logic, production rules, and conceptual graphs.
Knowledge acquisition encompasses a range of techniques that are used to obtain domain
knowledge about an application for the purpose of constructing an expert system.
2.1.1 Representation and mappings
In order to solve complex problems encountered in artificial intelligence, one needs both a
large amount of knowledge and some mechanism for manipulating that knowledge to create
solutions to new problems.
A variety of ways of representing knowledge (facts) have been exploited in AI programs.
Thus in solving problems in AI we must represent knowledge and there are two entities to
deal with:
Facts – truths in some relevant world. These are things we want to represent.
Representation of facts in some chosen formalism. These are the things we will actually be
able to manipulate.
We can structure these entities at two levels
The knowledge level - at which facts are described
The symbol level - at which representations of objects are defined in terms of symbols that
can be manipulated by programs
2.2.2 Simple relational knowledge
The simplest way of storing facts is to use a relational method where each fact about a set of
objects is set out systematically in columns. This representation gives little opportunity for
inference, but it can be used as the knowledge basis for inference engines.
Simple way to store facts.
Each fact about a set of objects is set out systematically in columns.
Little opportunity for inference.
Knowledge basis for inference engines.
Table 2.1: Simple Relational Knowledge
Otherwise, go to that node and find a value for the attribute and then report it.
Otherwise, search upward through the isa hierarchy until a value is found for the attribute.
2.2.4 Inferential Knowledge
Represent knowledge as formal logic:
All dogs have tails: ∀x : dog(x) → hasatail(x)
Advantages
A set of strict rules.
o Can be used to derive more facts.
o Truths of new statements can be verified.
o Guaranteed correctness.
Many inference procedures are available to implement standard rules of logic.
Popular in AI systems, e.g., automated theorem proving.
2.2.5 Procedural Knowledge
Basic idea
Knowledge encoded in some procedures
o small programs that know how to do specific things, how to proceed.
o e.g., a parser in a natural language understanding system has the knowledge that a noun phrase
may contain articles, adjectives and nouns. This is represented by calls to routines that
know how to process articles, adjectives and nouns.
Advantages
Heuristic or domain specific knowledge can be represented.
Extended logical inferences, such as default reasoning facilitated.
Side effects of actions may be modeled. Some rules may become false in time. Keeping track
of this in large systems may be tricky.
Disadvantages
Completeness - not all cases may be represented.
Consistency - not all deductions may be correct. e.g If we know that Fred is a bird we might
deduce that Fred can fly. Later we might discover that Fred is an emu.
Modularity is sacrificed. Changes in knowledge base might have far-reaching effects.
Cumbersome control information.
2.2.6 Issues in knowledge representation
Overall issues
Below are listed issues that should be raised when using a knowledge representation technique?
Are any attributes of objects so basic that they occur in almost every problem domain?
Are there any important relationships that exist among attributes of objects?
At what level should knowledge be represented? Is there a good set of primitives into which
all knowledge can be broken down?
How should sets of objects be represented?
Given a large amount of knowledge stored in a database, how can relevant parts be
accessed when they are needed?
We will see each of these questions briefly in the next five sections.
Important Attributes
Are there any attributes that occur in many different types of problem? There are two,
instance and isa, and each is important because each supports property inheritance.
Relationships among Attributes
The attributes that we use to describe objects are themselves entities that we represent. What
properties do they have independent of the specific knowledge they encode? Four such
properties that deserve mention are listed below.
Inverses.
Existence in an isa hierarchy.
Techniques for reasoning about values.
Single valued attributes.
Inverses
What about the relationship between the attributes of an object, such as, inverses, existence,
techniques for reasoning about values and single valued attributes. We can consider an example of
an inverse in,
band(John Zorn,Naked City)
This can be treated as John Zorn plays in the band Naked City or John Zorn's band is Naked
City.
Another representation is band = Naked City
Band-members = John Zorn, Bill Frissell, Fred Frith, Joey Barron,
Existence in an isa hierarchy
Just as there are classes of objects and specialized subsets of those classes, there are attributes
and specialization of attributes. Consider for example: the attribute height. In the case of attributes
they support inheriting information about such things as constraints on the values that the attribute
can have and mechanisms for computing those values.
Techniques for reasoning about values
Sometimes values of attributes are specified explicitly when a knowledge base is created.
Several kinds of information can play a role in this reasoning including:
Information about the type of the value, for example: the value of height must be a number measured in
a unit of length.
Constraints on the value, often stated in terms of related entities- for (eg): the age of the
person cannot be greater than the age of either of that person’s parents.
Rules for computing the values when it is needed.
Rules that describe actions that should be taken if a value ever becomes known.
Single valued attributes
A specific but very useful kind of attribute is one that is guaranteed to take a unique value. For
example: a baseball player can, at any one time, have only a single height and be a member of only
one team.
Choosing the granularity of representation
At what level should the knowledge be represented, and what are the primitives? Primitives are
fundamental concepts such as holding, seeing, playing; and as English is a very rich language with
over half a million words, it is clear we will find difficulty in deciding which words to choose as our
primitives in a series of situations.
If Tom feeds a dog then it could become:
feeds(tom, dog)
If Tom gives the dog a bone, it could become:
gives(tom, dog, bone). Are these the same?
In any sense does giving an object food constitute feeding?
If give(x, food) → feed(x) then we are making progress.
But we need to add certain inferential rules. In the famous program on relationships, Louise is
Bill's cousin. How do we represent this? louise = daughter(brother or sister(father or mother(bill)))
Suppose it is Chris; then we do not know whether Chris is male or female, and then son applies as well.
Clearly the separate levels of understanding require different levels of primitives, and these need many
rules to link together apparently similar primitives. Obviously there is a potential storage problem, and
the underlying question must be what level of comprehension is needed.
Representing set of objects
It is important to be able to represent sets of objects for several reasons. One is that there are
some properties that are true of sets that are not true of the individual members of a set.
Example
Consider the assertions that are being made in the sentences “There are more sheep than
people in Australia” and “English speakers can be found all over the world.” The only way to represent
the facts described in these sentences is to attach assertions to the sets representing people, sheep,
and English speakers, since, for example, no single English speaker can be found all over the world.
The other reason that it is important to be able to represent sets of objects is that if a property is true
of all elements of a set, then it is more efficient to associate it once with the set as a whole rather
than with every individual element.
Finding the right structure as needed
In order to have access to the right structure for describing a particular situation, it is
necessary to solve all of the following problems.
How to perform an initial selection of the most appropriate structure.
How to fill in appropriate details from the current situation.
How to find a better structure if the one chosen initially turns out not to be appropriate.
What to do if none of the available structures is appropriate.
When to create and remember a new structure.
Selecting an initial structure
Selecting candidate knowledge structures to match a particular problem-solving situation
is a hard problem; there are several ways in which it can be done. Three important approaches are
the following.
Index the structures directly by the significant English words that can be used to describe
them.
Consider each major concept as a pointer to all of the structures in which it might be involved.
Locate one major clue in the problem description and use it to select an initial structure.
Revising the choice when necessary
Once the candidate knowledge structure is detected, we must attempt to do a detailed match
of it to the problem at hand. Depending on the representation we are using the details of the matching
process will vary.
When the process runs into a snag, though, it is often not necessary to abandon the effort
and start over. Rather there are a variety of things that can be done. The following things can be
done:
Select the fragments of the current structure that do correspond to the situation and match
them against candidate alternatives.
Make an excuse for the current structure's failure and continue to use it.
Refer to specific stored links between structures to suggest new directions in which to
explore.
Inference
KB ├iα = sentence α can be derived from KB by procedure i.
Soundness: i is sound if whenever KB ├iα, it is also true that KB╞ α.
Completeness: i is complete if whenever KB╞ α, it is also true that KB ├iα.
Preview: we will define a logic (first-order logic) which is expressive enough to say almost
anything of interest, and for which there exists a sound and complete inference procedure.
That is, the procedure will answer any question whose answer follows from what is known by
the KB.
If an expression is true for all the rows (i.e., for all possible values of the variables in that
expression), then it is called a tautology and is denoted by T.
Table 2.4: Some laws and their equivalences.
S.No Equivalence Name of the Equivalence
1. ¬(p∧q) = ¬p∨¬q De Morgan's law
2. p∧T = p Identity law
3. p∧¬p = F Inverse law
4. p∨T = T Domination law
5. p∨p = p Idempotent law
6. p→q = ¬p∨q Implication law
7. p∨q = q∨p Commutative law
8. p∨(q∨r) = (p∨q)∨r Associative law
9. p∨(q∧r) = (p∨q)∧(p∨r) Distributive law
Example 1: Show that (p∧q) → (p∨q) is a tautology, i.e., the interpretation of this sentence is always true.
Solution: It can be proved that the above logical expression is a tautology using the rules of logical
equivalence.
(p∧q) → (p∨q) = ¬(p∧q) ∨ (p∨q) (using the implication law)
= (¬p∨¬q) ∨ (p∨q) (using De Morgan's law)
= (¬p∨p) ∨ (¬q∨q) (by rearrangement of terms)
= T ∨ T (using the inverse law)
= T
Example 2: Show that (p∨q) ∧ ¬(¬p∧q) and p are logically equivalent.
Solution: (p∨q) ∧ ¬(¬p∧q)
= (p∨q) ∧ (¬¬p ∨ ¬q) (De Morgan's law)
= (p∨q) ∧ (p∨¬q) (double negation)
= p ∨ (q∧¬q) (distributive law)
= p ∨ F (inverse law)
= p (identity law)
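A small truth-table checker, offered as a sketch for the practice problems that follow; formulas are written as Python functions of boolean variables, which is a representation chosen here for convenience.

```python
# A small truth-table checker: enumerate all assignments of the variables and
# report whether a formula (a Python function of booleans) is a tautology.
from itertools import product

def implies(p, q):
    return (not p) or q

def is_tautology(formula, num_vars):
    return all(formula(*values) for values in product([True, False], repeat=num_vars))

# Example 1 above: (p AND q) -> (p OR q) is a tautology.
print(is_tautology(lambda p, q: implies(p and q, p or q), 2))         # -> True
# (p AND (p -> q)) -> q (modus ponens as a formula) is also a tautology.
print(is_tautology(lambda p, q: implies(p and implies(p, q), q), 2))  # -> True
```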
Do the following problems for practice.
Find out using a truth table whether each implication is a tautology.
o (p∧r) → p
o (p∧q) → (p → q)
o (p∨¬(q∧r)) → ((p ↔ q) ∨ r)
Show that the following are tautologies without using a truth table.
o (p∧(p → q)) → q
o (¬p∧(p∨q)) → q
Verify whether the following are tautologies.
o (¬p∧(p → q)) → ¬q
o (¬q∧(p∨q)) → ¬p
Show that the following pairs of expressions are logically equivalent.
o ¬p ↔ q and p ↔ ¬q
o ¬(p∧q) and (¬p) ∨ (¬q)
o ¬p → ¬q and q → p
2.4.1 Inference rules
Inference rules are used to infer new knowledge in the form of propositions from the existing
knowledge. The new propositions are logical consequences of the existing knowledge.
a. Modus Ponens: This rule is also called the rule of detachment. Symbolically it
is written as [p∧(p→q)] → q
Or
p
p→q
q (q always follows)
g. Rule of Conjunctive Simplification: This rule states that the conjunction of p and
q logically implies p, i.e.
(p∧q) → p
Or
p∧q
p (p follows)
h. Rule of Disjunctive Amplification: This rule states that p∨q can be
inferred from p, i.e., p∨q is a logical consequence of p. It is expressed
as
p → (p∨q)
The above can also be expressed as
p
p∨q
i. Rule of And Elimination: This rule infers p from the wff p∧q, i.e.
p∧q
p
j. Rule of proof cases: it is stated in tabular form
p→r
q→r
(p∨q) →r
Example: Prove or disprove the following arguments:
“if the auditorium was not available or there were examinations, then the music programme was
postponed. If the music programme gets postponed, then a new date was announced. No new date
was announced. Therefore, auditorium was available.”
Solution: Let us assume that following are symbols for the statements (propositions) in the above
argument.
P= auditorium was available
Q= there were examination
R= music programme was postponed
S= new date was announced
The statements can be expressed in the form of logical expressions given as follows.
(¬p∨q) →r
r→s
¬s
And the conclusion to be established is p, i.e., the auditorium was available.
Connectives can be used in the predicate similar to those in propositions.
Let us consider the sentences given below.
“Rama is a student and Rama plays cricket”.
“Rama is a student or Rama plays cricket”.
“Rama is a student implies that Rama plays cricket”.
“Rama is not student”.
These can be represented in the predicate forms in the same order as:
s(R)∧p(R,C)
s(R)∨p(R,C)
s(R) →p(R,C)
¬s(R)
In the above predicates, p(R,C) stands for "Rama plays cricket", where p is the predicate for
"plays", R for "Rama" is the subject and C for "Cricket" is the object. p(R,C) is a two-place predicate.
Higher-place predicates are also possible. Following are some examples.
Rajan plays cricket and basketball = p(R, C, B).
Functions
The parameters a1,a2,…..,an in a predicate p, given below, can be constants or variables or
functions.
P(a1,a2,…..,an)
Consider the following sentences:
“Rajan is father of Rohit.”
“Sheela is mother of Rohit.”
“Rajan and Sheela are spouse.”
Let the expressions – fatherof(Rohit), and motherof(Rohit), be functions and their values are
“Rajan” and “Sheela” respectively. Using above expressions, the predicate.
Spouse(Rajan, Sheela),
Can be written as
spouse (fatherof(Rohit), motherof(Rohit)).
A function may have any number of arguments; this number is called the arity of the function. For example, if Rohit
and Rajini are brother and sister, then the functions
father of Rajni and Rohit, and
mother of Rajni and Rohit,
can be written as
fatherof(Rajni, Rohit) = Rajan
motherof(Rajni, Rohit) = Sheela.
Example
“2 plus 2 is 4.” Can be written as function formula as plus(2, 2)=4
“50 divided by 10 is 5.” Can be written as function formula as divided by(50, 10)=5
2.5.2 Representing simple facts in logic
We briefly mentioned how logic can be used to represent simple facts in the last lecture. Here
we will highlight major principles involved in knowledge representation. In particular predicate logic will
be met in other knowledge representation schemes and reasoning methods.
Symbols used: the standard logic symbols (connectives and quantifiers) are used throughout this course.
Let’s first explore the use of propositional logic as a way of representing the sort of world
knowledge that an AI system might need. Propositional logic is appealing because it is simple to deal
with and a decision procedure for it exists. Suppose we want to represent the obvious fact stated by
the classical sentence.
It is raining.
RAINING
It is sunny.
SUNNY
If it is raining, then it is not sunny.
RAINING → ¬SUNNY
Let’s now explore the use of predicate logic as a way of representing knowledge by looking at
a specific example. Consider the following set of sentences.
Marcus was a man.
Marcus was a Pompeian.
All Pompeians were Romans.
Caesar was a ruler.
All Romans were either loyal to Caesar or hated him.
The facts described by these sentences can be represented as a set of wff’s in predicate logic as
follows:
1. Marcus was a man.
Man(Marcus)
This representation captures the critical fact of Marcus being a man. It fails to capture some of
the information in the English sentence, namely the notion of past tense.
2. Marcus was a Pompeian.
Pompeian(Marcus)
3. All Pompeians were Romans.
∀x: Pompeian(x) → Roman(x)
4. Caesar was a ruler.
ruler(Caesar)
Here we ignore the fact that proper names are often not references to unique individuals,
since many people share the same name. Sometimes deciding which of several people of the same
name is being referred to in a particular statement may require a fair amount of knowledge and
reasoning.
5. All Romans were either loyal to Caesar or hated him.
∀x: Roman(x) → loyalto(x, Caesar) ∨ hate(x, Caesar)
In English the word "or" sometimes means the logical inclusive-or and sometimes means the
logical exclusive-or (XOR). Here we have used the inclusive interpretation. Some people argue,
however, that this English sentence is really stating an exclusive or. To express that, we would have
to write
∀x: Roman(x) → [(loyalto(x, Caesar) ∨ hate(x, Caesar)) ∧ ¬(loyalto(x, Caesar) ∧ hate(x, Caesar))]
2.5.3 Variable and Quantifiers
To generalize the statement "Rama is a student", it is written as "x is a student", i.e., s(x). If s(x) is
true for a single case, then we say that the expression is satisfied.
Let us consider the following statements.
“x is human implies x is mortal.”
“Socrates is human.”
When represented in predicate form, these become:
h(x) →m(x), and
h(S)
The above two wffs have some resemblance to the premises required for the inference rule of
modus ponens. To generalize the implication, the variable x, applicable to the entire human domain,
is quantified using the quantifying operator called the universal quantifier. The above statements can be
modified as follows after incorporating the effect of quantifiers.
“for all x, x is human implies that x is mortal”, and
“Socrates is human”
Now, these are rewritten in symbolic form using the quantifying operator:
∀x (h(x) → m(x)), and
h(S)
In this case the first statement ∀x (h(x) → m(x)) is true when it is found to be
true for the entire range of x, because it says "for all x" or "for every x" or "for all possible values of x",
h(x) → m(x) is true. Still, it is not possible to infer m(S), i.e., "Socrates is mortal", because the statements
do not yet appear in a form to which the rule of modus ponens can be applied. The inference rule of
universal instantiation, discussed in the next section, will help in resolving this problem.
2.5.4 Quantifiers
Quantifiers are used to express properties of entire collection of objects, rather than represent
the object by names. FOL contains two standard quantifiers,
# Universal quantifier (∀)
# Existential quantifier (∃)
Universal quantifiers
General notation: "∀X P", where
P is a logical expression, X is a variable, and ∀ means "for all".
That is, P is true for all objects X in the Universe.
Examples
All cats are mammals => ∀X cat(X) → mammal(X)
That is, all the cats in the universe belong to the type of mammals, and hence the variable X
may be replaced by the name of any cat (object name).
Examples
Spot is a cat.
Spot is a mammal.
Cat(Spot)
Mammal(Spot)
Cat(Spot) → Mammal(Spot)
Spot – the name of the cat.
Existential Quantifiers
General notation: ∃X P, where
P is a logical expression, X is a variable, and ∃ means "there exists".
That is, P is true for some object X in the universe.
Example
Spot has a sister who is a cat.
∃X sister(X, Spot) ∧ cat(X)
That is, there exists some X that is Spot's sister and is a cat; X may be replaced by the name of
Spot's sister, if one exists.
Example
Felix is a cat.
Felix is a sister of Spot.
Cat(Felix)
Sister(Felix, Spot)
Sister(Felix, Spot) ∧ Cat(Felix).
Nested quantifiers
The sentences are represented using multiple quantifiers.
Example
# For all X and all Y, if X is the parent of Y then Y is the child of X.
∀X, Y parent(X, Y) → child(Y, X).
# Everybody loves somebody
∀X ∃Y loves(X, Y)
# There is someone who is loved by everyone
∃Y ∀X loves(X, Y)
Connection between ∀ and ∃:
The two quantifiers (∀ and ∃) are actually connected with each other through
negation.
Example
Everyone likes ice cream:
∀X likes(X, IceCream) is equivalent to ¬∃X ¬likes(X, IceCream)
That is, there is no one who does not like ice cream.
Ground term or clause: A term with no variables is called a ground term.
E.g., cat(Spot)
The De Morgan rules for quantified and unquantified sentences are as follows.
# Quantified sentences:
∀X ¬P ≡ ¬∃X P
¬∀X P ≡ ∃X ¬P
∀X P ≡ ¬∃X ¬P
∃X P ≡ ¬∀X ¬P
# Unquantified sentences:
¬(P∧Q) ≡ ¬P ∨ ¬Q
¬P ∧ ¬Q ≡ ¬(P∨Q)
P ∧ Q ≡ ¬(¬P ∨ ¬Q)
P ∨ Q ≡ ¬(¬P ∧ ¬Q)
2.5.5 Inference rules
All the inference rules applicable for propositional logic also apply for predicate logic.
However, due to introduction of quantifiers, additional inference rules are there for the expressions
using quantifiers. These are given below.
1. Rule of universal instantiation
This rule states that if a universally quantified variable in a valid sentence is replaced
by a term from the domain, then the sentence remains true. That is, if
∀x (h(x) → m(x))
is true, and x is replaced by "Socrates" and the quantifier is removed, then the
statement
h(S) → m(S)
is still true. This rule is called the rule of universal instantiation and is expressed as
∀x P(x)
P(a) (from "for all x, P(x)" we can infer P(a))
2. Rule of universal generalization
If a statement P(a) is true for each element a of the universe, then the universal quantifier
may be prefixed, and ∀x P(x) can be inferred from P(a), i.e.
P(a), for all a ∈ U
∀x P(x) (P(a) for every a can be written as "for all x, P(x)")
3. Rule of existential instantiation
If ∃x P(x) is true, then there is an element a in the universe of P for which we can infer
P(a), i.e.
∃x P(x)
P(a)
Example: Given the premises ∀x (k(x) → m(x)) and ∀y (m(y) → f(y)), show that
∀z (k(z) → f(z)).
We can arrive at a formal proof for the above using the following steps.
Steps Justification
1. ∀x (k(x) → m(x)) Given premise
2. k(a) → m(a) By the rule of universal instantiation on (1)
3. ∀y (m(y) → f(y)) Given premise
4. m(a) → f(a) By the rule of universal instantiation on (3)
5. k(a) → f(a) By the rule of syllogism using (2) and (4)
6. ∀z (k(z) → f(z)) By the rule of universal generalization on (5)
Hence it is proved.
2.6 Unification
Basic idea
Lifted inference rules require finding substitutions that make different logical expressions look
identical. This process is called unification and is a key component of all first-order inference algorithms. The UNIFY algorithm takes
two sentences and returns a unifier for them if one exists:
UNIFY(p, q) = θ where SUBST(θ, p) = SUBST(θ, q)
Here are the results of unification with four different sentences that might be in knowledge
base.
UNIFY(knows(John, x), knows(John, Jane)) = {x/Jane} …(1)
UNIFY(knows(John, x), knows(y, Bill)) = {x/Bill, y/John} …(2)
UNIFY(knows(John, x), knows(y, mother(y))) = {y/John, x/mother(John)} …(3)
UNIFY(knows(John, x), knows(x, Elizabeth)) = fail …(4)
The last unification fails because x cannot take on the values John and Elizabeth at the
same time. Remembering that knows(x, Elizabeth) means "everyone knows Elizabeth",
we should be able to infer that John knows Elizabeth. The problem arises only because the
two sentences happen to use the same variable name, x. The problem can be avoided by
standardizing apart one of the two sentences being unified, which means renaming its variables to
avoid name clashes.
p q θ
Knows(John,x) Knows(John,Jane) {x/Jane}
Knows(John,x) Knows(y,OJ) {x/OJ, y/John}
Knows(John,x) Knows(y,Mother(y)) {y/John, x/Mother(John)}
Knows(John,x) Knows(x,OJ) fail
The matching rules are simple. Different constants or predicates cannot match; identical ones
can. A variable can match another variable, any constant, or a predicate expression, with the
restriction that the predicate expression must not contain any instances of the variable being matched.
Example
P(x,x)……………………(1)
P(y,z)…………………….(2)
The two instances of P match fine. Next we compare x and y, and decide that if we substitute
y for x, they could match. We will write that substitution as
y/x
(We could, of course, have decided instead to substitute x for y, since they are both just
dummy variable names. The algorithm will simply pick one of these two substitutions). But now, if we
simply continue and match x and z, we produce the substitution z/x. but we cannot substitute both y
and z for x, so we have not produced a consistent substitution.
What we need to do after finding the first substitution y/x is to make that substitution
throughout the literals, giving
P(y,y)
P(y,z)
Now we can attempt to unify arguments y and z, which succeeds with the substitution z/y.
The entire unification process has now succeeded with a substitution that is the composition of the
two substitutions we found. We write the composition as
(z/y)(y/x)
following standard notation for function composition. In general, the substitution (a1/a2,
a3/a4, ...)(b1/b2, b3/b4, ...) … means to apply all the substitutions of the rightmost list, then take the
result and apply all the substitutions of the next list, and so forth, until all substitutions have been
applied.
The object of unification procedure is to discover at least one substitution that causes two
literals to match.
For example: the literals
hate(x,y)
hate(Marcus, z)
could be unified with any of the following substitutions:
(Marcus/x, z/y)
(Marcus/x, y/z)
(Marcus/x, Caesar/y, Caesar/z)
(Marcus/x, Polonius/y, Polonius/z)
The first two of these are equivalent except for lexical variation. But the second two, although
they produce a match, also produce a substitution that is more restrictive than absolutely necessary
for the match.
Algorithm
If L1 or L2 are both variables or constants, then:
o If L1 and L2 are identical, then return NIL.
o Else if L1 is a variable, then if L1 occurs in L2 then return {FAIL}, else return (L2/L1).
o Else if L2 is a variable, then if L2 occurs in L1 then return {FAIL}, else return (L1/L2).
o Else return {FAIL}.
If the initial predicate symbols in L1 and L2 are not identical, then return {FAIL}.
If L1 and L2 have a different number of arguments, then return {FAIL}.
Set SUBST to NIL. (At the end of this procedure, SUBST will contain all the substitutions
used to unify L1 and L2.)
For i ← 1 to the number of arguments in L1:
o Call Unify with the ith argument of L1 and the ith argument of L2, putting the result in S.
o If S contains FAIL then return {FAIL}.
o If S is not equal to NIL then:
1. Apply S to the remainder of both L1 and L2.
2. SUBST := APPEND(S, SUBST).
Return SUBST.
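A rough Python sketch of this procedure is given below. It is illustrative only: literals are represented as nested tuples, and single lowercase letters such as x, y, z are treated as variables (that naming convention is an assumption made here, not part of the algorithm).

# Minimal unification sketch. A literal is a nested tuple such as
# ('knows', 'john', 'x') or ('knows', 'y', ('mother', 'y')); single
# lowercase letters are treated as variables (an assumption for this sketch).

def is_variable(term):
    return isinstance(term, str) and len(term) == 1 and term.islower()

def occurs_in(var, term):
    """True if the variable appears anywhere inside term (the occur check)."""
    if term == var:
        return True
    if isinstance(term, tuple):
        return any(occurs_in(var, t) for t in term)
    return False

def substitute(subst, term):
    """Apply a substitution {variable: value} to a term."""
    if is_variable(term) and term in subst:
        return substitute(subst, subst[term])
    if isinstance(term, tuple):
        return tuple(substitute(subst, t) for t in term)
    return term

def unify(l1, l2, subst=None):
    """Return a substitution unifying l1 and l2, or None on failure."""
    if subst is None:
        subst = {}
    l1, l2 = substitute(subst, l1), substitute(subst, l2)
    if l1 == l2:
        return subst
    if is_variable(l1):
        return None if occurs_in(l1, l2) else {**subst, l1: l2}
    if is_variable(l2):
        return None if occurs_in(l2, l1) else {**subst, l2: l1}
    if isinstance(l1, tuple) and isinstance(l2, tuple) and len(l1) == len(l2):
        for a, b in zip(l1, l2):        # unify argument by argument
            subst = unify(a, b, subst)
            if subst is None:
                return None
        return subst
    return None                          # different constants or predicates

# Example (3) above:
# unify(('knows', 'john', 'x'), ('knows', 'y', ('mother', 'y')))
# returns {'y': 'john', 'x': ('mother', 'john')}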
2. 7 Weak Slot and Filler Structures
2.7.1 Introduction
It enables attribute values to be retrieved quickly
o assertions are indexed by the entities
o binary predicates are indexed by first argument. E.g. team(Mike-Hall , Cardiff).
Properties of relations are easy to describe .
It allows ease of consideration as it embraces aspects of object oriented programming.
So called because:
o A slot is an attribute value pair in its simplest form.
o A filler is a value that a slot can take -- could be a numeric, string (or any data type) value
or a pointer to another slot.
o A weak slot and filler structure does not consider the content of the representation.
We will study two types
Semantic Nets.
Frames.
The major idea is that
The meaning of a concept comes from its relationship to other concepts, and that,
The information is stored by interconnecting nodes with labeled arcs.
Intersection search
One of the early ways that semantic nets were used was to find relationships among objects
by spreading activation out from each of two nodes and seeing where the activation met. This
process is called intersection search.
Representing Non-binary Predicates
Semantic nets are a natural way to represent relationships that would appear as ground
instances of binary predicates in predicate logic. For example, some of the arcs from the
figure below could be represented in logic as ground binary predicates, such as team(Mike-Hall, Cardiff).
Fig 2.9: A Semantic Network for n-Place Predicate.
As a more complex example consider the sentence: John gave Mary the book. Here we have
several aspects of an event.
In making certain inferences we will also need to distinguish between the link that defines a new
entity and holds its value and the other kind of link that relates two existing entities. Consider the
example shown where the height of two people is depicted and we also wish to compare them.
We need extra nodes for the concept as well as its value.
Fig. 2.14: Partitioned network.
In particular, it becomes useful to assign more structure to nodes as well as to links.
2.7.3 Frames
A frame is a collection of attribute (usually called slots) and associated values (and possibly
constraints on values) that describes some entity in the world. Sometimes a frame describes an entity
in some absolute sense; sometimes it represents the entity from a particular point of view (as it did in
the vision system proposal in which the term frame was first introduced).
A single frame taken alone is rarely useful. Instead we build frame systems out of collections of
frames that are connected to each other by virtue of the fact that the value of an attribute of one
frame may be another frame. In this section we explore ways that frame systems can be used to
encode knowledge and support reasoning.
Frames as Sets and Instances
Frames can also be regarded as an extension to semantic nets. Indeed it is not clear where
the distinction between a semantic net and a frame ends. Semantic nets were initially used to represent
labeled connections between objects. As tasks became more complex, the representation needed to be
more structured; the more structured the system, the more beneficial it is to use frames. A frame
is a collection of attributes or slots and associated values that describe some real-world entity. Frames
on their own are not particularly helpful, but frame systems are a powerful way of encoding information
to support reasoning. Set theory provides a good basis for understanding frame systems. Each frame
represents:
a class (set), or
an instance (an element of a class).
Consider the example first discussed in Semantics Nets
Person
Isa : Mammal
Cardinality :
Adult-Male
Isa : Person
Cardinality :
Rugby-Player
Isa : Adult-Male
Cardinality :
Height :
Weight :
Position :
Team :
Team-Colours :
Back
Isa : Rugby-Player
Cardinality :
Tries :
Mike-Hall
Instance : Back
Height : 6-0
Position : Centre
Team : Cardiff-RFC
Team-Colours : Black/Blue
Rugby-Team
Isa : Team
Cardinality :
Team-size : 15
Coach :
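The class/instance hierarchy listed above can be stored directly as data. The following Python sketch is one possible illustration (it is not from the notes): each frame is a dictionary of slots, and slot lookup climbs instance/isa links so that a missing value can be inherited. The Team-Colours default placed on Rugby-Player is a hypothetical value added purely to show inheritance.

# Minimal frame-system sketch: frames as dictionaries of slots,
# with value lookup inherited along instance/isa links.
frames = {
    "Person":       {"isa": "Mammal"},
    "Adult-Male":   {"isa": "Person"},
    "Rugby-Player": {"isa": "Adult-Male", "Team-Colours": "Unknown"},  # hypothetical default
    "Back":         {"isa": "Rugby-Player"},
    "Mike-Hall":    {"instance": "Back", "Height": "6-0",
                     "Position": "Centre", "Team": "Cardiff-RFC",
                     "Team-Colours": "Black/Blue"},
}

def get_value(frame_name, slot):
    """Return a slot value, inheriting along instance/isa links if needed."""
    frame = frames.get(frame_name)
    while frame is not None:
        if slot in frame:
            return frame[slot]
        # climb one level: an instance link first, otherwise an isa link
        parent = frame.get("instance") or frame.get("isa")
        frame = frames.get(parent)
    return None

print(get_value("Mike-Hall", "Position"))      # Centre (own value)
print(get_value("Back", "Team-Colours"))       # Unknown (inherited default)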
Rules for computing values
Many values for a slot.
A slot is a relation that maps from its domain of classes to its range of values.
NOTE the following:
Instances of SLOT are slots
Associated with SLOT are attributes that each instance will inherit.
Each slot has a domain and range.
Range is split into two parts: one is the class of the elements and the other is a constraint, which
is a logical expression; if absent, it is taken to be true.
If there is a value for default then it must be passed on unless an instance has its own value.
The to-compute attribute involves a procedure to compute its value. E.g. in Position where we
use the dot notation to assign values to the slot of a frame.
Transfers-through lists other slots from which values can be derived by inheritance.
o See if there is any other element of CANDIDATES that was derived from a class closer to
F than the class from which C came.
o If there is, then, remove C from CANDIDATES.
check the cardinality of CANDIDATES:
o if it is 0, then report that no value was found.
o If it is 1, then return the single element of CANDIDATES as V.
o If it is greater than 1, report a contradiction.
Frame Languages
The idea of a frame system as a way to represent declarative knowledge has been encapsulated
in a series of frame-oriented knowledge representation languages, whose features have evolved and
been driven by an increased understanding of the sorts of representation issues that arise.
Examples: KRL, FRL, RLL, KL-ONE (Brachman and Schmolze), KRYPTON, NIKL.
2.8 Strong Slot and Filler Structures
2.8.1 Introduction
Strong Slot and Filler Structures typically:
Represent links between objects according to more rigid rules.
Specific notions of what types of object and relations between them are provided.
Represent knowledge about common situations.
Conceptual Dependency (CD)
Conceptual Dependency (CD) was originally developed to represent knowledge acquired from natural
language input. The goals of this theory are:
To help in the drawing of inference from sentences.
To be independent of the words used in the original input.
That is to say: For any 2 (or more) sentences that are identical in meaning there should be
only one representation of that meaning.
It has been used by many programs that purport to understand English (MARGIE, SAM,
PAM). CD was developed by Schank et al., as were these programs.
CD provides
a structure into which nodes representing information can be placed
a specific set of primitives
at a given level of granularity.
Sentences are represented as a series of diagrams depicting actions using both abstract and
real physical situations.
The agent and the objects are represented
The actions are built up from a set of primitive acts which can be modified by tense.
Examples of Primitive Acts are:
ATRANS
Transfer of an abstract relationship. e.g. give.
PTRANS
Transfer of the physical location of an object. e.g. go.
PROPEL
Application of a physical force to an object. e.g. push.
MTRANS
Transfer of mental information. e.g. tell.
MBUILD
Construct new information from old. e.g. decide.
SPEAK
Utter a sound. e.g. say.
ATTEND
Focus a sense on a stimulus. e.g. listen, watch.
MOVE
Movement of a body part by owner. e.g. punch, kick.
GRASP
Actor grasping an object. e.g. clutch.
INGEST
Actor ingesting an object. e.g. eat.
EXPEL
Actor getting rid of an object from body. e.g. ????.
Six primitive conceptual categories provide building blocks which are the set of allowable
dependencies in the concepts in a sentence:
Advantages of CD
Using these primitives involves fewer inference rules.
Many inference rules are already represented in CD structure.
The holes in the initial structure help to focus on the points still to be established.
Disadvantages of CD
Knowledge must be decomposed into fairly low level primitives.
Impossible or difficult to find correct set of primitives.
A lot of inference may still be required.
Representations can be complex even for relatively simple actions. Consider:
Dave bet Frank five pounds that Wales would win the Rugby World Cup.
Complex representations require a lot of storage
Applications of CD
MARGIE
(Meaning Analysis, Response Generation and Inference on English) -- model natural
language understanding.
SAM
(Script Applier Mechanism) -- Scripts to understand stories. See next section.
PAM
(Plan Applier Mechanism) -- Plans to understand stories.
Schank et al. developed all of the above.
Scripts
A script is a structure that prescribes a set of circumstances which could be expected to follow on
from one another. It is similar to a thought sequence or a chain of situations which could be
anticipated.
It could be considered to consist of a number of slots or frames but with more specialized roles.
Scripts are beneficial because
Events tend to occur in known runs or patterns.
Causal relationships between events exist.
Entry conditions exist which allow an event to take place
Prerequisites exist upon events taking place. E.g. when a student progresses through a
degree scheme or when a purchaser buys a house.
The components of a script include:
Entry Conditions
These must be satisfied before events in the script can occur.
Results
Conditions that will be true after events in script occur.
Props
Slots representing objects involved in events.
Roles
Persons involved in the events.
Track
Variations on the script. Different tracks may share components of the same script.
Scenes
The sequence of events that occur. Events are represented in conceptual dependency form.
Scripts are useful in describing certain situations such as robbing a bank. This might involve:
Getting a gun.
Hold up a bank.
Escape with the money.
Here the Props might be
Gun, G.
Loot, L.
Bag, B
Getaway car, C.
The Roles might be:
Robber, S.
Cashier, M.
Bank Manager, O.
Policeman, P.
The Entry Conditions might be:
S is poor.
S is destitute.
The Results might be:
S has more money.
O is angry.
M is in a state of shock.
P is shot.
There are 3 scenes: obtaining the gun, robbing the bank and the getaway.
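One possible way to hold such a script in a program, purely as an illustration (the encoding and the applicable() helper are assumptions, not part of the notes), is as a plain data structure grouping the props, roles, entry conditions, results and scenes:

# Illustrative encoding of the bank-robbery script as plain Python data.
robbery_script = {
    "track": "Armed robbery",
    "props": {"G": "Gun", "L": "Loot", "B": "Bag", "C": "Getaway car"},
    "roles": {"S": "Robber", "M": "Cashier", "O": "Bank Manager", "P": "Policeman"},
    "entry_conditions": ["S is poor", "S is destitute"],
    "results": ["S has more money", "O is angry",
                "M is in a state of shock", "P is shot"],
    "scenes": ["Obtaining the gun", "Robbing the bank", "The getaway"],
}

def applicable(script, observed_conditions):
    """A script can be activated once all of its entry conditions hold."""
    return all(c in observed_conditions for c in script["entry_conditions"])

print(applicable(robbery_script, {"S is poor", "S is destitute"}))  # True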
Some additional points to note on Scripts:
If a particular script is to be applied it must be activated and the activating depends on its
significance.
If a topic is mentioned in passing then a pointer to that script could be held.
If the topic is important then the script should be opened.
The danger lies in having too many active scripts much as one might have too many
windows open on the screen or too many recursive calls in a program.
Provided events follow a known trail we can use scripts to represent the actions involved
and use them to answer detailed questions.
Different trails may be allowed for different outcomes of scripts (e.g. the bank robbery
goes wrong).
The full Script could be described in the following figure,
Advantages of Scripts
Ability to predict events.
A single coherent interpretation may be built up from a collection of observations.
Disadvantages
Less general than frames.
May not be suitable to represent all kinds of knowledge.
2.8.2 CYC
What is CYC?
An ambitious attempt to form a very large knowledge base aimed at capturing
commonsense reasoning.
Initial goals to capture knowledge from a hundred randomly selected articles in the
Encyclopedia Britannica.
Both Implicit and Explicit knowledge encoded.
Emphasis on the study of underlying implicit information (assumed by the authors but not
explicitly stated for the readers).
Example: Suppose we read that Wellington learned of Napoleon's death
Then we (humans) can conclude that Napoleon never knew that Wellington had died.
How do we do this?
We require special implicit knowledge or commonsense such as:
We only die once.
You stay dead.
You cannot learn of anything when dead.
Time cannot go backwards.
Why build large knowledge bases:
Fig. 2.15: Simplified Bank Robbing Script.
Brittleness
Specialized knowledge bases are brittle. Hard to encode new situations and non-graceful
degradation in performance. Commonsense based knowledge bases should have a firmer foundation.
Form and Content
Knowledge representation may not be suitable for AI. Commonsense strategies could point out
where difficulties in content may affect the form.
Shared Knowledge
Should allow greater communication among systems with common bases and assumptions.
How is CYC coded?
Special CYCL language:
o LISP like.
o Frame based
o Multiple inheritance
o Slots are fully fledged objects. Generalized inheritance -- any link not just isa and
instance.
Chapter 3: Reasoning under Uncertainty
Issues: Problems with combination, e.g., Sprinkler causes Rain??
Consistent labeling.
Contradiction.
Applying rules to derive conclusions.
Creating justifications for the results of applying rules.
Choosing among alternative ways of resolving a contradiction.
Detecting contradictions.
Logic-Based Truth Maintenance Systems (LTMS)
Similar to JTMS except:
Nodes (assertions) assume no relationships among them except ones explicitly stated in
justifications.
A JTMS can represent P and ¬P simultaneously without noticing; an LTMS would signal a contradiction here.
If this happens, the network has to be reconstructed.
P(Hi|E) = P(E|Hi) · P(Hi) / Σ (n = 1 to k) P(E|Hn) · P(Hn)
Example
We are interested in examining the geological evidence at a particular location to determine
whether that would be a good place to dig to find a desired mineral.
If we know the prior probability of each mineral and the probability of each physical characteristic
given that the mineral is present, then we can use Bayes' formula to compute,
from the evidence we collect, how likely it is that the various minerals are present at that
particular place.
Key to using Bayes' Theorem:
P(A|B) = conditional probability of A given that we have only the evidence B.
Example: Solving Medical Diagnosis problem
S: Patient has Spots
M: Patient has Measles
F: Patient has High Fever
o The presence of spots serves as evidence in favor of measles. It also serves as evidence for
fever, since measles causes fever.
o Either spots or fever alone provides evidence in favor of measles.
o If both are present, we need to take both into account in determining the total weight of
evidence.
P(H|E, e) = P(H|E) · P(e|E, H) / P(e|E)
Bayes' Rule
We can rearrange the two parts of the product rule:
We can think about some events as being “hidden” causes: not necessarily directly observed
(e.g. a cavity).
If we model how likely observable effects are given hidden causes (how likely toothache is
given a cavity)
Then Bayes’ rule allows us to use that model to infer the likelihood of the hidden cause (and
thus answer our question)
In fact, good models of P(effect | cause) are often available to us in real domains (e.g. medical
diagnosis).
Suppose a doctor knows that meningitis causes a stiff neck in 50% of cases
P(s|m) = 0.5
She also knows that the probability in the general population of someone having a
stiff neck at any time is 1/20
P(s) = 0.05
She also has to know the incidence of meningitis in the population (1/50,000)
P(m) = 0.00002
Using Bayes’ rule she can calculate the probability the patient has meningitis:
P(m|s) = P(s|m) P(m) / P(s) = (0.5 × 0.00002) / 0.05 = 0.0002 = 1/5000
In general:
P(cause|effect) = P(effect|cause) P(cause) / P(effect)
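The meningitis calculation above can be checked with a few lines of Python; the function name is a hypothetical helper, while the numbers come from the example:

def posterior(likelihood, prior, evidence):
    """Bayes' rule: P(cause | effect) = P(effect | cause) * P(cause) / P(effect)."""
    return likelihood * prior / evidence

p_stiff_neck_given_meningitis = 0.5      # P(s | m)
p_meningitis = 0.00002                   # P(m)
p_stiff_neck = 0.05                      # P(s)

print(posterior(p_stiff_neck_given_meningitis, p_meningitis, p_stiff_neck))
# 0.0002, i.e. 1 in 5000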
3.5 Certainty Factors and Rule-Based Systems
It is one practical way of compromising on a pure Bayesian system.
The approach we discuss was pioneered in the MYCIN system, which attempts to recommend
appropriate therapies for patients with bacterial infections.
It interacts with the physician to acquire the clinical data it needs.
MYCIN is an example of an expert system, since it performs a task normally done by a human expert.
MYCIN uses rules to reason backward from its goal of finding significant disease-causing
organisms to the clinical data.
Once it finds the organisms, it then attempts to select a therapy by which the disease(s) may
be treated.
Certainty factor
A certainty factor (CF[h,e]) is defined in terms of two components:
MB[h,e] --- a measure (between 0 and 1) of belief in hypothesis h given the
evidence e. MB measures the extent to which the evidence supports the hypothesis. It is zero
if the evidence fails to support the hypothesis.
MD[h,e] --- a measure (between 0 and 1) of disbelief in hypothesis h given the
evidence e. MD measures the extent to which the evidence supports the negation of the
hypothesis. It is zero if the evidence supports the hypothesis.
From these two measures we can define the certainty factor
CF[h,e] = MB[h,e] - MD[h,e] .......... Equation no. 1
Combining Uncertainty rules
The CFs of MYCIN's rules reflect the experts' assessments of the strength of the evidence in support of a hypothesis.
As MYCIN reasons, however, these CFs need to be combined to reflect the operation of multiple
pieces of evidence and multiple rules applied to the problem.
The measures of belief and disbelief of a hypothesis given two observations S1 and S2 are computed as follows:
MB(h, S1∧S2) = 0, if MD[h, S1∧S2] = 1
             = MB(h, S1) + MB(h, S2) [1 - MB(h, S1)], otherwise …………Equation no. 2
MD(h, S1∧S2) = 0, if MB[h, S1∧S2] = 1
             = MD(h, S1) + MD(h, S2) [1 - MD(h, S1)], otherwise ……….. Equation no. 3
Suppose we make an initial observation S1 that confirms our belief in h with MB(h, S1) = 0.3,
so that MD[h, S1] = 0 and CF[h, S1] = 0.3. A second observation S2 also confirms h, with MB[h, S2] = 0.2.
Substituting these in Equation no. 2 we get
MB(h, S1∧S2) = 0.3 + 0.2 (1 - 0.3)
             = 0.3 + 0.2 (0.7)
             = 0.44 ……………………………Equation no. 4
MD(h, S1∧S2) = 0.0 …………………………Equation no. 5
Substituting Eq. 4 and Eq. 5 in Eq. 1 we get
CF[h,e] = MB[h,e] - MD[h,e]
        = 0.44 - 0
Certainty Factor (h, e) = 0.44 …………………………Equation no. 6
MYCIN also uses formulas for the MB of the conjunction and disjunction of two hypotheses. Using
Bayes' theorem, MB can be related to probabilities as
MB[h,e] = (max[P(h|e), P(h)] - P(h)) / (1 - P(h))
The same knowledge can sometimes be written as a single rule rather than as three separate rules.
Under MYCIN's independence assumption, suppose the evidence instead arrives through three separate rules, each with an MB of 0.6. They combine as:
MB [h, S1∧S2] = 0.6+ 0.6 (1-0.6)
= 0.6 + 0.6(0.4)
= 0.84
MB [h, (S1∧S2)∧S3] = 0.84+ 0.6 (1-0.84)
= 0.84 + 0.6(0.16)
= 0.936
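The belief-combination rule of Equation no. 2 is easy to express in code. The sketch below (the helper name is an assumption) combines any number of MB values under MYCIN's independence assumption and reproduces the two results just computed:

def combine_mb(mbs):
    """Combine measures of belief: MB(h, S1^S2) = MB1 + MB2*(1 - MB1), iterated."""
    total = 0.0
    for mb in mbs:
        total = total + mb * (1.0 - total)
    return total

print(combine_mb([0.3, 0.2]))        # 0.44, as in Equation no. 4
print(combine_mb([0.6, 0.6, 0.6]))   # 0.936, the three-rule example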
Let us consider a concrete example
S: Sprinkler was on last night
W: grass is wet
R: it rained last night
We can write MYCIN rules that describe predictive relations among the three events:
If the sprinkler was on last night, then there is suggestive evidence (0.9) that the grass will be wet this morning.
If the grass is wet this morning, then there is suggestive evidence (0.8) that it rained last night.
Chaining these rules gives
MB(W, S) = 0.8
MB(R, W) = 0.8 × 0.9
         = 0.72
(e.g., a specific disease, a body temperature, or a reading taken by some other diagnostic
device).
In figure 3.2 (b), we show a causality graph for the wet grass example. In addition to the three
nodes we have been talking about, the graph contains a new node corresponding to the
propositional variable that tells us whether it is currently the rainy season.
(a) (b)
Fig 3.2: Representing Causality Uniformly
A DAG such as the one we have just drawn illustrates the causality relationships that occur
among the nodes it contains.
In order to use it as a basis for probabilistic reasoning, however, we need more information.
We can state this in a table in which the conditional probabilities are provided. We show such
a table for our example in Table 3.1.
For example, from the table we see that the prior probability of the rainy season is 0.5. Then,
if it is the rainy season, the probability of rain on a given night is 0.9; if it is not, the probability
is only 0.1.
Table 3.1: Conditional Probabilities for a Bayesian Network
Attribute                        Probability
P(Wet | Sprinkler, Rain)         0.95
P(Wet | Sprinkler, ¬Rain)        0.9
P(Wet | ¬Sprinkler, Rain)        0.8
P(Wet | ¬Sprinkler, ¬Rain)       0.1
P(Sprinkler | RainySeason)       0.0
P(Sprinkler | ¬RainySeason)      1.0
P(Rain | RainySeason)            0.9
P(Rain | ¬RainySeason)           0.1
P(RainySeason)                   0.5
To be useful for a basis of problem solving, we need a mechanism for computing the
influence of any arbitrary node on any other.
For example, suppose that we have observed that it rained last night. What does that tell us
about the probability that it is the rainy season?
To answer this question requires that the initial DAG be converted to an undirected graph in
which the arcs can be used to transmit probabilities in either direction, depending on where
the evidence is coming from.
We also require a mechanism for using the graph that guarantees that probabilities are
transmitted correctly.
For example, while it is true that observing wet grass may be evidence for rain, and
observing rain is evidence for wet grass, we must guarantee that no cycle is ever traversed in
such a way that wet grass is evidence for rain, which is then taken as evidence for wet grass.
There are three broad classes of algorithms for doing these computations: a message-passing
method, a clique triangulation method, and a variety of stochastic algorithms.
The idea behind these methods is to take advantage of the fact that nodes have limited
domains of influence.
Thus although in principle the task of updating probabilities consistently throughout the
network is intractable, in practice it may not be.
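As a concrete illustration of such a computation, the following sketch (not part of the notes) enumerates the joint distribution implied by Table 3.1 to answer the earlier query, the probability that it is the rainy season given that it rained:

from itertools import product

# Conditional probabilities taken from Table 3.1.
P_RS = 0.5
P_Sprinkler = {True: 0.0, False: 1.0}             # P(Sprinkler | RainySeason)
P_Rain = {True: 0.9, False: 0.1}                   # P(Rain | RainySeason)
P_Wet = {(True, True): 0.95, (True, False): 0.9,   # P(Wet | Sprinkler, Rain)
         (False, True): 0.8, (False, False): 0.1}

def joint(rs, sp, rain, wet):
    """Joint probability of one full assignment of the four variables."""
    p = P_RS if rs else 1 - P_RS
    p *= P_Sprinkler[rs] if sp else 1 - P_Sprinkler[rs]
    p *= P_Rain[rs] if rain else 1 - P_Rain[rs]
    p *= P_Wet[(sp, rain)] if wet else 1 - P_Wet[(sp, rain)]
    return p

# P(RainySeason | Rain) by enumerating and marginalizing the joint distribution.
num = sum(joint(True, sp, True, wet) for sp, wet in product([True, False], repeat=2))
den = sum(joint(rs, sp, True, wet) for rs, sp, wet in product([True, False], repeat=3))
print(num / den)   # 0.9: observing rain makes the rainy season very likely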
In the clique triangulation method, for example, explicit arcs are introduced between pairs of
nodes that share a common descendant.
For the case shown in figure 3.2 (b), a link would be introduced between Sprinkler and Rain.
This is important since wet grass could be evidence of either of them, but wet grass plus one
of its causes is not evidence for the competing cause since an alternative explanation for the
observed phenomenon already exists.
The message-passing approach is based on the observation that to compute the probability
of a node A given what is known about other nodes in the network, it is necessary to know
three things:
o π: the total support arriving at A from its parent nodes (which represent its causes).
o λ: the total support arriving at A from its children (which represent its symptoms).
o The entry in the fixed conditional probability matrix that relates A to its causes.
Plausibility (denoted by Pl) is defined to be
Pl(s)=1-Bel(~s). ……………………..Eq 1
It also ranges from 0 to 1 and measures the extent to which evidence in favor of ~s leaves
room for belief in s.
For example
Consider a simplified Diagnosis problem to cause FEVER
All: allergy
Flu: flu
Cold: cold
Pneu: pneumonia
Θ might consist of the set {All, Flu, Cold, Pneu}. Each contributes evidence of 0.2 towards causing
fever, i.e. All = 0.2, Flu = 0.2, Cold = 0.2, Pneu = 0.2. Our goal is to attach some measure of belief to
elements of Θ.
Let us see how m works for our diagnosis problem. Assume that we have no information
about how to choose among four hypotheses when we start the diagnosis task. Then we define m as:
{ Θ} (1.0) (i.e 100 percent evidence to cause fever)………Eq 2
Fever might be such a piece of evidence. We update m as follows:
{ Flu, Cold, Pneu} (0.2+0.2+0.2=0.6) …………………Eq 3
where Flu, Cold and Pneu together account for 0.6 of the evidence for fever, and the remaining 0.4 is assigned
to Θ.
{ Θ} (0.4) …………………………………….Eq 4
At this point we assigned to the set {flu, cold, Pneu} the appropriate belief. The remainder of
the belief still resides in the larger set Θ. Thus Bel(p) is our overall belief that the correct answer lies
somewhere in the set p.
We are given two belief functions m1 and m2. Suppose m1 corresponds to our belief after
observing fever; from Equations 3 and 4:
{ Flu, Cold, Pneu} (0.6)
{ Θ} (0.4)
Suppose m2 corresponds to our belief after observing allergy in addition; with Allergy = 0.2 we get
(0.6 + 0.2 = 0.8): {All, Flu, Cold} (0.8) ……Eq 5
Then we can compute the combination m3 using the following table (in which we further
abbreviate the disease names). Each cell is the product m1(X) · m2(Y), assigned to the intersection X ∩ Y …………………… Eq no. 6

                         m2(x)
                         {A,F,C} (0.8)       Θ (0.2)
m1(x)   {F,C,P} (0.6)    {F,C} (0.48)        {F,C,P} (0.12)
        Θ (0.4)          {A,F,C} (0.32)      Θ (0.08)
As a result of combining m1 and m2, we produce the following belief assignment over the hypotheses:
{Flu, Cold} (0.48)……………………………eq no 7
{All, Flu, Cold} (0.32)…………………………….eq no 8
{Flu, Cold, Pneu} (0.12)…………………..eq no 9
Θ (0.08)………………………………eq no 10
Now let m3 correspond to our belief given just the evidence for allergy.
From Eq. no. 5, allergy = 0.8 + 0.1 = 0.9, and the remaining 0.1 serves as evidence for Θ.
{All} (0.9)………………………….eq no 11
Θ (0.1)…………………………..eq no 12
We can apply the numerator of the combination rule to produce the following table (where Φ denotes the empty
set):

                              {All} (0.9)          Θ (0.1)
{F,C} (0.48)                  Φ (0.432)            {F,C} (0.048)
{A,F,C} (0.32)                {A,F,C} (0.288)      {A,F,C} (0.032)
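Dempster's rule of combination used in the tables above can be sketched in a few lines of Python; the representation of hypothesis sets as frozensets and the function name are assumptions made for illustration:

def combine(m1, m2):
    """Dempster's rule: multiply masses of intersecting sets and renormalize
    by 1 - k, where k is the mass assigned to the empty intersection."""
    combined, conflict = {}, 0.0
    for s1, v1 in m1.items():
        for s2, v2 in m2.items():
            inter = s1 & s2
            if inter:
                combined[inter] = combined.get(inter, 0.0) + v1 * v2
            else:
                conflict += v1 * v2
    return {s: v / (1.0 - conflict) for s, v in combined.items()}

theta = frozenset({"All", "Flu", "Cold", "Pneu"})
m1 = {frozenset({"Flu", "Cold", "Pneu"}): 0.6, theta: 0.4}   # after observing fever
m2 = {frozenset({"All", "Flu", "Cold"}): 0.8, theta: 0.2}    # after observing allergy
print(combine(m1, m2))
# {F,C}: 0.48, {F,C,P}: 0.12, {A,F,C}: 0.32, Theta: 0.08 (no conflict in this case)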
4.1 Planning in AI
4.1.1 What is planning?
Generate sequences of actions to perform tasks and achieve objectives.
o States, actions and goals
Search for solution over abstract space of plans.
Classical planning environment: fully observable, deterministic, finite, static and discrete.
Assists humans in practical applications
o design and manufacturing
o military operations
o games
o space exploration
Difficulty of real world problems
Assume a problem-solving agent
using some search method …
o Which actions are relevant?
Exhaustive search vs. backward search
o What is a good heuristic functions?
Good estimate of the cost of the state?
Problem-dependent vs. problem-independent
o How to decompose the problem?
Most real-world problems are nearly decomposable.
Planning language
What is a good language?
o Expressive enough to describe a wide variety of problems.
o Restrictive enough to allow efficient algorithms to operate on it.
o Planning algorithm should be able to take advantage of the logical structure of the
problem.
STRIPS and ADL
General language features
Representation of states
o Decompose the world in logical conditions and represent a state as a conjunction of
positive literals.
Propositional literals: Poor ∧ Unknown
FO-literals (grounded and function-free): At(Plane1, Melbourne) ∧ At(Plane2,
Sydney)
o Closed world assumption
Representation of goals
o Partially specified state and represented as a conjunction of positive ground literals
o A goal is satisfied if the state contains all literals in goal.
Representations of actions
o Action = PRECOND + EFFECT
Action(Fly(p, from, to),
PRECOND: At(p, from) ∧ Plane(p) ∧ Airport(from) ∧ Airport(to)
EFFECT: ¬At(p, from) ∧ At(p, to))
= action schema (p, from, to need to be instantiated)
Action name and parameter list
Precondition (conj. of function-free literals)
Effect (conjunction of function-free literals: a positive literal P means P becomes true, and ¬P means P becomes false)
o Add-list vs delete-list in Effect
4.1.2 Language semantics?
How do actions affect states?
o An action is applicable in any state that satisfies the precondition.
o For FO action schemas, applicability involves a substitution θ for the variables in the
PRECOND.
At(P1,JFK) ∧ At(P2,SFO) ∧ Plane(P1) ∧ Plane(P2) ∧ Airport(JFK) ∧ Airport(SFO)
Satisfies: At(p, from) ∧ Plane(p) ∧ Airport(from) ∧ Airport(to)
with θ = {p/P1, from/JFK, to/SFO}
Thus the action is applicable.
The result of executing action a in state s is the state s’
o s’ is same as s except
Any positive literal P in the effect of a is added to s’
Any negative literal ¬P is removed from s’
EFFECT: ¬At(p, from) ∧ At(p, to):
At(P1,SFO) ∧ At(P2,SFO) ∧ Plane(P1) ∧ Plane(P2) ∧ Airport(JFK) ∧ Airport(SFO)
o STRIPS assumption: (avoids representational frame problem)
every literal NOT in the effect remains unchanged
Expressiveness and extensions
STRIPS is simplified
o Important limit: function-free literals
Allows for propositional representation
Function symbols lead to infinitely many states and actions
Recent extension: Action Description Language (ADL)
Action(Fly(p: Plane, from: Airport, to: Airport),
PRECOND: At(p, from) ∧ (from ≠ to)
EFFECT: ¬At(p, from) ∧ At(p, to))
Standardization: Planning Domain Definition Language (PDDL)
Example: air cargo transport
Init(At(C1, SFO) ∧ At(C2, JFK) ∧ At(P1, SFO) ∧ At(P2, JFK) ∧ Cargo(C1) ∧ Cargo(C2) ∧ Plane(P1)
∧ Plane(P2) ∧ Airport(JFK) ∧ Airport(SFO))
Goal(At(C1, JFK) ∧ At(C2, SFO))
Action(Load(c, p, a)
PRECOND: At(c, a) ∧ At(p, a) ∧ Cargo(c) ∧ Plane(p) ∧ Airport(a)
EFFECT: ¬At(c, a) ∧ In(c, p))
Action(Unload(c, p, a)
PRECOND: In(c, p) ∧ At(p, a) ∧ Cargo(c) ∧ Plane(p) ∧ Airport(a)
EFFECT: At(c, a) ∧ ¬In(c, p))
Action(Fly(p, from, to)
PRECOND: At(p, from) ∧ Plane(p) ∧ Airport(from) ∧ Airport(to)
EFFECT: ¬At(p, from) ∧ At(p, to))
[Load(C1,P1,SFO), Fly(P1,SFO,JFK), Load(C2,P2,JFK), Fly(P2,JFK,SFO)]
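As an illustration of how such action schemas operate on a state, the sketch below (my own encoding, not from the notes) represents a state as a set of ground literals and applies a grounded instance of the Load action under the STRIPS add/delete semantics:

# A state is a set of positive ground literals (closed world assumption).
state = {("At", "C1", "SFO"), ("At", "P1", "SFO"),
         ("Cargo", "C1"), ("Plane", "P1"), ("Airport", "SFO")}

def apply_action(state, precond, add_list, delete_list):
    """Apply a grounded STRIPS action if its preconditions are satisfied."""
    if not precond <= state:
        return None                      # action is not applicable in this state
    return (state - delete_list) | add_list

# Grounded instance Load(C1, P1, SFO) of the Load(c, p, a) schema above.
new_state = apply_action(
    state,
    precond={("At", "C1", "SFO"), ("At", "P1", "SFO"),
             ("Cargo", "C1"), ("Plane", "P1"), ("Airport", "SFO")},
    add_list={("In", "C1", "P1")},
    delete_list={("At", "C1", "SFO")},
)
print(new_state)   # C1 is now In P1 and no longer At SFO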
4.2 Planning with state space search
Both forward and backward search possible
Progression planners
o forward state-space search
o Consider the effect of all possible actions in a given state
Regression planners
o backward state-space search
o To achieve a goal, what must have been true in the previous state?
4.2.1 Progression algorithm
Formulation as state-space search problem:
o Initial state = initial state of the planning problem
Literals not appearing are false
o Actions = those whose preconditions are satisfied
Add positive effects, delete negative
o Goal test = does the state satisfy the goal
o Step cost = each action costs 1
No functions … any graph search that is complete is a complete planning algorithm.
o E.g. A*
Inefficient:
irrelevant action problem
good heuristic required for efficient search
Fig 4.1: Two approaches to searching for a plan. (a) Forward (progression) state space search,
starting in the initial state and using the problem’s actions to search forward for the goal state.
(b) Backward (regression) state-space search: a belief-state search starting at the goal state(s)
and using the inverse of the actions to search backward for the initial state.
Regression algorithm
How to determine predecessors?
o What are the states from which applying a given action leads to the goal?
Goal state = At(C1, B) ∧ At(C2, B) ∧ … ∧ At(C20, B)
Relevant action for first conjunct: Unload(C1,p,B)
Works only if pre-conditions are satisfied.
Previous state = In(C1, p) ∧ At(p, B) ∧ At(C2, B) ∧ … ∧ At(C20, B)
Subgoal At(C1, B) should not be present in this state.
Actions must not undo desired literals (consistent)
Main advantage: only relevant actions are considered.
o Often much lower branching factor than forward search.
General process for predecessor construction
o Given a goal description G
o Let A be an action that is relevant and consistent
o The predecessor is constructed as follows (see the sketch after this list):
Any positive effects of A that appear in G are deleted.
Each precondition literal of A is added, unless it already appears.
Any standard search algorithm can be used to perform the search.
Termination when the predecessor is satisfied by the initial state.
o In the FO case, satisfaction might require a substitution.
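The predecessor construction just described can be sketched as a small helper (an illustrative assumption, reusing the set-of-literals encoding from the progression example):

def regress(goal, add_list, precond):
    """Predecessor of a goal through a relevant, consistent action:
    drop the action's positive effects that appear in the goal,
    then add the action's preconditions."""
    return (goal - add_list) | precond

goal = {("At", "C1", "JFK")}
# Relevant action: Unload(C1, P1, JFK) with EFFECT At(C1, JFK), not In(C1, P1).
predecessor = regress(
    goal,
    add_list={("At", "C1", "JFK")},
    precond={("In", "C1", "P1"), ("At", "P1", "JFK"), ("Cargo", "C1"),
             ("Plane", "P1"), ("Airport", "JFK")},
)
print(predecessor)   # the state description that must have held before Unload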
4.2.2 Heuristics for state-space search
Neither progression nor regression is very efficient without a good heuristic.
o How many actions are needed to achieve the goal?
o Exact solution is NP hard, find a good estimate
Two approaches to find admissible heuristic:
o The optimal solution to the relaxed problem.
Remove all preconditions from actions
o The sub goal independence assumption:
The cost of solving a conjunction of sub goals is approximated by the sum of the costs of
solving the sub problems independently.
4.3 Conditional planning
Deal with uncertainty by checking the environment to see what is really happening.
Used in fully observable and nondeterministic environments:
o The outcome of an action is unknown.
o Conditional steps will check the state of the environment.
o How to construct a conditional plan?
Example, the vacuum-world
Actions: left, right, suck
Propositions to define states: AtL, AtR, CleanL, CleanR
How to include indeterminism?
o Specifies one action at each of its state nodes
o Includes every outcome branch at each of the chance nodes.
In previous example:
[Left, if AtL∧CleanL∧CleanR then [] else Suck]
For exact solutions: use minimax algorithm with 2 modifications:
o Max and Min nodes become OR and AND nodes
o Algorithm returns conditional plan instead of single move
4.3.2 And-Or-search algorithm
Fig 4.7: The sequence of states as the continuous planning agent tries to reach the goal state
On(C,D) ∧ On(D,B), as shown in (d). The start state is (a). At (b), another agent has interfered,
putting D on B. The agent has executed Move(C, D) but has failed, dropping C on A instead. It
retries Move(C, D), reaching the goal state (d).
Initial state (a)
Fig 4.8: The initial plan constructed by the continuous planning agent. The plan is
indistinguishable, so far, from that produced by a normal partial-order planner.
Assume that percepts don’t change and this plan is constructed
Ordering constraint between Move (D,B) and Move(C,D)
Start is label of current state during planning.
Before the agent can execute the plan, nature intervenes:
o D is moved onto B
Fig 4.9: After someone else moves D onto B, the unsupported links supplying clear (B) and On
(D, G) are dropped, producing this plan.
Start contains now On (D, B)
Agent perceives: Clear (B) and On (D, G) are no longer true
o Update model of current state (start)
Causal links from Start to Move (D, B) (Clear (B) and On (D,G)) no longer valid.
Remove causal relations and two PRECOND of Move (D,B) are open
Replace action and causal links to finish by connecting Start to Finish.
Fig 4.10: The link supplied by Move (D,B) has been replaced by one from start and the now-
redundant step Move (D, B) has been dropped.
Extending: whenever a causal link can be supplied by a previous step
All redundant steps (Move(D,B) and its causal links) are removed from the plan
Execute new plan, perform action Move(C,D)
o This removes the step from the plan
Fig 4.11: After Move(C, D) is executed and removed from the plan, the effects of the start step
reflect the fact that C ended up on A instead of on the intended D. The goal precondition On(C, D) is
still open.
Execute new plan, perform action Move(C,D)
o Assume agent is clumsy and drops C on A
No plan but still an open PRECOND
Determine new plan for open condition
Again Move(C,D)
Fig 4.12: After Move(C, D) is executed and dropped from the plan, the remaining open
condition On(C, D) is resolved by adding a causal link from the new start step. The plan is now
complete.
Similar to POP
On each iteration find plan-flaw and fix it
Possible flaws: Missing goal, Open precondition, Causal conflict, unsupported link, redundant
action, Unexecuted action, unnecessary historical goal
∧ Partner(B,A))
Goal(Returned(Ball) ∧ At(agent,[x,Net]))
¬At(partner,[x,y])
EFFECT: Returned(Ball))
Action(Go(agent,[x,y])
Inductive Learning
Learning Decision Trees
4.6.1 Learning Agents
based on previous agent designs, such as reflex, model-based and goal-based agents
o those aspects of agents are encapsulated into the performance element of a learning
agent
a learning agent has an additional learning element
o usually used in combination with a critic and a problem generator for better learning
most agents learn from examples
o inductive learning
Learning Agent Model
A direct mapping from conditions on the current state to actions.
A means to infer relevant properties of the world from the percept sequence.
Information about the way the world evolves.
Information about the results of possible actions the agent can take.
Utility information indicating the desirability of world states.
Action-value information indicating the desirability of particular actions in particular states.
Goals that describe classes of states whose achievement maximizes the agent's utility.
Component Representation
many possible representation schemes
o weighted polynomials (e.g. in utility functions for games)
o propositional logic
o predicate logic
o probabilistic methods (e.g. belief networks)
learning methods have been explored and developed for many representation schemes
Hypotheses
finding a suitable hypothesis can be difficult
o since the function f is unknown, it is hard to tell if the hypothesis h is a good
approximation
the hypothesis space describes the set of hypotheses under consideration
o e.g. polynomials, sinusoidal functions, propositional logic, predicate logic, ...
o the choice of the hypothesis space can strongly influence the task of finding a
suitable function
o while a very general hypothesis space (e.g. Turing machines) may be guaranteed to
contain a suitable function, it can be difficult to find it
Ockham’s razor: if multiple hypotheses are consistent with the data, choose the simplest one
Example Inductive Learning 1
Fig 4.14: Inductive learning 1
input-output pairs displayed as points in a plane
the task is to find a hypothesis (functions) that connects the points
o either all of them, or most of them
various performance measures
o number of points connected
o minimal surface
o lowest tension
Example Inductive Learning 2
Example Inductive Learning 3
hypothesis expressed as a polynomial function
incorporates all samples
more complicated to calculate than linear segments
no discontinuities
better predictive power
Example Inductive Learning 4
hypothesis is a linear function
does not incorporate all samples
extremely easy to compute
low predictive power
Fig 4.18: (a) environment with utilities (rewards) of terminal states (b) transition model Mij
Terminology
Reward-to-go = sum of rewards from state to terminal state
additive utility function: utility of sequence is sum of rewards accumulated in sequence
Thus for additive utility function and state s:
Expected utility of s = expected reward-to-go of s
Training sequence eg.
(1,1) →(2,1) →(3,1) →(3,2) →(3,1) →(4,1) →(4,2) [-1]
(1,1) →(1,2) →(1,3) →(1,2) →· · · →(3,3) →(4,3) [1]
(1,1) →(2,1) →· · · →(3,2) →(3,3) →(4,3) [1]
Aim: use samples from training sequences to learn (an approximation to) expected reward for
all states.
i.e. generate a hypothesis for the utility function
Note: similar to sequential decision problem, except rewards initially unknown.
A generic passive reinforcement learning agent
Learning is iterative — successively updates estimates of utilities
Fig 4.20: Naïve updating.
• The basic difference between TD and ADP is that TD adjusts a state to agree with the
observed successor, while ADP makes a state agree with all successors that might occur,
weighted by their probabilities. More importantly, ADP's adjustments may need to be
propagated across all of the utility equations, while TD's affect only the current equation. TD is
essentially a crude first approximation to ADP.
• A middle-ground can be found by bounding or ordering the number of adjustments made in
ADP, beyond the simple one made in TD. The prioritized-sweeping heuristic prefers only to
make adjustments to states whose likely successors have just undergone large adjustments
in their utility estimates. Such approximate ADP systems can be very nearly as efficient as
ADP in terms of convergence, but operate much more quickly.
Active Learning in an Unknown Environment
• The difference between active and passive agents is that passive agents learn a fixed policy,
while the active agent must decide what action to take and how it will affect its rewards. To
represent an active agent, the environment model M is extended to give the probability of a
transition from a state i to a state j, given an action a. Utility is modified to be the reward of the
state plus the maximum utility expected depending upon the agent's action:
U(i) = R(i) + max_a Σ_j M^a_ij U(j)
• An ADP agent is extended to learn transition probabilities given actions; this is simply another
dimension in its transition table. A TD agent must similarly be extended to have a model of
the environment.
Learning Action-Value Functions
• An action-value function assigns an expected utility to the result of performing a given action
in a given state. If Q(a, i) is the value of doing action a in state i, then
U(i) = max_a Q(a, i)
• The equations for Q-learning are similar to those for state-based learning agents. The
difference is that Q-learning agents do not need models of the world. The equilibrium
equation, which can be used directly (as with ADP agents) is
Q(a, i) = R(i) + Σ_j M^a_ij max_a' Q(a', j)
• The temporal difference version does not require that a model be learned; its update equation
is
Q(a, i) ← Q(a, i) + α (R(i) + max_a' Q(a', j) - Q(a, i))
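The temporal-difference update above translates directly into code. The sketch below is illustrative only; the tiny two-state example and all constants are assumptions:

def td_q_update(Q, state, action, reward, next_state, actions, alpha=0.1):
    """Q(a,i) <- Q(a,i) + alpha * (R(i) + max_a' Q(a', j) - Q(a,i))."""
    best_next = max(Q.get((a, next_state), 0.0) for a in actions)
    old = Q.get((action, state), 0.0)
    Q[(action, state)] = old + alpha * (reward + best_next - old)

# Tiny illustration: one transition, repeated; reward +1 for reaching 'goal'.
Q = {}
actions = ["left", "right"]
for _ in range(100):
    td_q_update(Q, "start", "right", 1.0, "goal", actions)
print(Q[("right", "start")])   # approaches 1.0 as updates accumulate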
4.9.4 Applications of Reinforcement Learning
• The first significant reinforcement learning system was used in Arthur Samuel's checker-
playing program. It used a weighted linear function to evaluate positions, though it did not use
observed rewards in its learning process.
• TD-gammon [Tesauro, 1992] has an evaluation function represented by a fully-connected
neural network with one hidden layer of 80 nodes; with the inclusion of some precomputed
board features in its input, it was able to reach world-class play after about 300,000 training
games.
• A case of reinforcement learning in robotics is the famous cart-pole balancing problem. The
problem is to control the position of the cart (along a single axis) so as to keep a pole
balanced on top of it upright, while staying within the limits of the track length. Actions are
usually to jerk left or right, the so-called bang-bang control approach.
• The first work on this problem was the BOXES system [Michie and Chambers, 1968], in which
state space was partitioned into boxes, and reinforcement was propagated into the boxes. More
recent work using neural networks [Furuta et al., 1984] simulated the triple-inverted-pendulum
problem, in which three poles balance one atop another on a cart.
4.10 Learning and Decision Trees
Decision tree induction is one of the simplest, and yet most successful forms of learning.
A decision tree takes as input an object or situation described by a set of attributes and
returns a “decision” the predicted output value for the input.
The input attributes can be discrete or continuous. The output value can also be discrete or
continuous.
Learning a discrete valued function is called classification learning; learning a continuous
function is called regression.
A decision tree reaches its decision by performing a sequence of tests.
Each internal node in the tree corresponds to a test of the value of one of the properties, and the
branches from the node are labeled with the possible values of the test.
Each leaf node in the tree specifies the value to be returned if that leaf is reached.
Problem: decide whether to wait for a table at a restaurant, based on the following attributes:
1. Alternate: is there an alternative restaurant nearby?
2. Bar: is there a comfortable bar area to wait in?
3. Fri/Sat: is today Friday or Saturday?
4. Hungry: are we hungry?
5. Patrons: number of people in the restaurant (None, Some, Full)
6. Price: price range ($, $$, $$$)
7. Raining: is it raining outside?
8. Reservation: have we made a reservation?
9. Type: kind of restaurant (French, Italian, Thai, Burger)
10. Wait Estimate: estimated waiting time (0-10, 10-30, 30-60, >60)
Decision Tree Example
Decision Tree Algorithm
recursive formulation
o select the best attribute to split positive and negative examples
o if only positive or only negative examples are left, we are done
o if no examples are left, no such examples were observed
return a default value calculated from the majority classification at the node’s
parent
o if we have positive and negative examples left, but no attributes to split them we are
in trouble
samples have the same description, but different classifications
may be caused by incorrect data (noise), or by a lack of information, or by a
truly non-deterministic domain
Restaurant Sample Set
Performance of Decision Tree Learning
quality of predictions
o predictions for the classification of unknown examples that agree with the correct
result are obviously better
o can be measured easily after the fact
o it can be assessed in advance by splitting the available examples into a training set
and a test set
learn the training set, and assess the performance via the test set
size of the tree
o a smaller tree (especially depth-wise) is a more concise representation
Table 4.1: Restaurant Sample Set
Example  Alt  Bar  Fri  Hun  Pat   Price  Rain  Res  Type     Est    WillWait
X1       Yes  No   No   Yes  Some  $$$    No    Yes  French   0-10   Yes
X2       Yes  No   No   Yes  Full  $      No    No   Thai     30-60  No
X3       No   Yes  No   No   Some  $      No    No   Burger   0-10   Yes
X4       Yes  No   Yes  Yes  Full  $      No    No   Thai     10-30  Yes
X5       Yes  No   Yes  No   Full  $$$    No    Yes  French   >60    No
X6       No   Yes  No   Yes  Some  $$     Yes   Yes  Italian  0-10   Yes
X7       No   Yes  No   No   None  $      Yes   No   Burger   0-10   No
X8       No   No   No   Yes  Some  $$     Yes   Yes  Thai     0-10   Yes
X9       No   Yes  Yes  No   Full  $      Yes   No   Burger   >60    No
X10      Yes  Yes  Yes  Yes  Full  $$$    No    Yes  Italian  10-30  No
X11      No   No   No   No   None  $      No    No   Thai     0-10   No
X12      Yes  Yes  Yes  Yes  Full  $      No    No   Burger   30-60  Yes
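To illustrate how the "best attribute" to split on is commonly chosen, the sketch below computes an entropy-based information gain over a handful of the examples from Table 4.1; the gain criterion and helper names are assumptions rather than something stated in the notes, and only the data values come from the table.

import math

def entropy(examples):
    """Entropy of the WillWait classification over a list of (attrs, label)."""
    pos = sum(1 for _, label in examples if label == "Yes")
    neg = len(examples) - pos
    result = 0.0
    for count in (pos, neg):
        if count:
            p = count / len(examples)
            result -= p * math.log2(p)
    return result

def information_gain(examples, attribute):
    """Expected reduction in entropy from splitting on one attribute."""
    values = {attrs[attribute] for attrs, _ in examples}
    remainder = 0.0
    for v in values:
        subset = [(a, l) for a, l in examples if a[attribute] == v]
        remainder += len(subset) / len(examples) * entropy(subset)
    return entropy(examples) - remainder

# Six of the twelve examples (X1-X6), restricted to two attributes.
examples = [
    ({"Pat": "Some", "Type": "French"}, "Yes"),   # X1
    ({"Pat": "Full", "Type": "Thai"}, "No"),      # X2
    ({"Pat": "Some", "Type": "Burger"}, "Yes"),   # X3
    ({"Pat": "Full", "Type": "Thai"}, "Yes"),     # X4
    ({"Pat": "Full", "Type": "French"}, "No"),    # X5
    ({"Pat": "Some", "Type": "Italian"}, "Yes"),  # X6
]
print(information_gain(examples, "Pat"), information_gain(examples, "Type"))
# Patrons is the more informative attribute (about 0.46 vs about 0.25)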
5.1 Minimax search procedure - Adding alpha-beta cutoffs in Game Playing (Adversarial Search)
Competitive environments, in which the agents’ goals are in conflict, give rise to adversarial
search problems- often known as games. In AI, “games” are usually of a rather specialized kind – in
which there are two agents whose actions must alternate and in which the utility values at the end of
the game are always equal and opposite.
A game can be formally defined as a kind of search problem with the following components:
The initial state, which includes the board position and identifies the player to move.
A successor function, which returns a list of (move, state) pairs, each indicating a legal move
and the resulting state.
A terminal test, which determines when the game is over. States where the game has ended
are called terminal states.
A utility function, which gives a numeric value for the terminal states.
The initial state and the legal moves for each side define the game tree for the game. The
following figure shows part of the game tree for tic-tac-toe. From the initial state, MAX has nine
possible moves. Play alternates between MAX’s placing an X and MIN’s placing an O until we reach
leaf nodes corresponding to terminal states such that one player has three in a row or all the squares
are filled.
Here, the first MIN node, labeled B, has three successors with values 3, 12, and 8, so its
minimax value is 3. The minimax algorithm computes the minimax decision from the current state.
Minimax algorithm
The value of the root node is given by
MINIMAX-VALUE(root) = max(min(3,12,8), min(2,x,y), min(14,5,2))
= max(3,min(2,x,y),2)
=max(3,z,2) where z<=2
=3
x and y: two unevaluated successors
z: minimum of x and y
Properties of α-β
Pruning does not affect final result
Good move ordering improves effectiveness of pruning
A simple example of the value of reasoning about which computations are relevant (a form of
metareasoning)
Why is it called α-β?
α is the value of the best (i.e., highest-value) choice found so far at any choice point along the
path for max
β is the value of the best (i.e., lowest-value) choice found so far at any choice point along the
path for min
If v is worse than α, max will avoid it
o Prune that branch
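A compact sketch of minimax with alpha-beta cutoffs is shown below. The game tree is encoded as nested lists of leaf utilities, which is an assumption made for illustration; the unevaluated successors x and y from the earlier example are given arbitrary values here.

import math

def alphabeta(node, alpha=-math.inf, beta=math.inf, maximizing=True):
    """Minimax value of a game tree with alpha-beta pruning.
    A node is either a numeric utility (leaf) or a list of child nodes."""
    if not isinstance(node, list):
        return node
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:      # beta cutoff: MIN above will avoid this branch
                break
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, alpha, beta, True))
        beta = min(beta, value)
        if alpha >= beta:          # alpha cutoff: MAX above will avoid this branch
            break
    return value

# The example tree: three MIN nodes with leaves (3,12,8), (2,x,y), (14,5,2);
# the unexamined x and y are set to arbitrary values 4 and 6 here.
print(alphabeta([[3, 12, 8], [2, 4, 6], [14, 5, 2]]))   # prints 3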
5.3 Expert System shells and Knowledge Acquisition
Expert Systems (ES), also called Knowledge-Based Systems (KBS) or simply Knowledge
Systems (KS), are computer programs that use expertise to assist people in performing a wide variety
of functions, including diagnosis, planning, scheduling and design. ESs are distinguished from
conventional computer programs in two essential ways (Barr, Cohen et al. 1989):
Expert systems reason with domain-specific knowledge that is symbolic as well as numerical;
Expert systems use domain-specific methods that are heuristic (i.e., plausible) as well as
algorithmic.
The technology of expert systems has had a far greater impact than even the expert systems
business. Expert system technology has become widespread and deeply embedded. As expert
system techniques have matured into a standard information technology, the most important recent
trend is the increasing integration of this technology with conventional information processing, such as
data processing or management information systems.
5.3.1 The Building Blocks of Expert Systems
Every expert system consists of two principal parts: the knowledge base; and the reasoning,
or inference, engine.
The knowledge base of expert systems contains both factual and heuristic knowledge.
Factual knowledge is that knowledge of the task domain that is widely shared, typically found
in textbooks or journals, and commonly agreed upon by those knowledgeable in the particular
field.
Heuristic knowledge is the less rigorous, more experiential, more judgmental knowledge of
performance.
In contrast to factual knowledge, heuristic knowledge is rarely discussed, and is largely
individualistic.
It is the knowledge of good practice, good judgment, and plausible reasoning in the field. It is
the knowledge that underlies the "art of good guessing."
Knowledge representation formalizes and organizes the knowledge. One widely used
representation is the production rule, or simply rule.
A rule consists of an IF part and a THEN part (also called a condition and an action). The IF
part lists a set of conditions in some logical combination.
The piece of knowledge represented by the production rule is relevant to the line of reasoning
being developed if the IF part of the rule is satisfied; consequently, the THEN part can be
concluded, or its problem-solving action taken.
Expert systems whose knowledge is represented in rule form are called rule-based systems.
Another widely used representation, called the unit (also known as frame, schema, or list
structure) is based upon a more passive view of knowledge.
The unit is an assemblage of associated symbolic knowledge about an entity to be
represented. Typically, a unit consists of a list of properties of the entity and associated
values for those properties.
Since every task domain consists of many entities that stand in various relations, the properties
can also be used to specify relations, and the values of these properties are the names of other units
that are linked according to the relations. One unit can also represent knowledge that is a "special
case" of another unit, or some units can be "parts of" another unit.
The problem-solving model, or paradigm, organizes and controls the steps taken to solve the
problem. One common but powerful paradigm involves chaining of IF-THEN rules to form a line of
reasoning. If the chaining starts from a set of conditions and moves toward some conclusion, the
method is called forward chaining. If the conclusion is known (for example, a goal to be achieved) but
the path to that conclusion is not known, then reasoning backwards is called for, and the method is
backward chaining. These problem-solving methods are built into program modules called inference
engines or inference procedures that manipulate and use knowledge in the knowledge base to form a
line of reasoning.
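A toy illustration of forward chaining over IF-THEN rules follows; the rule contents and the function name are assumptions, intended only to show the control loop:

def forward_chain(facts, rules):
    """Repeatedly fire rules whose IF parts are satisfied until nothing new is added."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conclusion not in facts and set(conditions) <= facts:
                facts.add(conclusion)       # the THEN part is concluded
                changed = True
    return facts

rules = [
    (("fever", "spots"), "measles"),
    (("measles",), "prescribe rest"),
]
print(forward_chain({"fever", "spots"}, rules))
# {'fever', 'spots', 'measles', 'prescribe rest'}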
The most important ingredient in any expert system is knowledge. The power of expert
systems resides in the specific, high-quality knowledge they contain about task domains.
AI researchers will continue to explore and add to the current repertoire of knowledge
representation and reasoning methods. But in knowledge resides the power.
Because of the importance of knowledge in expert systems and because the current
knowledge acquisition method is slow and tedious, much of the future of expert systems
depends on breaking the knowledge acquisition bottleneck and in codifying and representing
a large knowledge infrastructure.
5.3.2 Expert System Shell
A rule-based, expert system maintains a separation between its Knowledge-base and that
part of the system that executes rules, often referred to as the expert system shell.
The system shell is indifferent to the rules it executes. This is an important distinction,
because it means that the expert system shell can be applied to many different problem
domains with little or no change.
Fig. 5.7: Expert system Architecture.
The shell portion includes software modules whose purpose it is to:
Process requests for service from system users and application layer modules;
Support the creation and modification of business rules by subject matter experts;
Translate business rules, created by subject matter experts, into machine-readable forms;
Execute business rules; and
Provide low-level support to expert system components (e.g., retrieve metadata from and
save metadata to knowledge base, build Abstract Syntax Trees during rule translation of
business rules, etc.).
5.5 Swarm Intelligent Systems (or) What is an Ant Colony System, its Applications and Working
The term swarm intelligence is used for the collective behavior of a group (swarm) of animals
as a single living creature, where collective intelligence emerges via grouping and
communication, actually resulting in more successful foraging for each individual in the group.
Swarm intelligence is a specialization in the field of self-organizing systems (adaptation). When
the route of swarm of ants is blocked, it can be observed that they find another new shortest
route to their destination; this shows robustness.
These agents (ants) can be added or removed without compromising the total system due to
its distributed nature.
Ants have been living on the earth for more than 100 million years and can be found almost
anywhere on the planet.
They also use chemicals called pheromones to leave scent trails for other ants to follow.
There are two popular swarm-inspired methods in the computational intelligence area: ant
colony optimization (ACO) and particle swarm optimization (PSO).
ACO was inspired by the behavior of ants and has many successful applications in discrete
optimization problems.
Ant algorithms were first tested and validated on the travelling salesman problem (TSP). The
TSP was chosen for several reasons, one of them being that it is a shortest path problem for
which the ant colony metaphor can easily be adopted.
The main idea is that of having a set of agents, called ants, search in parallel for good
solutions and cooperate through the pheromone-mediated indirect method of communication.
5.5.1 Importance of the Ant colony Paradigm
The evolving computational paradigm of ant colony intelligent system (ACIS) is being used as
an intelligent tool to help researchers solve many problems in different areas of science and
technology.
Scientists nowadays are using the functions of real ant colonies to solve many combinational
optimization problems in different engineering applications.
Ant colony systems
An artificial ant colony system (AACS) is a stochastic population-based heuristic
algorithm of agents that simulate the natural behavior of ants, developing mechanisms of
cooperation and learning, which enables the exploration of the positive feedback between
agents as a search mechanism.
Biological ant colony systems
Social insects like ants, bees, wasps, and termites perform their simple tasks themselves,
independently of other members of the colony.
This emergent behavior of self-organization in a group of social insects is known as swarm
intelligence, which has four basic ingredients
o Positive feedback
o Negative feedback (e.g., saturation, exhaustion, competition)
o Amplification of fluctuations (e.g., random walk, errors, random task switching)
o Multiple interaction
An important and interesting behavior of ant colonies is their foraging behavior and, in
particular, ability to find the shortest paths between food sources and their nests.
While walking from food sources to their nest and vice versa, ants deposit a chemical
substance called pheromone on the ground, forming in this way a pheromone trail. The
sketch shown in the figure gives a general idea of the pheromone trail.
Fig 5.8: Foraging behavior of ants moving from their nest (origin) to the food source
(destination), taking the shortest possible route through pheromone mediation [stage (a) to
stage (d)]
How do real ants find the shortest path? Ants can smell pheromones; while choosing their
path, they tend to choose the paths marked by strong pheromone concentrations.
The pheromone trail allows ants to find their way back to the food.
5.5.2 Artificial Ant colony systems
In AACSs, the use of
o a colony of cooperating individuals,
o an artificial pheromone trail for local stigmergetic communication,
o a sequence of local moves for finding the shortest paths, and
o a stochastic decision policy using local information and no look-ahead
is the same as in real ant colony systems.
However, artificial ants also have some characteristics that have no counterparts in real
ants. These are listed below.
o Artificial ants live in a discrete world and their moves consist of transitions from
discrete states to discrete states.
o Artificial ants have an internal state. This private state contains the memory of the
ant’s past actions.
o Artificial ants deposit a particular amount of pheromone, which is a function of the
quality of the solution found.
o An artificial ant’s timing in pheromone laying is problem-dependent and often does
not reflect a real ant’s behavior. For example, in many cases, artificial ants update
pheromone trails only after having generated a solution.
o To improve overall system efficiency, ant algorithms can be enriched with extra
capabilities such as look-ahead, local optimization, backtracking, elastic approach,
ranking-based approach, etc., which cannot be found in real ants.
5.5.3 Development of the Ant colony System
The ant system (AS) was the first example of an ant colony optimization (ACO) algorithm and was, in
fact, originally a set of three algorithms called ant-cycle, ant-density and ant-quantity.
These three algorithms were proposed in Dorigo’s doctoral dissertation.
While in ant-density and ant-quantity the ants update the pheromone trail directly after a
move from one node to an adjacent one, in ant-cycle the pheromone update is carried out
only after all the ants have constructed their tours, and the amount of pheromone deposited by
each ant is a function of the tour quality.
The major merit of the AS, whose computational results were promising but not competitive
with other, more established approaches, was that it stimulated a number of
researchers to develop extensions and improvements of its basic ideas, producing
better-performing, and often state-of-the-art, algorithms.
5.5.4 Applications of Ant colony Intelligence
There are two classes of applications:
Static combinatorial optimization problems
Dynamic combinatorial optimization problems
Static combinatorial optimization problems
Static problems are those in which the characteristics of the problem are given once and for
all when the problem is defined, and do not change while the problem is being solved.
In ACO algorithms for static combinatorial optimization, the way ants update pheromone trails
differs across algorithms: any combination of online step-by-step pheromone updates and
online delayed pheromone updates is possible.
A typical example of such problems is the classic travelling salesman problem in which city
locations and their relative distances are a part of the problem definition and do not change at
run-time.
Dynamic combinatorial optimization problems
Dynamic combinatorial optimization problems are defined as functions of some quantities
whose values are set by the dynamics of an underlying system.
The problem changes therefore at run-time and the optimization algorithms must be capable
of adapting online to the changing environment.
The working of Ant Colony System
Essentially, an ACS algorithm performs a loop, applying two basic procedures:
o Specifying how ants construct or modify a solution for the problem at hand, and
o Updating the pheromone trail.
The construction or modification of a solution is performed in a probabilistic way. The
probability of adding a new component to the solution under construction is, in turn, a function of a
problem-dependent heuristic and the amount of pheromone previously deposited on the corresponding trail.
The pheromone trails are updated considering the evaporation rate and the quality of the
current solution.
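The loop can be sketched in Python as follows. This is only an illustrative skeleton under stated assumptions, not Dorigo's exact algorithm: the names aco_loop, build_solution, solution_cost, NUM_ANTS, NUM_ITERS, RHO, and Q are introduced for the example, and pheromone is assumed to be a dictionary mapping arcs (i, j) to trail values.

NUM_ANTS = 10      # number of artificial ants in the colony
NUM_ITERS = 100    # number of construction/update cycles
RHO = 0.1          # pheromone evaporation rate
Q = 1.0            # scaling constant for the pheromone deposit

def aco_loop(graph, pheromone, build_solution, solution_cost):
    best, best_cost = None, float("inf")
    for _ in range(NUM_ITERS):
        # 1. Each ant constructs a solution probabilistically, guided by the
        #    pheromone trails and a problem-dependent heuristic.
        solutions = [build_solution(graph, pheromone) for _ in range(NUM_ANTS)]

        # 2. Pheromone update: evaporation on every arc ...
        for arc in pheromone:
            pheromone[arc] *= (1.0 - RHO)

        # ... followed by a deposit whose size reflects solution quality
        #     (better, i.e. cheaper, solutions deposit more pheromone).
        for sol in solutions:
            cost = solution_cost(sol)
            for arc in sol:                 # sol is taken to be a sequence of arcs (i, j)
                pheromone[arc] += Q / cost
            if cost < best_cost:
                best, best_cost = sol, cost
    return best, best_cost

The two basic procedures named above appear as the two halves of the loop body: probabilistic solution construction, then evaporation plus quality-dependent reinforcement of the trails.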
Probabilistic transition rule
In a simple ACO algorithm, the main task of each artificial ant, like that of its natural
counterparts, is to find a shortest path between a pair of nodes on a graph onto which the
problem representation is suitably mapped.
Let G = (N, A) be a connected graph with n = |N| nodes. The simple ant colony optimization
(S-ACO) algorithm can be used to find a solution to a shortest path problem defined on the
graph G, where a solution is a path on the graph connecting a source node s to a destination
node d (shown in the figure), and the path length is given by the number of hops in the path.
With each arc (i, j) of the graph is associated a variable τij called an artificial pheromone trail.
At the beginning of the search process, a small amount of pheromone τ0 is assigned to all the
arcs.
Pheromone trails are read and written by the ants, and the pheromone on an arc reflects the
usefulness of that arc in building good solutions. At each node, local information maintained
in the node itself and/or on its outgoing arcs is used in a stochastic way to decide the next
node to move to.
The decision rule of an ant k located in node i uses the pheromone trails τij to compute the
probability with which it should choose node j ∈ Ni as the next node to move to, where Ni is
the set of one-step neighbors of node i:
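In the standard S-ACO formulation, this probability takes the form

pk(i, j) = (τij)^α / Σ l∈Ni (τil)^α    if j ∈ Ni
pk(i, j) = 0                           if j ∉ Ni

where α is a positive parameter; with α = 1 the ant simply chooses each neighboring node with probability proportional to the pheromone on the corresponding arc.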
Figure 5.9: Building of solutions by an ant from the source to the destination node.
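A minimal Python sketch of this stochastic decision rule, assuming the pheromone values are stored in a dictionary tau keyed by arcs (i, j); the function name choose_next_node and the example values are illustrative assumptions, not part of any standard library.

import random

def choose_next_node(i, neighbors, tau, alpha=1.0):
    # The probability of moving to j in Ni is tau[(i, j)]**alpha divided by the
    # sum of tau[(i, l)]**alpha over all one-step neighbors l of node i.
    weights = [tau[(i, j)] ** alpha for j in neighbors]
    return random.choices(neighbors, weights=weights, k=1)[0]

# Example: an ant at node 0 whose neighbors are 1 and 2; arc (0, 2) carries three
# times as much pheromone, so it is chosen roughly three times as often.
tau = {(0, 1): 1.0, (0, 2): 3.0}
next_node = choose_next_node(0, [1, 2], tau)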
Pheromone Updating
While building a solution, ants deposit pheromone information on the arcs they use. In S-
ACO, ants deposit a constant amount ∆τ of pheromone.
Consider an ant that at time t moves from node i to node j. It will change the pheromone value
τij as follows:
τij(t) ← τij(t) + ∆τ
Using this rule, which simulates real ants’ pheromone deposits on arc (i, j), an ant using the
arc connecting node i to node j increases the probability that other ants will use the same arc
in the future.
The way the pheromone trail is updated can be classified mainly into three types, as detailed
below and contrasted in the sketch that follows the list:
o Online step-by-step pheromone update: when moving from node i to a neighboring node
j, the ant can immediately update the pheromone trail τij on the arc (i, j).
o Online delayed pheromone update: once a solution is built, the ant can retrace the
same path backward and update the pheromone trails on the traversed arcs.
o Off-line pheromone update: pheromone updates performed using the global
information available are called off-line pheromone updates.
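To make the contrast concrete, the three timings can be sketched in Python as below; tau is again a dictionary mapping arcs to pheromone values, and DELTA, the function names, and the exact deposit formulas are illustrative assumptions rather than a fixed ACO recipe.

DELTA = 1.0   # constant pheromone deposit, as in S-ACO

def online_step_by_step_update(tau, i, j, delta=DELTA):
    # Applied immediately, every time an ant moves from node i to node j.
    tau[(i, j)] += delta

def online_delayed_update(tau, path, quality):
    # Applied only after a complete solution has been built: the ant retraces
    # its path and reinforces every arc it used, here in proportion to quality.
    for arc in path:
        tau[arc] += quality

def offline_update(tau, best_path, best_quality, rho=0.1):
    # Performed centrally using global information, e.g. evaporate all arcs
    # and let only the best solution found so far deposit pheromone.
    for arc in tau:
        tau[arc] *= (1.0 - rho)
    for arc in best_path:
        tau[arc] += best_quality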