
Advanced AI and ML

Module 1- Intelligent Agents


Environment: An environment in artificial intelligence is the surroundings of the agent.
Agent: An agent is anything that can be viewed as perceiving its environment through sensors
and acting upon that environment through actuators.
Percept: It refers to the agent’s perceptual inputs at any given instant.
Percept sequence: The complete history of everything the agent has ever perceived. In general,
an agent’s choice of action at any given instant can depend on the entire percept sequence
observed to date, but not on anything it hasn’t perceived.
Fig 1: Agents interact with environments through sensors and actuators

Agent function: Maps any given percept sequence to an action. The agent function for an
artificial agent will be implemented by an agent program.
Agent program: The agent function is an abstract mathematical description; the agent program is
a concrete implementation, running within some physical system.
Example: The vacuum-cleaner world shown in Figure 2
This particular world has just two locations: squares A and B. The vacuum agent perceives which
square it is in and whether there is dirt in the square. It can choose to move left, move right, suck
up the dirt, or do nothing. One very simple agent function is the following: if the current square
is dirty, then suck; otherwise, move to the other square.
A partial tabulation of a simple agent function for the vacuum-cleaner world is shown in Figure 3:

Rational agent: For each possible percept sequence, a rational agent should select an action that
is expected to maximize its performance measure, given the evidence provided by the percept
sequence and whatever built-in knowledge the agent has.
Performance measure: When an agent is plunked down in an environment, it generates a
sequence of actions according to the percepts it receives. This sequence of actions causes the
environment to go through a sequence of states; if the sequence is desirable, the agent has
performed well. This notion of desirability is captured by a performance measure that evaluates
any given sequence of environment states.
As a general rule, it is better to design performance measures according to what one actually
wants in the environment, rather than according to how one thinks the agent should behave.
Rationality depends on four things:
1. The performance measure that defines the criterion of success.
2. The agent’s prior knowledge of the environment.
3. The actions that the agent can perform.
4. The agent’s percept sequence to date.
To judge whether the vacuum-cleaner agent tabulated in Fig 3 is rational, we must first specify the
performance measure, the environment, and the sensors and actuators the agent has. Let us assume the following:
1. The performance measure awards one point for each clean square at each time step, over
a “lifetime” of 1000 time steps.
2. The “geography” of the environment is known a priori but the dirt distribution and the
initial location of the agent are not. Clean squares stay clean and sucking cleans the
current square.
3. The Left and Right actions move the agent left and right except when this would take the
agent outside the environment, in which case the agent remains where it is.
4. The only available actions are Left, Right, and Suck.
5. The agent correctly perceives its location and whether that location contains dirt.
Under these circumstances the agent is indeed rational; its expected performance is at least as high
as any other agent’s. One can see easily that the same agent would be irrational under different
circumstances. For example, once all the dirt is cleaned up, the agent will oscillate needlessly back
and forth; if the performance measure includes a penalty of one point for each movement left or
right, the agent will fare poorly. A better agent for this case would do nothing once it is sure that
all the squares are clean. If clean squares can become dirty again, the agent should occasionally
check and re-clean them if needed. If the geography of the environment is unknown, the agent
will need to explore it rather than stick to squares A and B.
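As a rough illustration of this setup (my own sketch, with assumed initial dirt and starting square, not part of the notes), the two-square world, the simple agent function, and the point-per-clean-square performance measure could be simulated as follows:

```python
# Minimal sketch of the two-square vacuum world with the reflex agent
# "if dirty then Suck, else move to the other square", scored by one point
# per clean square per time step over a 1000-step lifetime (assumed setup).
import random

def reflex_vacuum_agent(location, dirty):
    if dirty:
        return "Suck"
    return "Right" if location == "A" else "Left"

def run(lifetime=1000, seed=0):
    rng = random.Random(seed)
    dirt = {"A": rng.random() < 0.5, "B": rng.random() < 0.5}  # unknown dirt distribution
    loc = rng.choice(["A", "B"])                               # unknown initial location
    score = 0
    for _ in range(lifetime):
        action = reflex_vacuum_agent(loc, dirt[loc])
        if action == "Suck":
            dirt[loc] = False
        elif action == "Right":
            loc = "B"
        elif action == "Left":
            loc = "A"
        score += sum(1 for sq in dirt if not dirt[sq])  # one point per clean square
    return score

print(run())
```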
Omniscience: An omniscient agent knows the actual outcome of its actions and can act
accordingly.
Information gathering: Doing actions in order to modify future percepts.
Exploration: Information gathering of the kind that must be undertaken by a vacuum-cleaning agent in an
initially unknown environment.
Learning: The agent’s initial configuration could reflect some prior knowledge of the
environment, but as the agent gains experience this may be modified and augmented. There are
extreme cases in which the environment is completely known a priori.
Autonomy: To the extent that an agent relies on the prior knowledge of its designer rather than on
its own percepts, we say that the agent lacks autonomy. A rational agent should be autonomous; it
should learn what it can to compensate for partial or incorrect prior knowledge.
For example: A vacuum-cleaning agent that learns to foresee where and when additional dirt will
appear will do better than one that does not. As a practical matter, one seldom requires complete
autonomy from the start: when the agent has had little or no experience, it would have to act
randomly unless the designer gave some assistance. So, just as evolution provides animals with
enough built-in reflexes to survive long enough to learn for themselves, it would be reasonable to
provide an artificial intelligent agent with some initial knowledge as well as an ability to learn.
After sufficient experience of its environment, the behavior of a rational agent can become
effectively independent of its prior knowledge. Hence, the incorporation of learning allows one to
design a single rational agent that will succeed in a vast variety of environments.

Task environments: These are essentially the “problems” to which rational agents are the
“solutions.”
PEAS (Performance, Environment, Actuators, Sensors):
In designing an agent, the first step must always be to specify the task environment as fully as
possible. Consider, for example, the task environment for an automated taxi driver.
Performance Measure: Desirable qualities include getting to the correct destination; minimizing
fuel consumption and wear and tear; minimizing the trip time or cost; minimizing violations of
traffic laws and disturbances to other drivers; maximizing safety and passenger comfort;
maximizing profits.

Environment: Any taxi driver must deal with a variety of roads, ranging from rural lanes and
urban alleys to 12-lane freeways. The roads contain other traffic, pedestrians, stray animals, road
works, police cars, puddles. The taxi must also interact with potential and actual passengers.
There are also some optional choices, depending on where the taxi will operate. Obviously, the more restricted the environment, the easier
the design problem.
Actuators: The actuators for an automated taxi include those available to a human driver: control over the engine
through the accelerator and control over steering and braking. In addition, it will need output to a
display screen or voice synthesizer to talk back to the passengers, and perhaps some way to
communicate with other vehicles, politely or otherwise.
Sensors: The taxi will include one or more controllable video cameras so that it can see the road;
it might augment these with infrared or sonar sensors to detect distances to other cars and
obstacles. To avoid speeding tickets, the taxi should have a speedometer, and to control the
vehicle properly, especially on curves, it should have an accelerometer. To determine the
mechanical state of the vehicle, it will need the usual array of engine, fuel, and electrical system
sensors. Like many human drivers, it might want a global positioning system (GPS) so that it
doesn’t get lost. Finally, it will need a keyboard or microphone for the passenger to request a
destination.

Softbot: Designed to scan Internet news sources and show the interesting items to its users,
while selling advertising space to generate revenue.

Properties of task environments:


First, we list the dimensions, then we analyze several task environments to illustrate the ideas.

Fully observable vs Partially observable:


Fully observable: If an agent’s sensors give it access to the complete state of the environment at each
point in time, then we say that the task environment is fully observable. A task environment is
effectively fully observable if the sensors detect all aspects that are relevant to the choice of action;
relevance, in turn, depends on the performance measure. Fully observable environments are convenient
because the agent need not maintain any internal state to keep track of the world.

Partially observable: An environment might be partially observable because of noisy and inaccurate
sensors or because parts of the state are simply missing from the sensor data. If the agent has no
sensors at all, then the environment is unobservable. In a partially observable environment the agent
needs to maintain internal state to keep track of the world.

Single agent vs Multi agent:


Single agent: An agent solving a crossword puzzle by itself is clearly in a single-agent environment.

Multi agent: An agent playing chess is in a two-agent environment.
Competitive and Cooperative: Chess is a competitive multiagent environment. In the
taxi-driving environment, on the other hand, avoiding collisions maximizes the performance
measure of all agents, so it is a partially cooperative multiagent environment. It is also partially
competitive because, for example, only one car can occupy a parking space. The agent-design
problems in multiagent environments are often quite different from those in single-agent
environments; for example, communication often emerges as rational behavior.

Deterministic vs Stochastic:

Deterministic: If the next state of the environment is completely determined by the current state
and the action executed by the agent, then we say the environment is deterministic. An agent need
not worry about uncertainty in a fully observable, deterministic environment.

Stochastic: If the next state of the environment is not completely determined by the current state
and the action executed by the agent, then we say the environment is stochastic. If the environment
is partially observable, however, then it could appear to be stochastic.

Uncertain and Non-deterministic: If an environment is not fully observable or not deterministic, then
we say that it is uncertain. A nondeterministic environment is one in which actions are
characterized by their possible outcomes, but no probabilities are attached to them. Nondeterministic
environment descriptions are usually associated with performance measures that
require the agent to succeed for all possible outcomes of its actions.

Episodic vs Sequential:

Episodic: In an episodic task environment, the agent’s experience is divided into atomic episodes. In
each episode the agent receives a percept and then performs a single action. The next episode does
not depend on the actions taken in previous episodes. Many classification tasks are episodic. An agent
that has to spot defective parts on an assembly line bases each decision on the current part, regardless
of previous decisions; moreover, the current decision doesn’t affect whether the next part is defective.

Sequential: In a sequential environment, the current decision could affect all future decisions. Chess
and taxi driving are sequential: in both cases, short-term actions can have long-term consequences.
Episodic environments are much simpler than sequential environments because the agent does not
need to think ahead.

Static vs Dynamic:

Static: If the environment cannot change while an agent is deliberating, then we say the environment
is static. Static environments are easy to deal with because the agent need not keep looking at the
world while it is deciding on an action, nor need it worry about the passage of time. Crossword
puzzles are static.

Dynamic: If the environment can change while an agent is deliberating, then we say the environment
is dynamic. Dynamic environments, on the other hand, are continuously asking the agent what it
wants to do; if it hasn’t decided yet, that counts as deciding to do nothing. Taxi driving is clearly
dynamic: the other cars and the taxi itself keep moving while the driving algorithm dithers about
what to do next.

Semidynamic: If the environment itself does not change with the passage of time but the
agent’s performance score does, then we say the environment is semidynamic. Chess, when
played with a clock, is semidynamic.

Discrete and Continuous: The discrete/continuous distinction applies to the state of the
environment, to the way time is handled, and to the percepts and actions of the agent. For
example, the chess environment has a finite number of distinct states. Chess also has a discrete
set of percepts and actions. Taxi driving is a continuous-state and continuous-time problem: the
speed and location of the taxi and of the other vehicles sweep through a range of continuous
values and do so smoothly over time. Taxi-driving actions are also continuous (steering angles, for example). Input from
digital cameras is discrete, strictly speaking, but is typically treated as representing continuously
varying intensities and locations.

Known vs. unknown: In a known environment, the outcomes (or outcome probabilities if
the environment is stochastic) for all actions are given. Obviously, if the environment is
unknown, the agent will have to learn how it works in order to make good decisions. Note that
the distinction between known and unknown environments is not the same as the one between
fully and partially observable environments. It is quite possible for a known environment to be
partially observable—for example, in solitaire card games, I know the rules but am still unable to
see the cards that have not yet been turned over. Conversely, an unknown environment can be
fully observable.

Examples of task environments and their characteristics:


Environment class: A general-purpose environment simulator that places one or more agents
in a simulated environment, observes their behavior over time, and evaluates them according to a
given performance measure. Such experiments are often carried out not for a single environment
but for many environments.

Environment Generator: An environment generator selects particular environments from the
environment class in which to run the agent. If we designed the agent for a single scenario, we might
be able to take advantage of specific properties of the particular case but might not identify a good
design for driving in general.

Agent Program: The agent program takes just the current percept as input because nothing
more is available from the environment; if the agent’s actions need to depend on the entire
percept sequence, the agent will have to remember the percepts.

Architecture: The agent program will run on some sort of computing device with physical sensors
and actuators; this computing device is called the architecture.

The TABLE-DRIVEN-AGENT program is invoked for each new percept and returns an
action each time.

Let P be the set of possible percepts and let T be the lifetime of the agent (the total number of
percepts it will receive). The lookup table will contain ∑_{t=1}^{T} |P|^t entries. Consider the
automated taxi: the visual input from a single camera comes in at the rate of roughly 27 megabytes
per second (30 frames per second, 640×480 pixels with 24 bits of color information). This gives a
lookup table with over 10^250,000,000,000 entries for an hour’s driving. Even the lookup table for
chess, a tiny, well-behaved fragment of the real world, would have at least 10^150 entries. The
daunting size of these tables means that (a) no physical
agent in this universe will have the space to store the table, (b) the designer would not have time
to create the table, (c) no agent could ever learn all the right table entries from its experience, and
(d) even if the environment is simple enough to yield a feasible table size, the designer still has
no guidance about how to fill in the table entries. Despite all this, TABLE-DRIVEN-AGENT
does do what we want: it implements the desired agent function. The key challenge for AI is to
find out how to write programs that, to the extent possible, produce rational behavior from a
smallish program rather than from a vast table.
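As a minimal sketch of the idea (illustrative names and a hand-built table for the vacuum world, not a full implementation), TABLE-DRIVEN-AGENT can be written as a lookup on the percept sequence:

```python
# Sketch of a table-driven agent: append each percept to the history and look
# up the whole percept sequence in a table mapping sequences to actions.

percepts = []  # the percept sequence observed so far

def table_driven_agent(percept, table):
    """Return the action the table assigns to the full percept sequence."""
    percepts.append(percept)
    return table.get(tuple(percepts))

# Tiny illustrative table for the two-square vacuum world
# (keys are percept sequences, values are actions):
table = {
    (("A", "Clean"),): "Right",
    (("A", "Dirty"),): "Suck",
    (("B", "Clean"),): "Left",
    (("B", "Dirty"),): "Suck",
    (("A", "Clean"), ("B", "Clean")): "Left",
    (("A", "Clean"), ("B", "Dirty")): "Suck",
}

print(table_driven_agent(("A", "Dirty"), table))  # -> "Suck"
```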

Four basic kinds of agent programs that embody the principles underlying
almost all intelligent systems:
 Simple reflex agents
 Model-based reflex agents
 Goal-based agents
 Utility-based agents.
The agent program for a simple reflex agent in the two-state vacuum environment of Figure 2:

Simple Reflex agents: The simplest kind of agent is the simple reflex agent. These agents
select actions on the basis of the current percept, ignoring the rest of the percept history. The
most obvious reduction comes from ignoring the percept history, which cuts down the number of
possibilities from 4^T to just 4. Imagine yourself as the driver of the automated taxi. If the car in
front brakes and its brake lights come on, then you should notice this and initiate braking.

A more general and flexible approach is first to build a general-purpose interpreter for condition–
action rules and then to create rule sets for specific task environments. The figure gives the structure
of this general program in schematic form, showing how the condition–action rules allow the
agent to make the connection from percept to action.

Schematic diagram of a simple reflex agent:



We use rectangles to denote the current internal state of the agent’s decision process, and ovals to
represent the background information used in the process. The agent program, which is also very
simple, is shown in the diagram. The INTERPRET-INPUT function generates an abstracted
description of the current state from the percept, and the RULE-MATCH function returns the
first rule in the set of rules that matches the given state description. The agent in the diagram will
work only if the correct decision can be made on the basis of only the current percept—that is,
only if the environment is fully observable.
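A minimal sketch of this structure, with assumed helper names and condition–action rules for the vacuum world, might look like:

```python
# Sketch of a simple reflex agent: an interpreter turns the raw percept into an
# abstract state description, and the first matching condition–action rule
# supplies the action.

def interpret_input(percept):
    # Here the percept is already (location, status), so pass it through.
    return percept

def rule_match(state, rules):
    """Return the action of the first rule whose condition matches the state."""
    for condition, action in rules:
        if condition(state):
            return action
    return None

def simple_reflex_agent(percept, rules):
    state = interpret_input(percept)
    return rule_match(state, rules)

# Condition–action rules for the two-square vacuum world:
vacuum_rules = [
    (lambda s: s[1] == "Dirty", "Suck"),
    (lambda s: s[0] == "A", "Right"),
    (lambda s: s[0] == "B", "Left"),
]

print(simple_reflex_agent(("A", "Dirty"), vacuum_rules))  # -> "Suck"
```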

Model-based reflex agents:


For example, the agent might know that when it turns the steering wheel clockwise, the car turns to the right, or that after
driving for five minutes northbound on the freeway, one is usually about five miles north of
where one was five minutes ago. This knowledge about “how the world works”—whether
implemented in simple Boolean circuits or in complete scientific theories—is called a model of
the world. An agent that uses such a model is called a model-based agent.
Figure gives the structure of the model-based reflex agent with internal state, showing
how the current percept is combined with the old internal state to generate the updated
description of the current state, based on the agent’s model of how the world works.

The agent program is shown in the figure. The interesting part is the function UPDATE-
STATE, which is responsible for creating the new internal state description. The details of how
models and states are represented vary widely depending on the type of environment and the
particular technology used in the agent design. For example : the taxi may be driving back
home, and it may have a rule telling it to fill up with gas on the way home unless it has at least
half a tank. Although “driving back home” may seem to be an aspect of the world state, the fact of
the taxi’s destination is actually an aspect of the agent’s internal state. If you find this puzzling,
consider that the taxi could be in exactly the same place at the same time, but intending to reach
a different destination.
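A rough sketch of the model-based structure (assumed interfaces; the vacuum-world model here is only illustrative) is:

```python
# Sketch of a model-based reflex agent: internal state is updated from the old
# state, the last action, the new percept, and a model of how the world works;
# condition–action rules then map the updated state description to an action.

def model_based_reflex_agent_factory(rules, update_state):
    state = {}          # the agent's current conception of the world state
    last_action = None

    def agent(percept):
        nonlocal state, last_action
        state = update_state(state, last_action, percept)   # uses the world model
        action = next((a for cond, a in rules if cond(state)), None)
        last_action = action
        return action

    return agent

def update_vacuum_state(state, last_action, percept):
    location, status = percept
    new_state = dict(state)
    # Model of the last action's effect on the unobserved part of the world:
    # a square we sucked clean stays clean until observed otherwise.
    if last_action == "Suck" and "location" in state:
        new_state[state["location"]] = "Clean"
    # The new percept overrides whatever the model predicted for this square.
    new_state["location"] = location
    new_state[location] = status
    return new_state

vacuum_rules = [
    (lambda s: s[s["location"]] == "Dirty", "Suck"),
    (lambda s: s["location"] == "A", "Right"),
    (lambda s: s["location"] == "B", "Left"),
]

agent = model_based_reflex_agent_factory(vacuum_rules, update_vacuum_state)
print(agent(("A", "Dirty")))  # -> "Suck"
```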

Goal-based agents:
A model-based, goal-based agent. It keeps track of the world state as well as a set of goals it
is trying to achieve, and chooses an action that will (eventually) lead to the achievement of
its goals.

The agent needs some sort of goal information that describes situations that are desirable —for
example, being at the passenger’s destination. The agent program can combine this with the
model to choose actions that achieve the goal. Sometimes this is easy, for example, when goal satisfaction results immediately from a
single action. Sometimes it will be more tricky—for example, when the agent has to consider
long sequences of twists and turns in order to find a way to achieve the goal. Search and
planning are the subfields of AI devoted to finding action sequences that achieve the agent’s
goals. A goal-based agent, in principle, could reason that if the car in front has its brake lights on,
it will slow down. Given the way the world usually evolves, the only action that will achieve the
goal of not hitting other cars is to brake. Although the goal-based agent appears less efficient, it is more
flexible because the knowledge that supports its decisions is represented explicitly and can be
modified. If it starts to rain, the agent can update its knowledge of how effectively its brakes will
operate; this will automatically cause all of the relevant behaviors to be altered to suit the new
conditions. For the reflex agent, on the other hand, we would have to rewrite many condition–
action rules. The goal-based agent’s behavior can easily be changed to go to a different
destination, simply by specifying that destination as the goal. The reflex agent’s rules for when to
turn and when to go straight will work only for a single destination; they must all be replaced to
go somewhere new.
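A very small sketch of goal-based action selection (assumed `actions`, `result`, and `goal_test` interfaces; a real agent would call a search or planning procedure for multi-step goals) is:

```python
# Sketch of one-step goal-based action selection: simulate the model forward
# and pick an action whose predicted successor state satisfies the goal.

def goal_based_agent(state, goal_test, actions, result):
    for action in actions(state):
        if goal_test(result(state, action)):
            return action
    return None   # otherwise, search/planning would look for a longer sequence

# Example in the two-square vacuum world; the goal is "both squares clean".
actions = lambda s: ["Suck", "Left", "Right"]

def result(s, a):
    loc, dirt = s
    if a == "Suck":
        dirt = {**dirt, loc: "Clean"}
    elif a in ("Left", "Right"):
        loc = "A" if a == "Left" else "B"
    return (loc, dirt)

goal_test = lambda s: all(v == "Clean" for v in s[1].values())

print(goal_based_agent(("A", {"A": "Dirty", "B": "Clean"}),
                       goal_test, actions, result))  # -> "Suck"
```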

Utility-based agents:
Goals just provide a crude binary distinction between “happy” and “unhappy” states. A more
general performance measure should allow a comparison of different world states according to
exactly how happy they would make the agent. Because “happy” does not sound very scientific,
economists and computer scientists use the term utility instead. An agent’s utility function is
essentially an internalization of the performance measure. If the internal utility function and the
external performance measure are in agreement, then an agent that chooses actions to maximize
its utility will be rational according to the external performance measure. Like goal-based agents, a
utility-based agent has many advantages in terms of flexibility and learning. Furthermore, in two
kinds of cases, goals are inadequate but a utility-based agent can still make rational decisions.
First, when there are conflicting goals, only some of which can be achieved, the utility function
specifies the appropriate tradeoff. Second, when there are several goals that the agent can aim for,
none of which can be achieved with certainty, utility provides a way in which the likelihood of
success can be weighed against the importance of the goals. Technically speaking, a rational utility-
based agent chooses the action that maximizes the expected utility of the action outcomes—that
is, the utility the agent expects to derive, on average, given the probabilities and utilities of each
outcome.
A model-based, utility-based agent. It uses a model of the world, along with a utility function
that measures its preferences among states of the world. Then it chooses the action that leads
to the best expected utility, where expected utility is computed by averaging over all possible
outcome states, weighted by the probability of the outcome.
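A compact sketch of the action-selection rule just described, under an assumed interface in which `outcomes(state, action)` yields probability-weighted successor states:

```python
# Sketch of expected-utility action selection: average the utility of each
# possible outcome state, weighted by its probability under the agent's model,
# and choose the action with the highest expected utility.

def expected_utility(action, state, outcomes, utility):
    """outcomes(state, action) yields (probability, outcome_state) pairs."""
    return sum(p * utility(s2) for p, s2 in outcomes(state, action))

def utility_based_agent(state, actions, outcomes, utility):
    return max(actions(state),
               key=lambda a: expected_utility(a, state, outcomes, utility))
```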
Learning agents :
A general learning agent.

A learning agent can be divided into four conceptual components, as shown in the figure. The most
important distinction is between the learning element, which is responsible for making
improvements, and the performance element, which is responsible for selecting external actions.
The performance element is what we have previously considered to be the entire agent: it takes in
percepts and decides on actions. The learning element uses feedback from the critic on how the
agent is doing and determines how the performance element should be modified to do better in the
future. The design of the learning element depends very much on the design of the performance
element. When trying to design an agent that learns a certain capability, the first question is not
“How am I going to get it to learn this?” but “What kind of performance element will my agent
need to do this once it has learned how?” Given an agent design, learning mechanisms can be
constructed to improve every part of the agent. The critic is necessary because the percepts
themselves provide no indication of the agent’s success. The last component of the learning agent
is the problem generator. It is responsible for suggesting actions that will lead to new and
informative experiences. The point is that if the performance element had its way, it would keep
doing the actions that are best, given what it knows. For example, if a taxi’s violent maneuvers upset
its passengers and cost it tips, the external performance standard must inform the agent that the loss
of tips is a negative contribution to its overall performance; then the
agent might be able to learn that violent maneuvers do not contribute to its own utility. In a sense,
the performance standard distinguishes part of the incoming percept as a reward (or penalty) that
provides direct feedback on the quality of the agent’s behavior. Agents have a variety of
components, and those components can be represented in many ways within the agent program,
so there appears to be great variety among learning methods. There is, however, a single unifying
theme. Learning in intelligent agents can be summarized as a process of modification of each
component of the agent to bring the components into closer agreement with the available feedback
information, thereby improving the overall performance of the agent.
How the components of agent programs work:
Three ways to represent states and the transitions between them. (a) Atomic representation:
a state (such as B or C) is a black box with no internal structure; (b) Factored representation:
a state consists of a vector of attribute values; values can be Boolean, real valued, or one of a
fixed set of symbols. (c) Structured representation: a state includes objects, each of which
may have attributes of its own as well as relationships to other objects.

“How on earth do these components work?” It takes about a thousand pages to begin to answer
that question properly, but here we want to draw the reader’s attention to some basic distinctions
among the various ways that the components can represent the environment that the agent inhabits.
Roughly speaking, we can place the representations along an axis of increasing complexity and
expressive power—atomic, factored, and structured. To illustrate these ideas, it helps to consider
a particular agent component, such as the one that deals with “What my actions do.” This
component describes the changes that might occur in the environment as the result of taking an
action. In an atomic representation, each state of the world is indivisible; it has no internal structure.
Consider the problem of finding a driving route from one end of a country to the other via some
sequence of cities. For the purposes of solving this problem, it may suffice to reduce the state of
the world to just the name of the city we are in: a single atom of knowledge; a “black box” whose only
discernible property is that of being identical to or different from another black box. The algorithms
underlying search and game-playing, Hidden Markov models, and Markov
decision processes all work with atomic representations or, at least, they treat representations as if
they were atomic. Now consider a higher-fidelity description for the same problem, where we need
to be concerned with more than just atomic location in one city or another; we might need to pay
attention to how much gas is in the tank, our current GPS coordinates, whether or not the oil
warning light is working, how much spare change we have for toll crossings, what station is on
the radio, and so on. A factored representation splits up each state into a fixed set of variables or
attributes, each of which can have a value. While two different atomic states have nothing in
common—they are just different black boxes—two different factored states can share some
attributes and not others; this makes it much easier to work out how to turn one state into another.
With factored representations, we can also represent uncertainty; for example, ignorance about
the amount of gas in the tank can be represented by leaving that attribute blank. Many important
areas of AI are based on factored representations, including constraint satisfaction algorithms,
propositional logic , planning , Bayesian networks , and the machine learning algorithms. For
example, we might notice that a large truck ahead of us is reversing into the driveway of a dairy
farm but a cow has got loose and is blocking the truck’s path. A factored representation is unlikely
to be pre-equipped with the attribute
Truck:AheadBackingIntoDairyFarmDrivewayBlockedByLooseCow with value true or false.
Instead, we would need a structured representation, in which objects such as cows and trucks and their
various and varying relationships can be described explicitly. Structured representations underlie
relational databases and first-order logic, first-order probability models, knowledge-based learning,
and much of natural language understanding. In fact, almost everything that humans express in
natural language concerns objects and their relationships. The axis along which atomic, factored,
and structured representations lie is the axis of increasing expressiveness. Roughly speaking, a
more expressive representation can capture, at least as concisely, everything a less expressive one
can capture, plus some more. Often, the more expressive language is much more concise; for
example, the rules of chess can be written in a page or two of a structured-representation language
such as first-order logic but require thousands of pages when written in a factored-representation
language such as propositional logic. On the other hand, reasoning and learning become more
complex as the expressive power of the representation increases. To gain the benefits of expressive
representations while avoiding their drawbacks, intelligent systems for the real world may need to
operate at all points along the axis simultaneously.
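A small illustrative example (my own, not from the notes) of the three representation styles for the driving problem:

```python
# Atomic: the state is a single indivisible name.
atomic_state = "Bengaluru"

# Factored: the state is a fixed set of attribute-value pairs.
factored_state = {
    "city": "Bengaluru",
    "fuel_litres": 31.5,
    "oil_warning_light_ok": True,
    "toll_change_rupees": 120,   # an unknown value could be left as None
}

# Structured: the state contains objects and explicit relationships between them.
structured_state = {
    "objects": {"truck1": "Truck", "cow1": "Cow", "driveway1": "Driveway"},
    "relations": [
        ("AheadOf", "truck1", "self"),
        ("BackingInto", "truck1", "driveway1"),
        ("Blocking", "cow1", "truck1"),
    ],
}
```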

Problem Solving: Game Playing


Game theory: Mathematical game theory, a branch of economics, views any multiagent
environment as a game, provided that the impact of each agent on the others is “significant,”
regardless of whether the agents are cooperative or competitive.

zero-sum games of perfect information: In AI, the most common games are of a
rather specialized kind: what game theorists call deterministic, turn-taking, two-player, zero-sum games of perfect information. This means
deterministic, fully observable environments in which two agents act alternately and in which the
utility values at the end of the game are always equal and opposite.

Pruning: Allows us to ignore portions of the search tree that make no difference to the final
choice, and heuristic evaluation functions allow us to approximate the true utility of a state without
doing a complete search.
Example : consider games with two players, whom we call MAX and MIN for reasons that will
soon become obvious. MAX moves first, and then they take turns moving until the game is over.
At the end of the game, points are awarded to the winning player and penalties are given to the
loser. A game can be formally defined as a kind of search problem with the following elements:
 S0: The initial state, which specifies how the game is set up at the start.
 PLAYER(s): Defines which player has the move in a state.
 ACTIONS(s): Returns the set of legal moves in a state.
 RESULT(s,a): The transition model, which defines the result of a move.
 TERMINAL-TEST(s): A terminal test, which is true when the game is over and false
otherwise. States where the game has ended are called terminal states.
 UTILITY(s,p): A utility function defines the final numeric value for a game that ends in
terminal state s for a player p. In chess, the outcome is a win, loss, or draw, with values
+1, 0, or 1/2. Some games have a wider variety of possible outcomes; the payoffs in
backgammon range from 0 to +192.
 A zero-sum game is (confusingly) defined as one where the total payoff to all players is the
same for every instance of the game. Chess is zero-sum because every game has payoff of
either 0+1, 1+0, or 1/2 + 1/2.
 “Constant-sum” would have been a better term, but zero-sum is traditional and makes sense
if you imagine each player is charged an entry fee of 1/2. The initial state, ACTIONS
function, and RESULT function define the game tree for the game—a tree where the nodes
are game states and the edges are moves.
A (partial) game tree for the game of tic-tac-toe. The top node is the initial state, and
MAX moves first, placing an X in an empty square. We show part of the tree, giving
alternating moves by MIN (O)and MAX (X), until we eventually reach terminal states,
which can be assigned utilities according to the rules of the game.
From the initial state, MAX has nine possible moves. Play alternates between MAX’s placing
an X and MIN’s placing an O until we reach leaf nodes corresponding to terminal states such
that one player has three in a row or all the squares are filled. The number on each leaf node
indicates the utility value of the terminal state from the point of view of MAX; high values are
assumed to be good for MAX and bad for MIN (which is how the players get their names). For
tic-tac-toe the game tree is relatively small: fewer than 9! = 362,880 terminal nodes. But for
chess there are over 10^40 nodes, so the game tree is best thought of as a theoretical construct
that we cannot realize in the physical world. But regardless of the size of the game tree, it is
MAX’s job to search for a good move. We use the term search tree for a tree that is
superimposed on the full game tree, and examines enough nodes to allow a player to determine
what move to make.

Optimal decisions in Games:


A two-ply game tree. The nodes at which it is MAX’s turn to move are “MAX nodes,” and
the nodes at which MIN moves are “MIN nodes.” The terminal nodes show the utility values for MAX; the other
nodes are labeled with their minimax values. MAX’s best move at the root is a1, because it
leads to the state with the highest minimax value, and MIN’s best reply is b1, because it leads
to the state with the lowest minimax value.

The possible moves for MAX at the root node are labeled a1, a2, and a3. The possible replies
to a1 for MIN are b1, b2, b3, and so on. This particular game ends after one move each by
MAX and MIN. The utilities of the terminal states in this game range from 2 to 14. Given a
game tree, the optimal strategy can be determined from the minimax value of each node, which
we write as MINIMAX(n). The minimax value of a node is the utility (for MAX) of being in
the corresponding state, assuming that both players play optimally from there to the end of the
game. Obviously, the minimax value of a terminal state is just its utility. Furthermore, given a
choice, MAX prefers to move to a state of maximum value, whereas MIN prefers a state of
minimum value. So we have the following:
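The defining equations are not reproduced in these notes; in the standard form they are:

\[
\mathrm{MINIMAX}(s) =
\begin{cases}
\mathrm{UTILITY}(s) & \text{if } \mathrm{TERMINAL\mbox{-}TEST}(s) \\
\max_{a \in \mathrm{ACTIONS}(s)} \mathrm{MINIMAX}(\mathrm{RESULT}(s,a)) & \text{if } \mathrm{PLAYER}(s) = \mathrm{MAX} \\
\min_{a \in \mathrm{ACTIONS}(s)} \mathrm{MINIMAX}(\mathrm{RESULT}(s,a)) & \text{if } \mathrm{PLAYER}(s) = \mathrm{MIN}
\end{cases}
\]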
The terminal nodes on the bottom level get their utility values from the game’s UTILITY
function. The first MIN node, labeled B, has three successor states with values 3, 12, and 8, so
its minimax value is 3. Similarly, the other two MIN nodes have minimax value 2. The root
node is a MAX node; its successor states have minimax values 3, 2, and 2; so it has a minimax
value of 3. We can also identify the minimax decision at the root: action a1 is the optimal
choice for MAX because it leads to the state with the highest minimax value.

The minimax algorithm:

The minimax algorithm computes the minimax decision from the current state. It uses a simple
recursive computation of the minimax values of each successor state, directly implementing
the defining equations. The recursion proceeds all the way down to the leaves of the tree, and
then the minimax values are backed up through the tree as the recursion unwinds. The minimax
algorithm performs a complete depth-first exploration of the game tree. If the maximum depth
of the tree is m and there are b legal moves at each point, then the time complexity of the
minimax algorithm is O(b^m). The space complexity is O(bm) for an algorithm that generates
all actions at once, or O(m) for an algorithm that generates actions one at a time. For real
games, of course, the time cost is totally impractical, but this algorithm serves as the basis for
the mathematical analysis of games and for more practical algorithms.
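A compact sketch of the algorithm just described, written against an assumed game interface (`actions`, `result`, `terminal_test`, `utility`, with utilities taken from MAX's point of view):

```python
# Sketch of minimax decision: recursive depth-first computation of minimax
# values, backing values up as the recursion unwinds.

def minimax_decision(state, game):
    """Return the action for MAX that maximizes the minimax value."""
    return max(game.actions(state),
               key=lambda a: min_value(game.result(state, a), game))

def max_value(state, game):
    if game.terminal_test(state):
        return game.utility(state)          # utility from MAX's point of view
    return max(min_value(game.result(state, a), game)
               for a in game.actions(state))

def min_value(state, game):
    if game.terminal_test(state):
        return game.utility(state)
    return min(max_value(game.result(state, a), game)
               for a in game.actions(state))
```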

Optimal decisions in multiplayer games:


First, we need to replace the single value for each node with a vector of values. For example,
in a three-player game with players A, B, and C, a vector vA, vB, vC is associated with each
node. For terminal states, this vector gives the utility of the state from each player’s viewpoint.
(In two-player, zero-sum games, the two-element vector can be reduced to a single value
because the values are always opposite.) The simplest way to implement this is to have the
UTILITY function return a vector of utilities. Now we have to consider nonterminal states.
Consider the node marked X in the game tree shown in the figure. In that state, player C chooses
what to do. The two choices lead to terminal states with utility vectors (vA=1, vB=2, vC=6) and
(vA=4, vB=2, vC=3). Since 6 is bigger than 3, C should choose the first move. This means that
if state X is reached, subsequent play will lead to a terminal state with utilities (vA=1, vB=2, vC=6).
Hence, the backed-up value of X is this vector. The backed-up value of a node n is always
the utility vector of the successor state with the highest value for the player choosing at n.
Anyone who plays multiplayer games, such as Diplomacy, quickly becomes aware that much
more is going on than in two-player games. Multiplayer games usually involve alliances,
whether formal or informal, among the players. For example, suppose A and B are in weak positions and C
is in a stronger position. Then it is often optimal for both A and B to attack C rather than each
other, lest C destroy each of them individually. In this way, collaboration emerges from purely
selfish behavior. Of course, as soon as C weakens under the joint onslaught, the alliance loses
its value, and either A or B could violate the agreement. In some cases, explicit alliances merely
make concrete what would have happened anyway. In other cases, a social stigma attaches to
breaking an alliance, so players must balance the immediate advantage of breaking an alliance
against the long-term disadvantage of being perceived as untrustworthy.

Alpha-Beta Pruning:
When applied to a standard minimax tree, it returns the same move as minimax would, but
prunes away branches that cannot possibly influence the final decision. Consider again the
two-ply game tree from the earlier figure. Let’s go through the calculation of the optimal decision once
more, this time paying careful attention to what we know at each point in the process.

The outcome is that we can identify the minimax decision without ever evaluating two of the
leaf nodes. Another way to look at this is as a simplification of the formula for MINIMAX. Let
the two unevaluated successors of node C in the figure have values x and y. Then the value of the
root node is given by
MINIMAX(root) = max(min(3,12,8), min(2,x,y), min(14,5,2))
= max(3, min(2,x,y), 2)
= max(3, z, 2)   where z = min(2,x,y) ≤ 2
= 3.
The value of the root and hence the minimax decision are independent of the values of the
pruned leaves x and y. Alpha–beta pruning can be applied to trees of any depth, and it is often
possible to prune entire subtrees rather than just leaves. The general principle is this: consider
a node n somewhere in the tree such that Player has a choice of moving to that node. If Player
has a better choice m either at the parent node of n or at any choice point further up, then n will
never be reached in actual play. So once we have found out enough about n (by examining
some of its descendants) to reach this conclusion, we can prune it.

The general case for alpha–beta pruning. If m is better than n for Player, we
will never get to n in play.

α = the value of the best choice we have found so far at any choice point along the path for
MAX. β = the value of the best choice we have found so far at any choice point along the path
for MIN. Alpha–beta search updates the values of α and β as it goes along and prunes the
remaining branches at a node as soon as the value of the current node is known to be worse
than the current α or β value for MAX or MIN, respectively.
The alpha–beta search algorithm:
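Since the algorithm figure is not reproduced here, the following is a sketch of alpha–beta search against the same assumed game interface as the minimax sketch above:

```python
import math

def alpha_beta_search(state, game):
    """Return the best action for MAX, pruning branches that cannot matter."""
    best_score, best_action = -math.inf, None
    for a in game.actions(state):
        v = ab_min_value(game.result(state, a), game, -math.inf, math.inf)
        if v > best_score:
            best_score, best_action = v, a
    return best_action

def ab_max_value(state, game, alpha, beta):
    if game.terminal_test(state):
        return game.utility(state)
    v = -math.inf
    for a in game.actions(state):
        v = max(v, ab_min_value(game.result(state, a), game, alpha, beta))
        if v >= beta:            # MIN already has a better option above: prune
            return v
        alpha = max(alpha, v)
    return v

def ab_min_value(state, game, alpha, beta):
    if game.terminal_test(state):
        return game.utility(state)
    v = math.inf
    for a in game.actions(state):
        v = min(v, ab_max_value(game.result(state, a), game, alpha, beta))
        if v <= alpha:           # MAX already has a better option above: prune
            return v
        beta = min(beta, v)
    return v
```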

Move ordering:
The effectiveness of alpha–beta pruning is highly dependent on the order in which the states
are examined. For example, in the figure we could not prune any successors of D at all because
the worst successors were generated first. If the third successor of D had been generated first,
we would have been able to prune the other two. This suggests that it might be worthwhile to
try to examine first the successors that are likely to be best. If this can be done, then it turns
out that alpha–beta needs to examine only O(b^m/2) nodes to pick the best move, instead of
O(b^m) for minimax. This means that the effective branching factor becomes √b instead of
b—for chess, about 6 instead of 35. Put another way, alpha–beta can solve a tree roughly twice
as deep as minimax in the same amount of time. If successors are examined in random order
rather than best-first, the total number of nodes examined will be roughly O(b^3m/4) for
moderate b. For chess, a fairly simple ordering function gets you to within about a factor of 2
of the best-case O(b^m/2) result. One way to gain information from the current move is with
iterative deepening search. First, search 1 ply deep and record the best path of moves. Then
search 1 ply deeper, but use the recorded path to inform move ordering. Iterative deepening
on an exponential game tree adds only a constant fraction to the total search time, and this can
be more than made up for by better move ordering. The best moves are often called killer moves,
and to try them first is called the killer move heuristic. In many games, repeated states occur
frequently because of transpositions—different permutations of the move sequence that end
up in the same position. For example, if White has one move, a1, that can be answered by
Black with b1 and an unrelated move a2 on the other side of the board that can be answered
by b2, then the sequences [a1,b1,a2,b2] and [a2,b2,a1,b1] both end up in the same position.
The hash table of previously seen positions is traditionally called a transposition table; it is
essentially identical to the explored list in GRAPH-SEARCH. Using a transposition table can
have a dramatic effect, sometimes as much as doubling the reachable search depth in chess.
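A small sketch of the transposition-table idea (the key scheme and interface are assumptions for illustration; real chess programs use Zobrist hashing):

```python
# Cache computed values keyed by position and search depth, so transpositions
# reached by different move orders are evaluated only once.

transposition_table = {}

def cached_value(state_key, depth, compute):
    """state_key must be hashable (e.g. a tuple describing the position)."""
    key = (state_key, depth)
    if key not in transposition_table:
        transposition_table[key] = compute()
    return transposition_table[key]
```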

Imperfect Real-Time Decisions:


The suggestion is to alter minimax or alpha–beta in two ways: replace the utility function by a
heuristic evaluation function EVAL, which estimates the position’s utility, and replace the
terminal test by a cutoff test that decides when to apply EVAL. That gives us the following for
heuristic minimax for state s and maximum depth d:
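The resulting recursion, reconstructed here in its standard form since the formula is not reproduced in the notes, is:

\[
\mathrm{H\mbox{-}MINIMAX}(s, d) =
\begin{cases}
\mathrm{EVAL}(s) & \text{if } \mathrm{CUTOFF\mbox{-}TEST}(s, d) \\
\max_{a \in \mathrm{ACTIONS}(s)} \mathrm{H\mbox{-}MINIMAX}(\mathrm{RESULT}(s,a),\, d+1) & \text{if } \mathrm{PLAYER}(s) = \mathrm{MAX} \\
\min_{a \in \mathrm{ACTIONS}(s)} \mathrm{H\mbox{-}MINIMAX}(\mathrm{RESULT}(s,a),\, d+1) & \text{if } \mathrm{PLAYER}(s) = \mathrm{MIN}
\end{cases}
\]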

Evaluation functions:
An evaluation function returns an estimate of the expected utility of the game from a given
position. First, the evaluation function should order the terminal states in the same way as the
true utility function: states that are wins must evaluate better than draws, which in turn must
be better than losses. Second, the computation must not take too long! Third, for nonterminal
states, the evaluation function should be strongly correlated with the actual chances of winning.
Most evaluation functions work by calculating various features of the state—for example, in
chess, we would have features for the number of white pawns, black pawns, white queens,
black queens, and so on. The features, taken together, define various categories or equivalence
classes of states: the states in each category have the same values for all the features. For
example, one category contains all two-pawn vs. one-pawn endgames. Any given category,
generally speaking, will contain some states that lead to wins, some that lead to draws, and
some that lead to losses. The evaluation function cannot know which states are which, but it
can return a single value that reflects the proportion of states with each outcome. For example,
suppose our experience suggests that 72% of the states encountered in the two-pawns vs. one-
pawn category lead to a win (utility +1); 20% to a loss (0), and 8% to a draw (1/2). Then a
reasonable evaluation for states in the category is the expected value: (0.72 × +1) + (0.20 × 0)
+ (0.08 × 1/2) = 0.76. In principle, the expected value can be determined for each category,
resulting in an evaluation function that works for any state. As with terminal states, the
evaluation function need not return actual expected values as long as the ordering of the states
is the same. most evaluation functions compute separate numerical contributions from each
feature and then combine them to find the total value. For example, introductory chess books
give an approximate material value for each piece: each pawn is worth 1, a knight or bishop is
worth 3, a rook 5, and the queen 9. Other features such as “good pawn structure” and “king
safety” might be worth half a pawn, say. These feature values are then simply added up to
obtain the evaluation of the position. A secure advantage equivalent to a pawn gives a
substantial likelihood of winning, and a secure advantage equivalent to three pawns should
give almost certain victory, as illustrated in the figure. Mathematically, this kind of evaluation
function is called a weighted linear function because it can be expressed as

EVAL(s) = w1 f1(s) + w2 f2(s) + ... + wn fn(s),

where each wi is a weight and each fi is a feature of the position. For chess, the fi could be the
numbers of each kind of piece on the board, and the wi could be the values of the pieces.
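A small sketch of such a weighted linear evaluation restricted to material, using the textbook piece values quoted above (the board encoding is an assumption for illustration):

```python
# Weighted linear material evaluation: sum piece values with weight +1 for
# White and -1 for Black.

PIECE_VALUES = {"P": 1, "N": 3, "B": 3, "R": 5, "Q": 9}

def material_eval(board):
    """board: dict mapping squares to piece codes such as 'wP' or 'bQ'."""
    score = 0
    for piece in board.values():
        colour, kind = piece[0], piece[1]
        value = PIECE_VALUES.get(kind, 0)     # kings are not counted
        score += value if colour == "w" else -value
    return score

# Example: White is a pawn up.
print(material_eval({"e2": "wP", "e4": "wP", "d1": "wQ",
                     "e7": "bP", "d8": "bQ"}))  # -> 1
```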

Two chess positions that differ only in the position of the rook at lower right.
In (a), Black has an advantage of a knight and two pawns, which should be
enough to win the game. In (b), White will capture the queen, giving it an
advantage that should be strong enough to win.

Current programs for chess and other games also use nonlinear combinations of features. For
example, a pair of bishops might be worth slightly more than twice the value of a single bishop,
and a bishop is worth more in the endgame (that is, when the move number feature is high or the
number of remaining pieces feature is low). The astute reader will have noticed that the features
and weights are not part of the rules of chess! They come from centuries of human chess-playing
experience. In games where this kind of experience is not available, the weights of the evaluation
function can be estimated by the machine learning techniques.

Cutting off search:


The next step is to modify ALPHA-BETA-SEARCH so that it will call the heuristic EVAL function
when it is appropriate to cut off the search. We replace the two lines in the figure that mention
TERMINAL-TEST with the following line:
if CUTOFF-TEST(state, depth) then return EVAL(state)
The most straightforward approach to controlling the amount of search is to set a fixed depth limit
so that CUTOFF-TEST(state, depth) returns true for all depth greater than some fixed depth d. (It
must also return true for all terminal states, just as TERMINAL-TEST did.) The depth d is chosen
so that a move is selected within the allocated time. A more robust approach is to apply iterative
deepening. In addition, a more sophisticated cutoff test is needed: the evaluation function should be applied
only to positions that are quiescent, that is, unlikely to exhibit wild swings in value in the near
future. In chess, for example, positions in which favorable captures can be made are not quiescent
for an evaluation function that just counts material. Non quiescent positions can be expanded
further until quiescent positions are reached. This extra search is called a quiescence search;
sometimes it is restricted to consider only certain types of moves, such as capture moves, that will
quickly resolve the uncertainties in the position.
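A sketch of the modification described above, applied to the alpha–beta sketch given earlier (the depth limit and evaluation function are assumptions supplied by the caller):

```python
# Replace the terminal test with a cutoff test plus heuristic evaluation.

def cutoff_test(state, depth, game, depth_limit):
    return depth >= depth_limit or game.terminal_test(state)

# Inside ab_max_value / ab_min_value, a `depth` parameter is threaded through
# the recursion and the first lines become:
#     if cutoff_test(state, depth, game, depth_limit):
#         return eval_fn(state)
```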

The horizon effect is more difficult to eliminate. It arises when the program is facing an opponent’s
move that causes serious damage and is ultimately unavoidable, but can be temporarily avoided
by delaying tactics. Consider the chess game in Figure . It is clear that there is no way for the black
bishop to escape. For example, the white rook can capture it by moving to h1, then a1, then a2; a
capture at depth 6 ply. But Black does have a sequence of moves that pushes the capture of the
bishop “over the horizon.” Suppose Black searches to depth 8 ply. Most moves by Black will lead
to the eventual capture of the bishop, and thus will be marked as “bad” moves. But Black will
consider checking the white king with the pawn at e4. This will lead to the king capturing the
pawn. Now Black will consider checking again, with the pawn at f5, leading to another pawn
capture. That takes up 4 ply, and from there the remaining 4 ply is not enough to capture the bishop.
Black thinks that the line of play has saved the bishop at the price of two pawns, when actually all
it has done is push the inevitable capture of the bishop beyond the horizon that Black can see. One
strategy to mitigate the horizon effect is the singular extension, a move that is “clearly better” than all other moves in a
given position. Once discovered anywhere in the tree in the course of a search, this singular move
is remembered. When the search reaches the normal depth limit, the algorithm checks to see if the
singular extension is a legal move; if it is, the algorithm allows the move to be considered.

Forward pruning :
That some moves at a given node are pruned immediately without further consideration.

beam search: on each ply, consider only a “beam” of the n best moves rather than
considering all possible moves.

probabilistic cut: The PROBCUT algorithm (Buro, 1995) is a forward-pruning version of alpha–beta
search that uses statistics gained from prior experience to lessen the chance that the best move will
be pruned. Alpha–beta search prunes any node that is provably outside the current (α,β) window.
PROBCUT also prunes nodes that are probably outside the window. It computes this probability
by doing a shallow search to compute the backed-up value v of a node and then using past
experience to estimate how likely it is that a score of v at depth d in the tree would be
outside (α,β). Buro applied this technique to his Othello program, LOGISTELLO, and found that a
version of his program with PROBCUT beat the regular version 64% of the time, even when the
regular version was given twice as much time.

Search versus lookup:


Computer analysis of endgames goes far beyond anything achieved by humans. A human can tell
you the general strategy for playing a king-and-rook-versus-king (KRK) endgame: reduce the
opposing king’s mobility by squeezing it toward one edge of the board, using your king to prevent
the opponent from escaping the squeeze. Other endings, such as king, bishop, and knight versus
king (KBNK), are difficult to master and have no succinct strategy description. A computer, on the
other hand, can completely solve the endgame by producing a policy, which is a mapping from
every possible state to the best move in that state. Then we can just look up the best move rather
than recompute it anew. How big will the KBNK lookup table be? It turns out there are 462 ways
that two kings can be placed on the board without being adjacent. After the kings are placed, there
are 62 empty squares for the bishop, 61 for the knight, and two possible players to move next, so
there are just 462 × 62 × 61 × 2 = 3,494,568 possible positions. Some of these are checkmates; mark
them as such in a table. Then do a retrograde minimax search: reverse the rules of chess to do
unmoves rather than moves. Any move by White that, no matter what move Black responds with,
ends up in a position marked as a win, must also be a win. Continue this search until all 3,494,568
positions are resolved as win, loss, or draw, and you have an infallible lookup table for all KBNK
endgames. Using this technique and a tour de force of optimization tricks, Ken Thompson (1986,
1996) and Lewis Stiller (1992, 1996) solved all chess endgames with up to five pieces and some
with six pieces, making them available on the Internet. Stiller discovered one case where a forced
mate existed but required 262 moves; this caused some consternation because the rules of chess
require a capture or pawn move to occur within 50 moves. Later work by Marc Bourzutschky and
Yakov Konoval (Bourzutschky, 2006) solved all pawnless six-piece and some seven-piece
endgames; there is a KQNKRBN endgame that with best play requires 517 moves until a capture,
which then leads to a mate. If we could extend the chess endgame tables from 6 pieces to 32, then
White would know on the opening move whether it would be a win, loss, or draw.
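A quick check of the position count quoted above:

```python
# 462 non-adjacent king placements, 62 squares for the bishop, 61 for the
# knight, and 2 choices of side to move.
king_placements = 462
positions = king_placements * 62 * 61 * 2
print(positions)  # 3494568
```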

Stochastic Games:
Many games mirror this unpredictability by including a random element, such as the throwing of
dice. Backgammon is a typical game that combines luck and skill. Dice are rolled at the beginning
of a player’s turn to determine the legal moves. In the backgammon position of the figure, White has
rolled a 6–5 and has four possible moves. White knows its own legal moves, but it does not know what
Black is going to roll, and thus does not know what Black’s legal moves will be. That means White
cannot construct a standard game tree of the sort we saw in chess and tic-tac-toe. A game tree in
backgammon must include chance nodes
in addition to MAX and MIN nodes. Chance nodes are shown as circles in the figure. The branches
leading from each chance node denote the possible dice rolls; each branch is labeled with the roll
and its probability. There are 36 ways to roll two dice, each equally likely; but because a 6–5 is the
same as a 5–6, there are only 21 distinct rolls. The six doubles (1–1 through 6–6) each have a
probability of 1/36, so we say P(1–1)=1/36. The other 15 distinct rolls each have a 1/18 probability.
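A quick check of these probabilities:

```python
# 21 distinct rolls: doubles with probability 1/36, the rest with 1/18.
from fractions import Fraction
from itertools import product

probs = {}
for d1, d2 in product(range(1, 7), repeat=2):
    roll = tuple(sorted((d1, d2)))
    probs[roll] = probs.get(roll, Fraction(0)) + Fraction(1, 36)

print(len(probs))        # 21
print(probs[(1, 1)])     # 1/36
print(probs[(5, 6)])     # 1/18
```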

Positions do not have definite minimax values. Instead, we can only calculate the expected value
of a position: the average over all possible outcomes of the chance node. This leads us to generalize
the minimax value for deterministic games to an expecti-minimax value for games with chance
nodes. Terminal nodes and MAX and MIN nodes (for which the dice roll is known) work exactly
the same way as before. For chance nodes we compute the expected value, which is the sum of the
value over all outcomes, weighted by the probability of each chance action:
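In the standard form (reconstructed here, since the formula is not reproduced in the notes), the chance-node case is:

\[
\mathrm{EXPECTIMINIMAX}(s) \;=\; \sum_{r} P(r)\, \mathrm{EXPECTIMINIMAX}(\mathrm{RESULT}(s, r)) \quad \text{if } \mathrm{PLAYER}(s) = \mathrm{CHANCE}
\]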

where r represents a possible dice roll (or other chance event) and RESULT(s,r) is the same state
as s, with the additional fact that the result of the dice roll is r.

Evaluation functions for games of chance:


One might think that evaluation functions for games such as backgammon should be just like
evaluation functions for chess—they just need to give higher scores to better positions. But in fact,
the presence of chance nodes means that one has to be more careful about what the evaluation
values mean. Figure 5.12 shows what happens: with an evaluation function that assigns the values
[1, 2, 3, 4] to the leaves, move a1 is best; with values [1, 20, 30, 400], move a2 is best. Hence, the
program behaves totally differently if we make a change in the scale of some evaluation values! It
turns out that to avoid this sensitivity, the evaluation function must be a positive linear
transformation of the probability of winning from a position.
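
A quick arithmetic check illustrates the point. The chance-node probabilities (0.9 and 0.1) and the grouping of leaf values under each move are assumptions made for illustration, not values taken from Figure 5.12 itself.

# Hypothetical reconstruction of the scale-sensitivity example.
def expected(values, probs=(0.9, 0.1)):
    # Expected value of a chance node with the assumed outcome probabilities.
    return sum(p * v for p, v in zip(probs, values))

print(expected((2, 3)), expected((1, 4)))      # 2.1 vs 1.3   -> a1 looks best
print(expected((20, 30)), expected((1, 400)))  # 21.0 vs 40.9 -> a2 looks best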

If the program knew in advance all the dice rolls that would occur for the rest of the game, solving
a game with dice would be just like solving a game without dice, which minimax does in O(b^m)
time, where b is the branching factor and m is the maximum depth of the game tree. Because
expecti-minimax also considers all the possible dice-roll sequences, it will take
O(b^m n^m) time, where n is the number of distinct rolls. Even if the search depth is limited to some
small depth d, the extra cost compared with that of minimax makes it unrealistic to consider
looking ahead very far in most games of chance. In backgammon n is 21 and b is usually around
20, but in some situations can be as high as 4000 for dice rolls that are doubles. Three plies is
probably all we could manage. Alpha–beta pruning can still be applied: the analysis for MIN and MAX nodes
is unchanged, and with a bit of ingenuity we can prune chance nodes as well, by placing bounds on the possible
values of their children. An alternative is to do Monte Carlo simulation
to evaluate a position. Start with an alpha–beta (or other) search algorithm. From a start position,
have the algorithm play thousands of games against itself, using random dice rolls. In the case of
backgammon, the resulting win percentage has been shown to be a good approximation of the
value of the position, even if the algorithm has an imperfect heuristic and is searching only a few
plies (Tesauro, 1995). For games with dice, this type of simulation is called a rollout.
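
The rollout idea can be sketched as follows; the game interface is a hypothetical placeholder, and the playout policy here is purely random rather than the shallow alpha–beta player described above.

# Sketch of Monte Carlo rollout evaluation for a stochastic game position.
# The `game` interface (roll_dice, actions, result, is_terminal, winner) is a
# hypothetical placeholder, not a real backgammon engine.
import random

def random_playout(game, state, player):
    """Play random legal moves to the end; return 1 if `player` wins, else 0."""
    while not game.is_terminal(state):
        roll = game.roll_dice()
        state = game.result(state, random.choice(game.actions(state, roll)))
    return 1 if game.winner(state) == player else 0

def rollout_value(game, state, player, n_games=1000):
    """Estimate the win probability for `player` from `state`."""
    wins = sum(random_playout(game, state, player) for _ in range(n_games))
    return wins / n_games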

Partially observable Games:


Real warfare, unlike the fully observable games considered so far, involves the use of scouts and spies to
gather information and the use of concealment and bluff to confuse the enemy. Partially observable games
share these characteristics and are thus qualitatively different from the games described in the preceding
sections.

Kriegspiel: Partially observable chess:


The rules of Kriegspiel are as follows: White and Black each see a board containing only their own
pieces. A referee, who can see all the pieces, adjudicates the game and periodically makes
announcements that are heard by both players. On his turn, White proposes to the referee any move
that would be legal if there were no black pieces. If the move is in fact not legal (because of the
black pieces), the referee announces “illegal.” In this case, White may keep proposing moves until
a legal one is found—and learns more about the location of Black’s pieces in the process. Once a
legal move is proposed, the referee announces one or more of the following: “Capture on square
X” if there is a capture, and “Check by D” if the black king is in check, where D is the direction
of the check, and can be one of “Knight,” “Rank,” “File,” “Long diagonal,” or “Short diagonal.”
(In case of discovered check, the referee may make two “Check” announcements.) If Black is
checkmated or stalemated, the referee says so; otherwise, it is Black’s turn to move. After White
makes a move and Black responds, White’s belief state contains 20 positions because Black has
20 replies to any White move. Keeping track of the belief state as the game progresses is exactly
the problem of state estimation, for which the standard belief-state update step applies. We can map Kriegspiel
state estimation directly onto the partially observable, nondeterministic framework if we consider the
opponent as the source of nondeterminism; that is, the RESULTS of White’s move are composed
from the (predictable) outcome of White’s own move and the unpredictable outcome given by
Black’s reply. The notion of a strategy itself must also change: instead of specifying a move to make for each
possible move the opponent might make, we need a move for every possible percept sequence that might be
received. For Kriegspiel, a winning strategy, or guaranteed checkmate, is one that, for each possible percept
sequence, leads to an actual
checkmate for every possible board state in the current belief state, regardless of how the opponent
moves. With this definition, the opponent’s belief state is irrelevant—the strategy has to work even
if the opponent can see all the pieces. This greatly simplifies the computation. Figure 5.13 shows
part of a guaranteed checkmate for the KRK (king and rook against king) endgame. In this case,
Black has just one piece (the king), so a belief state for White can be shown in a single board by
marking each possible position of the Black king. Kriegspiel admits an entirely new concept that
makes no sense in fully observable games: probabilistic checkmate. Such checkmates are still
required to work in every board state in the belief state; they are probabilistic with respect to
randomization of the winning player’s moves.
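
Belief-state maintenance for Kriegspiel can be sketched as a filtering step: keep exactly those boards that are consistent with the referee's announcements. The rules object below is a hypothetical chess-rule interface, not a real engine.

# Sketch of Kriegspiel belief-state update (state estimation). The `rules` object
# is a hypothetical interface: legal_moves(board, colour), apply_move(board, move),
# and announcements(board, move), returning the referee's announcements
# ("illegal", "Capture on X", "Check by ...", and so on).

def update_after_own_move(rules, belief, my_move, heard):
    """Keep boards where my_move is legal and produces exactly the announcements heard."""
    new_belief = []
    for board in belief:
        if my_move in rules.legal_moves(board, "white") and \
                rules.announcements(board, my_move) == heard:
            new_belief.append(rules.apply_move(board, my_move))
    return new_belief

def update_after_opponent_move(rules, belief, heard):
    """Expand by every Black reply, then keep the boards consistent with what was heard."""
    new_belief = []
    for board in belief:
        for reply in rules.legal_moves(board, "black"):
            if rules.announcements(board, reply) == heard:
                new_belief.append(rules.apply_move(board, reply))
    return new_belief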

It is quite rare that a guaranteed or probabilistic checkmate can be found within any reasonable
depth, except in the endgame. Sometimes a checkmate strategy works for some of the board states
in the current belief state but not others. Trying such a strategy may succeed, leading to an
accidental checkmate—accidental in the sense that White could not know that it would be
checkmate—if Black’s pieces happen to be in the right places. (Most checkmates in games
between humans are of this accidental nature.) This idea leads naturally to the question of how
likely it is that a given strategy will win, which leads in turn to the question of how likely it is that
each board state in the current belief state is the true board state.

Card Games:
Card games provide many examples of stochastic partial observability, where the missing
information is generated randomly. For example, in many games, cards are dealt randomly at the
beginning of the game, with each player receiving a hand that is not visible to the other players.
Such games include bridge, whist, hearts, and some forms of poker. At first sight, it might seem
that these card games are just like dice games: the cards are dealt randomly and determine the
moves available to each player, but all the “dice” are rolled at the beginning! Even though this
analogy turns out to be incorrect, it suggests an effective algorithm: consider all possible deals of
the invisible cards; solve each one as if it were a fully observable game; and then choose the move
that has the best outcome averaged over all the deals. Suppose that each deal s occurs with
probability P(s); then the move we want is:

argmax_a Σ_s P(s) · MINIMAX(RESULT(s, a))

Here, we run exact MINIMAX if computationally feasible; otherwise, we run H-MINIMAX. Now,
in most card games, the number of possible deals is rather large. For example, in bridge play, each
player sees just two of the four hands; there are two unseen hands of 13 cards each, so the number
of deals is C(26, 13) = 10,400,600. Solving even one deal is quite difficult, so solving ten million is out
of the question. Instead, we resort to a Monte Carlo Adversarial Search approximation: instead of
adding up all the deals, we take a random sample of N deals, where the probability of deal s
appearing in the sample is proportional to P(s):

argmax_a (1/N) Σ_{i=1..N} MINIMAX(RESULT(s_i, a))

As N grows large, the sum over the random sample tends to the exact value, but even for fairly
small N—say, 100 to 1,000—the method gives a good approximation. It can also be applied to
deterministic games such as Kriegspiel, given some reasonable estimate of P(s).
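
A sketch of this sampling scheme is given below; sample_deal and solve_deal are hypothetical hooks for drawing a deal consistent with the visible cards (in proportion to P(s)) and for solving that deal as a fully observable game (exact MINIMAX or H-MINIMAX), returning the value of a given move.

# Sketch of Monte Carlo adversarial search over sampled deals.

def monte_carlo_best_move(candidate_moves, sample_deal, solve_deal, n_samples=200):
    totals = {move: 0.0 for move in candidate_moves}
    for _ in range(n_samples):
        deal = sample_deal()                      # complete deal consistent with what we can see
        for move in candidate_moves:
            totals[move] += solve_deal(deal, move)
    # The sample average approximates the sum over all deals weighted by P(s).
    return max(totals, key=lambda m: totals[m])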
