Artificial Intelligence

UNIT-I
Foundations of AI: What is AI, History of AI, Strong and Weak AI, The State of the Art.
Intelligent Agents: Agents and Environments, Good Behavior: The Concept of Rationality, The Nature of Environments, The Structure of Agents.

UNIT-II
Solving Problems by Searching: Problem-Solving Agents, Example Problems, Searching for Solutions, Uninformed Search Strategies, Informed (Heuristic) Search Strategies, Heuristic Functions.
UNIT-I
FOUNDATIONS OF AI
WHAT IS AI?
Artificial Intelligence (AI) is a branch of science concerned with helping machines find solutions to complex problems in a more human-like fashion. This generally involves borrowing characteristics from human intelligence and applying them as algorithms in a computer-friendly way. A more or less flexible or efficient approach can be taken depending on the requirements, which influences how artificial the intelligent behaviour appears. AI is generally associated with Computer Science, but it has important links with other fields such as Mathematics, Psychology, Cognitive Science, Biology and Philosophy, among many others. Our ability to combine knowledge from all these fields will ultimately benefit our progress in the quest to create an intelligent artificial being.
AI currently encompasses a huge variety of subfields, from general-purpose areas such
as perception and logical reasoning, to specific tasks such as playing chess, proving
mathematical theorems, writing poetry, and diagnosing diseases. Often, scientists in other fields
move gradually into artificial intelligence, where they find the tools and vocabulary to
systematize and automate the intellectual tasks on which they have been working all their lives.
Similarly, workers in AI can choose to apply their methods to any area of human intellectual
endeavour. In this sense, it is truly a universal field.
HISTORY OF AI
The origin of artificial intelligence lies in the earliest days of machine computation. During the 1940s and 1950s, AI began to grow with the emergence of the modern computer. Among the first researchers to attempt to build intelligent programs were Newell and Simon. Their first well-known program, the Logic Theorist, proved statements using the accepted rules of logic and a problem-solving procedure of their own design. By the late fifties, programs existed that could do a passable job of translating technical documents, and it was thought to be only a matter of extra databases and more computing power to apply the same techniques to less formal, more ambiguous texts. Most problem-solving work revolved around the work of Newell, Shaw and Simon on the General Problem Solver (GPS). Unfortunately, GPS did not fulfil its promise, and its failure was not simply due to a lack of computing capacity. In the 1970s the most important concept of AI was developed, known as the Expert System, which represents the knowledge of an expert as a set of rules. The application area of expert systems is very large. The 1980s saw the development of neural networks as a method of learning from examples.
Prof. Peter Jackson (University of Edinburgh) classified the history of AI into three periods:
1. Classical
2. Romantic
3. Modern
1. Classical Period:
It started around 1950; in 1956 the concept of Artificial Intelligence came into existence. During this period, the main research work carried out included game playing, theorem proving and the state-space approach to problem solving.
2. Romantic Period:
It lasted from the mid-1960s until the mid-1970s. During this period people were interested in making machines understand, which usually meant understanding natural language. During this period the knowledge representation technique known as the semantic net was developed.
3. Modern Period:
It started around 1970 and continues to the present day. This period addresses more complex problems and includes research on both the theoretical and practical aspects of Artificial Intelligence. It saw the birth of concepts such as expert systems, artificial neurons and pattern recognition, and research on advanced topics in pattern recognition and neural networks is still going on.
Components of AI
There are three types of components in AI
1) Hardware Components of AI
a) Pattern Matching
b) Logic Representation
c) Symbolic Processing
d) Numeric Processing
e) Problem Solving
f) Heuristic Search
g) Natural Language processing
h) Knowledge Representation
i) Expert System
j) Neural Network
k) Learning
l) Planning
m) Semantic Network
2) Software Components
a) Machine Language
b) Assembly language
c) High level Language
d) LISP Language
e) Fourth generation Language
f) Object Oriented Language
g) Distributed Language
h) Natural Language
i) Particular Problem Solving Language
3) Architectural Components
a) Uniprocessor
b) Multiprocessor
c) Special Purpose Processor
d) Array Processor
e) Vector Processor
f) Parallel Processor
g) Distributed Processor
Definitions of AI:
1. AI is a field of study that encompasses computational techniques for performing tasks that apparently require intelligence when performed by humans.
2. AI is the branch of computer science that is concerned with the automation of intelligent behaviour. AI is based upon the principles of computer science, namely the data structures used in knowledge representation, the algorithms needed to apply that knowledge, and the languages and programming techniques used in their implementation.
3. AI is the field of study that seeks to explain and emulate intelligent behaviour in terms of computational processes.
4. AI is the part of computer science concerned with designing intelligent computer systems, that is, computer systems that exhibit the characteristics we associate with intelligence in human behaviour, such as understanding language, learning, reasoning and solving problems.
5. AI is the study of the computations that make it possible to perceive, reason, and act.
6. AI is the exciting new effort to make computers think: machines with minds, in the full and literal sense.
7. AI is concerned with developing computer systems that can store knowledge and effectively use that knowledge to help solve problems and accomplish tasks. This brief statement sounds a lot like one of the commonly accepted goals in the education of humans: we want students to learn (gain knowledge) and to learn to use this knowledge to help solve problems and accomplish tasks.
THE STATE OF THE ART
The increasingly advanced technology provides researchers with new tools that are capable of
achieving important goals, and these tools are great starting points in and of themselves. Among
the achievements of recent years, the following are some specific domains:
• Machine learning;
• Reinforcement learning;
• Deep learning;
• Natural language processing.
Deep Learning
Also within ML, deep learning takes inspiration from the activity of neurons in the brain to learn how to recognize complex patterns in data, using algorithms based mainly on statistical calculations. The word 'deep' refers to the large number of layers of neurons that such models process simultaneously, which helps them acquire rich representations of the data and obtain performance gains.
Natural Language Processing (NLP)
Natural language processing is the mechanism by which machines acquire the ability to analyze, understand, and manipulate textual data. 2019 was a great year for NLP, with Google AI's BERT and Transformer, the Allen Institute's ELMo, OpenAI's Transformer, Ruder and Howard's ULMFiT, and Microsoft's MT-DNN. All of these showed that pre-trained language models can substantially improve performance on a wide variety of NLP tasks.
INTELLIGENT AGENTS
AGENTS
An AI system is composed of an agent and its environment. The agents act in their environment. The environment may contain other agents.
An agent is anything that can perceive its environment through sensors and acts upon that
environment through effectors.
• A human agent has sensory organs such as eyes, ears, nose, tongue and skin as sensors, and organs such as hands, legs and mouth as effectors.
• A robotic agent has cameras and infrared range finders as sensors, and various motors and actuators as effectors.
• A software agent has encoded bit strings as its programs and actions.
Figure 2.1 Agents interact with environments through sensors and effectors.
Agent Terminology
• Performance Measure of Agent - It is the criteria, which determines how
successful an agent is.
• Behavior of Agent - It is the action that agent performs after any given sequence of
percepts.
• Percept - It is agent's perceptual inputs at a given instance.
• Percept Sequence - It is the history of all that an agent has perceived till date.
• Agent Function - It is a map from the percept sequence to an action.
This leads to a definition of an ideal rational agent: For each possible percept sequence, an
ideal rational agent should do whatever action is expected to maximize its performance
measure, on the basis of the evidence provided by the percept sequence and whatever built-in
knowledge the agent has.
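As an illustration of the agent function as a mapping from percept sequences to actions, here is a minimal table-driven sketch in Python; the percept names and actions are hypothetical, and a practical agent program would of course avoid storing an explicit table.

```python
# Minimal table-driven agent sketch (hypothetical percepts and actions):
# the agent function maps the complete percept sequence to an action.
def make_table_driven_agent(table):
    percepts = []                       # percept sequence built up so far

    def agent(percept):
        percepts.append(percept)
        # look up the action for the whole history; default to a no-op
        return table.get(tuple(percepts), "NoOp")

    return agent

# Example: a trivial vacuum-world-style table
table = {
    (("A", "Dirty"),): "Suck",
    (("A", "Clean"),): "Right",
    (("A", "Clean"), ("B", "Dirty")): "Suck",
}
agent = make_table_driven_agent(table)
print(agent(("A", "Clean")))   # -> Right
print(agent(("B", "Dirty")))   # -> Suck
```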
ENVIRONMENTS
In this section, we will see how to couple an agent to an environment. In all cases,
however, the nature of the connection between them is the same: actions are done by the agent
on the environment, which in turn provides percepts to the agent. First, we will describe the
different types of environments and how they affect the design of agents. Then we will describe
environment programs that can be used as testbeds for agent programs.
Properties of environments
Environments come in several flavors. The principal distinctions to be made are as follows:
• Accessible vs. inaccessible: if the agent's sensors give it access to the complete state of the environment, the environment is accessible to that agent.
• Deterministic vs. nondeterministic: if the next state of the environment is completely determined by the current state and the agent's actions, the environment is deterministic.
• Episodic vs. nonepisodic: in an episodic environment, the agent's experience is divided into episodes, and the quality of an action depends only on the episode itself.
• Static vs. dynamic: if the environment can change while the agent is deliberating, it is dynamic; otherwise it is static.
• Discrete vs. continuous: if there is a limited number of distinct, clearly defined percepts and actions, the environment is discrete.
Environment programs
The generic environment program in Figure 2.14 illustrates the basic relationship
between agents and environments. In this book, we will find it convenient for many of the
examples and exercises to use an environment simulator that follows this program structure.
The simulator takes one or more agents as input and arranges to repeatedly give each agent the
right percepts and receive back an action. The simulator then updates the environment based
on the actions, and possibly other dynamic processes in the environment that are not considered
to be agents (rain, for example). The environment is therefore defined by the initial state and
the update function. Of course, an agent that works in a simulator ought also to work in a real
environment that provides the same kinds of percepts and accepts the same kinds of actions.
Turing Test
• The success of an intelligent behavior of a system can be measured with Turing Test.
• Two persons and a machine to be evaluated participate in the test. One of the two persons plays the role of the tester. Each participant sits in a different room. The tester is unaware of who is the machine and who is the human. He interrogates both by typing questions and sending them, and receives typed responses.
• The test aims at fooling the tester. If the tester fails to distinguish the machine's responses from the human's responses, then the machine is said to be intelligent.
So far, we have talked about agents by describing their behavior-the action that is
performed after any given sequence of percepts. Now, we will have to bite the bullet and talk
about how the insides work.
The job of AI is to design the agent program: a function that implements the agent
mapping from percepts to actions. We assume this program will run on some sort of computing
device, which we will call the architecture. Obviously, the program we choose has to be one
that the architecture will accept and run. The architecture might be a plain computer, or it
might include special-purpose hardware for certain tasks, such as processing camera images or
filtering audio input. It might also include software that provides a degree of insulation between
the raw computer and the agent program, so that we can program at a higher level. In general,
the architecture makes the percepts from the sensors available to the program, runs the program,
and feeds the program's action choices to the effectors as they are generated.
The relationship among agents, architectures, and programs can be summed up as
follows:
agent = architecture + program
Before we design an agent program, we must have a pretty good idea of the possible
percepts and actions, what goals or performance measure the agent is supposed to achieve, and
what sort of environment it will operate in. These come in a wide variety. Figure 2.3 shows the
basic elements for a selection of agent types.
It may come as a surprise to some readers that we include in our list of agent types
programs that seem to operate in the entirely artificial environment defined by keyboard input
and character output on a screen. "Surely," one might say, "this is not a real environment, is
it?" In fact, what matters is not the distinction between "real" and "artificial" environments, but
the complexity of the relationship among the behavior of the agent, the percept sequence
generated by the environment, and the goals that the agent is supposed to achieve. Some "real"
environments are actually quite simple. For example, a robot designed to inspect parts as they
come by on a conveyer belt can make use of a number of simplifying assumptions: that the
lighting is always just so, that the only thing on the conveyer belt will be parts of a certain kind,
and that there are only two actions: accept the part or mark it as a reject.
Agent Type: Part-picking robot; Percepts: pixels of varying intensity; Actions: pick up parts and sort into bins; Goals: place parts in correct bins; Environment: conveyor belt with parts.
Agent Programs
Intelligent Agents will all have the same skeleton, namely, accepting percepts from an
environment and generating actions. The early versions of agent programs will have a very
simple form (Figure 2.4). Each will use some internal data structures that will be updated as
new percepts arrive. These data structures are operated on by the agent's decision-making
procedures to generate an action choice, which is then passed to the architecture to be executed.
There are two things to note about this skeleton program. First, even though we defined
the agent mapping as a function from percept sequences to actions, the agent program receives
only a single percept as its input. It is up to the agent to build up the percept sequence in
memory, if it so desires. In some environments, it is possible to be quite successful without
storing the percept sequence, and in complex domains, it is infeasible to store the complete
sequence.
Figure 2.4 A skeleton agent. On each invocation, the agent's memory is updated to reflect
the new percept, the best action is chosen, and the fact that the action was taken is also stored in
memory. The memory persists from one invocation to the next.
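A rough Python rendering of this skeleton is given below; the update and action-selection routines are placeholders to be filled in for a concrete agent, and the class and method names are assumptions of this sketch.

```python
class SkeletonAgent:
    """Sketch of the skeleton agent of Figure 2.4: memory persists across calls."""

    def __init__(self):
        self.memory = []                 # internal state, persists between invocations

    def program(self, percept):
        self.memory = self.update_memory(self.memory, percept)   # fold in the new percept
        action = self.choose_best_action(self.memory)            # decision-making procedure
        self.memory = self.update_memory(self.memory, action)    # remember the action taken
        return action

    # Placeholders: a concrete agent supplies these.
    def update_memory(self, memory, item):
        return memory + [item]

    def choose_best_action(self, memory):
        return "NoOp"
```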
Second, the goal or performance measure is not part of the skeleton program. This is because the performance measure is applied externally to judge the behavior of the agent, and it is often possible to achieve high performance without explicit knowledge of the performance measure.
Example
At this point, it will be helpful to consider a particular environment, so that our
discussion can become more concrete. Mainly because of its familiarity, and because it
involves a broad range of skills, we will look at the job of designing an automated taxi driver.
We must first think about the percepts, actions, goals and environment for the taxi.
They are summarized in Figure 2.6 and discussed in turn.
Agent Type: Taxi driver; Percepts: cameras, speedometer, GPS, sonar, microphone; Actions: steer, accelerate, brake, talk to passenger; Goals: safe, fast, legal, comfortable trip, maximize profits; Environment: roads, other traffic, pedestrians, customers.
The taxi will need to know where it is, what else is on the road, and how fast it is going.
This information can be obtained from the percepts provided by one or more controllable TV
cameras, the speedometer, and odometer. To control the vehicle properly, especially on curves,
it should have an accelerometer; it will also need to know the mechanical state of the vehicle,
so it will need the usual array of engine and electrical system sensors. It might have instruments
that are not available to the average human driver: a satellite global positioning system (GPS)
to give it accurate position information with respect to an electronic map; or infrared or sonar
sensors to detect distances to other cars and obstacles. Finally, it will need a microphone or
keyboard for the passengers to tell it their destination.
The actions available to a taxi driver will be more or less the same ones available to a
human driver: control over the engine through the gas pedal and control over steering and
braking. In addition, it will need output to a screen or voice synthesizer to talk back to the
passengers, and perhaps some way to communicate with other vehicles.
What performance measure would we like our automated driver to aspire to? Desirable
qualities include getting to the correct destination; minimizing fuel consumption and wear and
tear; minimizing the trip time and/or cost; minimizing violations of traffic laws and
disturbances to other drivers; maximizing safety and passenger comfort; maximizing profits.
Obviously, some of these goals conflict, so there will be trade-offs involved.
Finally, were this a real project, we would need to decide what kind of driving
environment the taxi will face. Should it operate on local roads, or also on freeways? Will it
be in Southern California, where snow is seldom a problem, or in Alaska, where it seldom is
not? Will it always be driving on the right, or might we want it to be flexible enough to drive
on the left in case we want to operate taxis in Britain or Japan? Obviously, the more restricted
the environment, the easier the design problem.
(Figure: a simple reflex agent. Sensors answer "What is the world like now?", condition-action rules select what action to do, and effectors carry it out.)
(Figure: an agent that keeps track of the world. In addition to the current percept, it uses knowledge of how the world evolves and what its actions do, asking "What happens if I do action A?", before acting through its effectors.)
*******
UNIT-II
SOLVING PROBLEMS BY SEARCHING
In previous chapter, we saw that simple reflex agents are unable to plan ahead. They
are limited in what they can do because their actions are determined only by the current percept.
Furthermore, they have no knowledge of what their actions do nor of what they are trying to
achieve.
In this chapter, we describe one kind of goal-based agent called a problem-solving
agent. Problem-solving agents decide what to do by finding sequences of actions that lead to
desirable states. We discuss informally how the agent can formulate an appropriate view of the
problem it faces. The problem type that results from the formulation process will depend on
the knowledge available to the agent: principally, whether it knows the current state and the
outcomes of actions. We then define more precisely the elements that constitute a "problem"
and its "solution," and give several examples to illustrate these definitions. Given precise
definitions of problems, it is relatively straightforward to construct a search process for finding
solutions.
EXAMPLE PROBLEMS
The range of task environments that can be characterized by well-defined problems is
vast. We can distinguish between so-called toy problems, which are intended to illustrate or
exercise various problem-solving methods, and so-called real-world problems, which tend to
be more difficult and whose solutions people actually care about. In this section, we will give
examples of both. By nature, toy problems can be given a concise, exact description. This
means that they can be easily used by different researchers to compare the performance of
algorithms. Real-world problems, on the other hand, tend not to have a single agreed-upon
description, but we will attempt to give the general flavor of their formulations.
1. Toy Problems
The 8-puzzle
The 8-puzzle, an instance of which is shown in Figure 3.4, consists of a 3x3 board with
eight numbered tiles and a blank space. A tile adjacent to the blank space can slide into the
space. The object is to reach the configuration shown on the right of the figure. One important
trick is to notice that rather than use operators such as "move the 3 tile into the blank space," it
is more sensible to have operators such as "the blank space changes places with the tile to its
left." This is because there are fewer of the latter kind of operator.
This leads us to the following formulation:
• States: a state description specifies the location of each of the eight tiles in one of the
nine squares. For efficiency, it is useful to include the location of the blank.
• Operators: blank moves left, right, up, or down.
• Goal test: state matches the goal configuration shown in Figure 3.4.
• Path cost: each step costs 1, so the path cost is just the length of the path.
The 8-puzzle belongs to the family of sliding-block puzzles. This general class is known to
be NP-complete, so one does not expect to find methods significantly better than the search
algorithms described in this chapter and the next. The 8-puzzle and its larger cousin, the 15-
puzzle, are the standard test problems for new search algorithms in AI.
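As a concrete illustration of the "blank moves" formulation above, here is a small Python sketch; the 9-tuple state representation (0 marking the blank) and the particular goal layout are assumptions of this sketch, not the only possible choices.

```python
# 8-puzzle sketch: a state is a 9-tuple read row by row, with 0 as the blank.
GOAL = (1, 2, 3, 8, 0, 4, 7, 6, 5)   # assumed goal layout; any fixed goal works

MOVES = {"up": -3, "down": +3, "left": -1, "right": +1}

def successors(state):
    """Yield (operator, new_state) pairs for every legal blank move."""
    blank = state.index(0)
    row, col = divmod(blank, 3)
    for op, delta in MOVES.items():
        if (op == "up" and row == 0) or (op == "down" and row == 2) \
           or (op == "left" and col == 0) or (op == "right" and col == 2):
            continue                      # move would push the blank off the board
        swap = blank + delta
        new = list(state)
        new[blank], new[swap] = new[swap], new[blank]
        yield op, tuple(new)

def goal_test(state):
    return state == GOAL
```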
(Figure 3.4: a typical instance of the 8-puzzle, showing a start state and the goal state.)
The 8-queens problem can be defined as follows: Place 8 queens on an (8 by 8) chess board
such that none of the queens attacks any of the others. A configuration of 8 queens on the board
is shown in figure 1, but this does not represent a solution as the queen in the first column is on
the same diagonal as the queen in the last column.
Figure 1: Almost a solution of the 8-queens problem
Although efficient special-purpose algorithms exist for this problem and the whole n
queens family, it remains an interesting test problem for search algorithms. There are two main
kinds of formulation. The incremental formulation involves placing queens one by one,
whereas the complete-state formulation starts with all 8 queens on the board and moves them
around. In either case, the path cost is of no interest because only the final state counts;
algorithms are thus compared only on search cost. Thus, we have the following goal test and
path cost:
• Goal test: 8 queens on board, none attacked.
• Path cost: zero.
There are also different possible states and operators. Consider the following simple-minded
formulation:
• States: any arrangement of 0 to 8 queens on the board.
• Operators: add a queen to any square.
In this formulation, we have 64^8 possible sequences to investigate. A more sensible choice
would use the fact that placing a queen where it is already attacked cannot work, because
subsequent placings of other queens will not undo the attack. So we might try the following:
• States: arrangements of 0 to 8 queens with none attacked.
• Operators: place a queen in the left-most empty column such that it is not attacked by
any other queen.
It is easy to see that the actions given can generate only states with no attacks; but
sometimes no actions will be possible. For example, after making the first seven choices (left-
to-right) in Figure 1, there is no action available in this formulation. The search process must
try another choice. A quick calculation shows that there are only 2057 possible sequences to
investigate. The right formulation makes a big difference to the size of the search space. Similar
considerations apply for a complete-state formulation. For example, we could set the problem
up as follows:
• States: arrangements of 8 queens, one in each column.
• Operators: move any attacked queen to another square in the same column.
This formulation would allow the algorithm to find a solution eventually, but it would be better
to move to an unattacked square if possible.
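The incremental formulation described above (place a queen in the left-most empty column so that it is not attacked) can be sketched as follows; the state is simply the tuple of row positions of the queens already placed, and the helper names are illustrative only.

```python
# 8-queens sketch: state = tuple of row indices, one per already-filled column.
def attacks(rows, new_row):
    """Would a queen placed in the next column, at new_row, be attacked?"""
    col = len(rows)
    for c, r in enumerate(rows):
        if r == new_row or abs(r - new_row) == abs(c - col):   # same row or diagonal
            return True
    return False

def successors(rows):
    """Place a queen in the left-most empty column, on every unattacked square."""
    return [rows + (r,) for r in range(8) if not attacks(rows, r)]

def goal_test(rows):
    return len(rows) == 8            # 8 queens placed; none attacked by construction
```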
2. Real-world problems
Route finding
We have already seen how route finding is defined in terms of specified locations and transitions along links between them. Route-finding algorithms are used in a variety of applications, such as routing in computer networks, automated travel advisory systems, and airline travel planning systems. The last application is somewhat more complicated, because
airline travel has a very complex path cost, in terms of money, seat quality, time of day, type
of airplane, frequent-flyer mileage awards, and so on. Furthermore, the actions in the problem
do not have completely known outcomes: flights can be late or overbooked, connections can
be missed, and fog or emergency maintenance can cause delays.
VLSI Layout
The design of silicon chips is one of the most complex engineering design tasks
currently undertaken, and we can give only a brief sketch here. A typical VLSI chip can have
as many as a million gates, and the positioning and connections of every gate are crucial to the
successful operation of the chip. Computer-aided design tools are used in every phase of the
process. Two of the most difficult tasks are cell layout and channel routing. These come after
the components and connections of the circuit have been fixed; the purpose is to lay out the
circuit on the chip so as to minimize area and connection lengths, thereby maximizing speed.
In cell layout, the primitive components of the circuit are grouped into cells, each of which
performs some recognized function. Each cell has a fixed footprint (size and shape) and
requires a certain number of connections to each of the other cells. The aim is to place the cells
on the chip so that they do not overlap and so that there is room for the connecting wires to be
placed between the cells. Channel routing finds a specific route for each wire using the gaps
between the cells. These search problems are extremely complex, but definitely worth solving.
Robot navigation
Robot navigation is a generalization of the route-finding problem described earlier.
Rather than a discrete set of routes, a robot can move in a continuous space with (in principle)
an infinite set of possible actions and states. For a simple, circular robot moving on a flat
surface, the space is essentially two-dimensional. When the robot has arms and legs that must
also be controlled, the search space becomes many-dimensional. Advanced techniques are
required just to make the search space finite.
SEARCH STRATEGIES
Search Algorithm Terminologies
Search:
Searching is a step by step procedure to solve a search-problem in a given search space.
A search problem can have three main factors:
1. Search Space: Search space represents a set of possible solutions, which a
system may have.
2. Start State: It is a state from where agent begins the search.
3. Goal test: It is a function which observes the current state and returns whether the goal state is achieved or not.
• Search tree: A tree representation of a search problem is called a search tree. The root of the search tree is the root node, which corresponds to the initial state.
• Actions: It gives the description of all the available actions to the agent.
• Transition model: A description of what each action does, represented as a transition model.
• Path Cost: It is a function which assigns a numeric cost to each path.
• Solution: It is an action sequence which leads from the start node to the goal
node.
• Optimal Solution: A solution that has the lowest cost among all solutions.
The uninformed (blind) search strategies covered here are:
1. Breadth-first search
2. Depth-first search
3. Depth-limited search
4. Uniform-cost search
5. Iterative deepening depth-first search
6. Bidirectional search
1. Breadth-first Search:
o Breadth-first search is the most common search strategy for traversing a tree or graph.
This algorithm searches breadthwise in a tree or graph, so it is called breadth-first
search.
o BFS algorithm starts searching from the root node of the tree and expands all successor
node at the current level before moving to nodes of next level.
o The breadth-first search algorithm is an example of a general-graph search algorithm.
o Breadth-first search implemented using FIFO queue data structure.
Advantages:
o BFS will provide a solution if any solution exists.
o If there is more than one solution for a given problem, then BFS will provide the minimal solution, i.e. the one requiring the least number of steps.
Disadvantages:
o It requires lots of memory, since each level of the tree must be saved in memory in order to expand the next level.
o BFS needs lots of time if the solution is far away from the root node.
Example
In the below tree structure, we have shown the traversing of the tree using BFS algorithm
from the root node S to goal node K. The BFS algorithm traverses level by level, so it will follow the path shown by the dotted arrows, and the traversed path will be:
S ---> A ---> B ---> C ---> D ---> G ---> H ---> E ---> F ---> I ---> K
Time Complexity:
The time complexity of the BFS algorithm is given by the number of nodes traversed until the shallowest goal node, where d is the depth of the shallowest solution and b is the branching factor (number of successors per node):
T(b) = 1 + b + b^2 + b^3 + ... + b^d = O(b^d)
Space Complexity:
The space complexity of the BFS algorithm is given by the memory size of the frontier, which is O(b^d).
Completeness:
BFS is complete, which means if the shallowest goal node is at some finite depth, then BFS
will find a solution.
Optimality:
BFS is optimal if path cost is a non-decreasing function of the depth of the node.
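A minimal breadth-first search sketch over an explicit graph, using the FIFO queue mentioned above, might look like the following; the example graph is hypothetical.

```python
from collections import deque

def breadth_first_search(graph, start, goal):
    """Return a path from start to goal, expanding nodes level by level."""
    frontier = deque([[start]])          # FIFO queue of paths
    visited = {start}
    while frontier:
        path = frontier.popleft()
        node = path[-1]
        if node == goal:
            return path
        for child in graph.get(node, []):
            if child not in visited:     # avoid re-expanding states
                visited.add(child)
                frontier.append(path + [child])
    return None

graph = {"S": ["A", "B"], "A": ["C", "D"], "B": ["E"], "E": ["K"]}
print(breadth_first_search(graph, "S", "K"))   # -> ['S', 'B', 'E', 'K']
```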
2. Depth-first Search
o Depth-first search is a recursive algorithm for traversing a tree or graph data structure.
o It is called the depth-first search because it starts from the root node and follows each
path to its greatest depth node before moving to the next path.
o DFS uses a stack data structure for its implementation.
o The process of the DFS algorithm is similar to the BFS algorithm.
Advantage:
o DFS requires much less memory, as it only needs to store a stack of the nodes on the path from the root node to the current node.
o It takes less time to reach the goal node than the BFS algorithm (if it traverses the right path).
Disadvantage:
o There is the possibility that many states keep re-occurring, and there is no guarantee
of finding the solution.
o The DFS algorithm searches deep down a path, and sometimes it may go into an infinite loop.
Example
In the below search tree, we have shown the flow of depth-first search, and it will follow the
order as:
Root node ---> left node ---> right node.
It will start searching from root node S, and traverse A, then B, then D and E, after traversing
E, it will backtrack the tree as E has no other successor and still goal node is not found. After
backtracking it will traverse node C and then G, and here it will terminate as it found goal
node.
Completeness: DFS search algorithm is complete within finite state space as it will expand
every node within a limited search tree.
Time Complexity: The time complexity of DFS is proportional to the number of nodes traversed by the algorithm. It is given by:
T(n) = 1 + b + b^2 + ... + b^m = O(b^m)
where m is the maximum depth of any node, which can be much larger than d (the depth of the shallowest solution).
Space Complexity: The DFS algorithm needs to store only a single path from the root node, hence the space complexity of DFS is equivalent to the size of the fringe set, which is O(bm).
Optimal: DFS search algorithm is non-optimal, as it may generate a large number of steps or
high cost to reach the goal node.
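For comparison with BFS, a depth-first variant follows each path to its deepest node before backtracking; a small recursive sketch, which checks for repeated states only along the current path, is given below.

```python
def depth_first_search(graph, node, goal, path=None):
    """Follow each path to its deepest node before backtracking."""
    path = (path or []) + [node]
    if node == goal:
        return path
    for child in graph.get(node, []):
        if child not in path:            # avoid cycles along the current path only
            result = depth_first_search(graph, child, goal, path)
            if result:
                return result
    return None
```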
3. Depth-limited Search:
A depth-limited search works like depth-first search but with a predetermined depth limit ℓ; nodes at the depth limit are treated as if they had no successors.
Advantages:
o Depth-limited search is memory efficient.
Disadvantages:
o Depth-limited search also has a disadvantage of incompleteness.
o It may not be optimal if the problem has more than one solution.
Example
(Figure: a search tree explored by depth-limited search down to the given depth limit.)
Completeness: The DLS algorithm is complete if the solution lies within the depth limit.
Time Complexity: The time complexity of the DLS algorithm is O(b^ℓ), where ℓ is the depth limit.
Space Complexity: The space complexity of the DLS algorithm is O(b×ℓ).
Optimal: Depth-limited search can be viewed as a special case of DFS, and it is not optimal even if ℓ > d.
4. Uniform-cost Search:
Uniform-cost search expands the node with the lowest path cost g(n), using a priority queue ordered by cumulative cost.
Advantages:
o Uniform-cost search is optimal, because at every state the path with the least cost is chosen.
Disadvantages:
o It does not care about the number of steps involved in the search and is only concerned with path cost, so the algorithm may get stuck in an infinite loop (for example, if there is a cycle of zero-cost actions).
Example
(Figure: a uniform-cost search tree with edge costs; nodes are expanded in order of cumulative path cost g(n).)
Completeness:
Uniform-cost search is complete: if there is a solution, UCS will find it.
Time Complexity:
Let C* be the cost of the optimal solution and ε the minimum cost of a single step. Then the number of steps is at most 1 + ⌊C*/ε⌋ (the +1 arises because we start from cost 0 and go up to C*/ε). Hence, the worst-case time complexity of uniform-cost search is O(b^(1 + ⌊C*/ε⌋)).
Space Complexity:
By the same logic, the worst-case space complexity of uniform-cost search is O(b^(1 + ⌊C*/ε⌋)).
Optimal:
Uniform-cost search is always optimal as it only selects a path with the lowest path cost.
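Uniform-cost search orders the frontier by path cost g(n); a compact sketch using Python's heapq module is shown below, where the graph with its edge costs is a hypothetical example.

```python
import heapq

def uniform_cost_search(graph, start, goal):
    """Expand the frontier node with the lowest path cost g(n) first."""
    frontier = [(0, start, [start])]         # (cost so far, node, path)
    best_cost = {start: 0}
    while frontier:
        cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return cost, path
        for child, step in graph.get(node, []):
            new_cost = cost + step
            if new_cost < best_cost.get(child, float("inf")):
                best_cost[child] = new_cost
                heapq.heappush(frontier, (new_cost, child, path + [child]))
    return None

graph = {"S": [("A", 1), ("G", 12)], "A": [("B", 3), ("C", 1)], "C": [("G", 2)]}
print(uniform_cost_search(graph, "S", "G"))  # -> (4, ['S', 'A', 'C', 'G'])
```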
5. Iterative Deepening Depth-first Search (IDDFS):
The iterative deepening algorithm combines DFS and BFS: it performs a depth-limited DFS repeatedly, increasing the depth limit by one on each iteration, until the goal is found.
Advantages:
o It combines the benefits of the BFS and DFS search algorithms in terms of fast search and memory efficiency.
Disadvantages:
o The main drawback of IDDFS is that it repeats all the work of the previous phase.
Example
The following tree structure shows the iterative deepening depth-first search. The IDDFS algorithm performs successive iterations until it finds the goal node. The iterations performed by the algorithm are as follows:
Iterative deepening depth-first search tree:
Level 0: A
Level 1: B, C
Level 2: D, E, F, G
Level 3: H, I, K
Completeness:
This algorithm is complete if the branching factor is finite.
Time Complexity:
Let b be the branching factor and d the depth of the shallowest solution; then the worst-case time complexity is O(b^d).
Space Complexity:
The space complexity of IDDFS is O(bd), i.e. linear in the depth for a given branching factor.
Optimal:
IDDFS algorithm is optimal if path cost is a non- decreasing function of the depth of the
node.
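A compact iterative-deepening sketch, repeatedly running a depth-limited DFS with an increasing limit, could look like this; the cap on the maximum depth is an arbitrary safeguard for the sketch.

```python
def depth_limited(graph, node, goal, limit, path=None):
    """Depth-first search cut off at the given depth limit."""
    path = (path or []) + [node]
    if node == goal:
        return path
    if limit == 0:
        return None
    for child in graph.get(node, []):
        result = depth_limited(graph, child, goal, limit - 1, path)
        if result:
            return result
    return None

def iterative_deepening(graph, start, goal, max_depth=50):
    for limit in range(max_depth + 1):       # limits 0, 1, 2, ... until the goal is found
        result = depth_limited(graph, start, goal, limit)
        if result:
            return result
    return None
```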
6. Bidirectional Search:
Bidirectional search runs two simultaneous searches, one forward from the initial state and one backward from the goal, hoping that the two searches meet in the middle.
Advantages:
o Bidirectional search is fast.
o Bidirectional search requires less memory.
Disadvantages:
o Implementation of the bidirectional search tree is difficult.
o In bidirectional search, one should know the goal state in advance.
Example
In the below search tree, bidirectional search algorithm is applied. This algorithm divides one
graph/tree into two sub-graphs. It starts traversing from node 1 in the forward direction and
starts from goal node 16 in the backward direction.
The algorithm terminates at node 9 where two searches meet.
(Figure: a bidirectional search on a graph with root node 1 and goal node 16; the forward and backward searches meet at node 9.)
Heuristic function:
A heuristic is a function used in informed search to find the most promising path. It takes the current state of the agent as input and produces an estimate of how close the agent is to the goal. The heuristic method might not always give the best solution, but it is guaranteed to find a good solution in reasonable time. A heuristic function estimates how close a state is to the goal. It is represented by h(n), and it estimates the cost of an optimal path from state n to a goal state. The value of the heuristic function is always non-negative.
Admissibility: the heuristic cost should be less than or equal to the actual cost, i.e. h(n) ≤ h*(n), where h*(n) is the true cost of the optimal path from n to the goal.
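For example, a commonly used admissible heuristic for the 8-puzzle is the Manhattan distance: the sum of the horizontal and vertical distances of each tile from its goal position. A sketch, reusing the 9-tuple state representation and the goal layout assumed earlier, is given below.

```python
GOAL = (1, 2, 3, 8, 0, 4, 7, 6, 5)    # same assumed goal layout as before

def manhattan_distance(state, goal=GOAL):
    """h(n): sum of row and column distances of each tile from its goal square."""
    total = 0
    for tile in range(1, 9):                      # skip the blank (0)
        r1, c1 = divmod(state.index(tile), 3)
        r2, c2 = divmod(goal.index(tile), 3)
        total += abs(r1 - r2) + abs(c1 - c2)
    return total
```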
1. Greedy Best-first Search:
Greedy best-first search always selects the path that appears best at the moment: it expands the node that seems closest to the goal according to the evaluation function f(n) = h(n).
Advantages:
o Best-first search can switch between BFS and DFS, gaining the advantages of both algorithms.
o This algorithm is more efficient than the BFS and DFS algorithms.
Disadvantages:
o It can behave as an unguided depth-first search in the worst case scenario.
o It can get stuck in a loop, like DFS.
o This algorithm is not optimal.
Example
Consider the search problem below, which we will traverse using greedy best-first search. At each iteration, each node is expanded using the evaluation function f(n) = h(n), whose values are given in the accompanying table.
(Figure: the example graph together with a table of heuristic values h(n) for each node.)
In this search example, we use two lists: an OPEN list and a CLOSED list.
2. A* Search Algorithm:
A* search is the most commonly known form of best-first search. It uses the heuristic function h(n) together with g(n), the cost to reach node n from the start state. It combines features of UCS and greedy best-first search, which lets it solve problems efficiently. The A* search algorithm finds the shortest path through the search space using the heuristic function; it expands a smaller search tree and provides an optimal result sooner. The A* algorithm is similar to UCS except that it uses g(n) + h(n) instead of g(n).
In the A* search algorithm, we use the search heuristic as well as the cost to reach the node. Hence we can combine both costs as f(n) = g(n) + h(n), and this sum is called the fitness number.
Algorithm of A* search:
Step 1: Place the starting node in the OPEN list.
Step 2: Check if the OPEN list is empty or not, if the list is empty then return failure and stops.
Step 3: Select the node from the OPEN list which has the smallest value of evaluation function
(g+h), if node n is goal node then return success and stop, otherwise
Step 4: Expand node n and generate all of its successors, and put n into the closed list. For each
successor n', check whether n' is already in the OPEN or CLOSED list, if not then compute
evaluation function for n' and place into Open list.
Step 5: Otherwise, if node n' is already in OPEN or CLOSED, it should be attached to the back pointer which reflects the lowest g(n') value.
Step 6: Return to Step 2.
Advantages:
o The A* search algorithm performs better than the other search algorithms considered here.
o A* search algorithm is optimal and complete.
o This algorithm can solve very complex problems.
Disadvantages:
o It does not always produce the shortest path, as it is partly based on heuristics and approximation.
o A* search algorithm has some complexity issues.
o The main drawback of A* is memory requirement as it keeps all generated nodes in
the memory, so it is not practical for various large-scale problems.
Example
In this example, we will traverse the given graph using the A* algorithm. The heuristic value of each state is given in the accompanying table, so we will calculate f(n) for each state using the formula f(n) = g(n) + h(n), where g(n) is the cost to reach that node from the start state.
Here we will use OPEN and CLOSED list.
(Figure: the example graph with step costs and a table of heuristic values h(n); S is the start state and G the goal.)
Solution:
Initialization: {(S, 5)}
Iteration 1: {(S --> A, 4), (S --> G, 10)}
Iteration 2: {(S --> A --> C, 4), (S --> A --> B, 7), (S --> G, 10)}
Iteration 3: {(S --> A --> C --> G, 6), (S --> A --> C --> D, 11), (S --> A --> B, 7), (S --> G, 10)}
Iteration 4 gives the final result: S --> A --> C --> G, the optimal path with cost 6.
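A minimal A* sketch following the OPEN/CLOSED-list description above is shown below. It orders the OPEN list by f(n) = g(n) + h(n). Since the original figure and table are unreadable, the graph and heuristic values used here are a reconstruction chosen to be consistent with the iterations listed above, not the original data.

```python
import heapq

def a_star(graph, h, start, goal):
    """Expand the OPEN-list node with the smallest f(n) = g(n) + h(n)."""
    open_list = [(h[start], 0, start, [start])]     # (f, g, node, path)
    closed = {}
    while open_list:
        f, g, node, path = heapq.heappop(open_list)
        if node == goal:
            return g, path
        if node in closed and closed[node] <= g:
            continue                                 # already reached more cheaply
        closed[node] = g
        for child, step in graph.get(node, []):
            g2 = g + step
            heapq.heappush(open_list, (g2 + h[child], g2, child, path + [child]))
    return None

# Reconstructed graph and heuristic table (assumed, consistent with the iterations above):
graph = {"S": [("A", 1), ("G", 10)], "A": [("B", 2), ("C", 1)], "C": [("D", 3), ("G", 4)]}
h = {"S": 5, "A": 3, "B": 4, "C": 2, "D": 6, "G": 0}
print(a_star(graph, h, "S", "G"))   # -> (6, ['S', 'A', 'C', 'G'])
```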
If the heuristic function is admissible, then A* tree search will always find the least cost path.
*******
REINFORCEMENT LEARNING
Reinforcement Learning is a feedback-based Machine learning technique in which an agent learns to behave in
an environment by performing the actions and seeing the results of actions. For each good action, the agent
gets positive feedback, and for each bad action, the agent gets negative feedback or penalty.
In reinforcement learning, the agent learns automatically using feedback, without any labeled data, unlike supervised learning. Since there is no labeled data, the agent is bound to learn from its experience only.
RL solves a specific type of problem where decision making is sequential, and the goal is long-term, such
as game-playing, robotics, etc.
The agent interacts with the environment and explores it by itself. The primary goal of an agent in
reinforcement learning is to improve the performance by getting the maximum positive rewards.
The agent learns by trial and error, and based on this experience, it learns to perform the task in a better way. Hence, we can say that "Reinforcement learning is a type of machine learning method in which an intelligent agent (computer program) interacts with the environment and learns to act within it." How a robotic dog learns the movement of its arms is an example of reinforcement learning.
Example: Suppose there is an AI agent present within a maze environment, and his goal is to find the diamond.
The agent interacts with the environment by performing some actions, and based on those actions, the state of
the agent gets changed, and it also receives a reward or penalty as feedback.
The agent continues doing these three things (take action, change state/remain in the same state, and get
feedback), and by doing these actions, he learns and explores the environment.
The agent learns which actions lead to positive feedback or rewards and which actions lead to negative feedback or penalties. As a positive reward, the agent gets a positive point, and as a penalty, it gets a negative point.
Terms used in Reinforcement Learning
o Agent: An entity that can perceive/explore the environment and act upon it.
o Environment: The situation in which the agent is present or by which it is surrounded. In RL, we assume a stochastic environment, which means it is random in nature.
o Action: Actions are the moves taken by the agent within the environment.
o State: A state is the situation returned by the environment after each action taken by the agent.
o Reward: Feedback returned to the agent from the environment to evaluate the agent's action.
o Policy: A policy is the strategy applied by the agent to decide the next action based on the current state.
o Value: The expected long-term return, with discounting, as opposed to the short-term reward.
o Q-value: Mostly similar to the value, but it takes one additional parameter, the current action a.
o In RL, the agent is not instructed about the environment and what actions need to be taken.
o It is based on the hit and trial process.
o The agent takes the next action and changes states according to the feedback of the previous action.
o The agent may get a delayed reward.
o The environment is stochastic, and the agent needs to explore it in order to get the maximum positive rewards.
There are mainly three ways to implement reinforcement-learning in ML, which are:
1. Value-based:
The value-based approach aims to find the optimal value function, i.e. the maximum value achievable at a state under any policy. The agent expects the long-term return at any state s under policy π.
2. Policy-based:
Policy-based approach is to find the optimal policy for the maximum future rewards without using the
value function. In this approach, the agent tries to apply such a policy that the action performed in each
step helps to maximize the future reward.
The policy-based approach has mainly two types of policy:
o Deterministic: The same action is produced by the policy (π) at any given state.
o Stochastic: In this policy, probability determines the produced action.
3. Model-based:
In the model-based approach, a virtual model is created for the environment, and the agent explores
that environment to learn it. There is no particular solution or algorithm for this approach because the
model representation is different for each environment.
There are four main elements of Reinforcement Learning, which are given below:
1. Policy
2. Reward Signal
3. Value Function
4. Model of the environment
1) Policy:
A policy can be defined as a way how an agent behaves at a given time. It maps the perceived states of the
environment to the actions taken on those states. A policy is the core element of the RL as it alone can define
the behavior of the agent. In some cases, it may be a simple function or a lookup table, whereas, for other
cases, it may involve general computation such as a search process. A policy may be deterministic or stochastic.
2) Reward Signal:
The goal of reinforcement learning is defined by the reward signal. At each state, the environment sends an
immediate signal to the learning agent, and this signal is known as a reward signal. These rewards are given
according to the good and bad actions taken by the agent. The agent's main objective is to maximize the total
number of rewards for good actions. The reward signal can change the policy, such as if an action selected by
the agent leads to low reward, then the policy may change to select other actions in the future.
3) Value Function: The value function gives information about how good the situation and action are and how
much reward an agent can expect. A reward indicates the immediate signal for each good and bad action,
whereas a value function specifies the good state and action for the future. The value function depends on the
reward as, without reward, there could be no value. The goal of estimating values is to achieve more rewards.
4) Model: The last element of reinforcement learning is the model, which mimics the behavior of the
environment. With the help of the model, one can make inferences about how the environment will behave.
Such as, if a state and an action are given, then a model can predict the next state and reward.
The model is used for planning, which means it provides a way to take a course of action by considering all
future situations before actually experiencing those situations. The approaches for solving the RL
problems with the help of the model are termed as the model-based approach. Comparatively, an
approach without using a model is called a model-free approach.
WORKING OF REINFORCEMENT LEARNING:
o Environment: It can be anything such as a room, maze, football ground, etc.
o Agent: An intelligent agent such as AI robot.
Let's take an example of a maze environment that the agent needs to explore. Consider the below
image:
In the above image, the agent is at the very first block of the maze. The maze consists of an S6 block, which is a wall, S8, which is a fire pit, and S4, which is a diamond block.
The agent cannot cross the S6 block, as it is a solid wall. If the agent reaches the S4 block, it gets a +1 reward; if it reaches the fire pit, it gets a -1 reward. It can take four actions: move up, move down, move left, and move right.
The agent can take any path to reach the final point, but it needs to do so in as few steps as possible. Suppose the agent follows the path S9-S5-S1-S2-S3; then it will get the +1 reward.
The agent will try to remember the preceding steps that it has taken to reach the final step. To
memorize the steps, it assigns 1 value to each previous step. Consider the below step:
Now, the agent has successfully stored the previous steps assigning the 1 value to each previous
block. But what will the agent do if he starts moving from the block, which has 1 value block on both
sides? Consider the below diagram:
It will be a difficult condition for the agent whether he should go up or down as each block has the same value.
So, the above approach is not suitable for the agent to reach the destination. Hence to solve the problem, we
will use the Bellman equation, which is the main concept behind reinforcement learning.
The Bellman equation is a way of calculating value functions in dynamic programming, and it underlies modern reinforcement learning.
The Bellman equation can be written as:
V(s) = max_a [ R(s, a) + γ V(s') ]
where γ = discount factor.
In the above equation, we take the maximum over actions because the agent always tries to find the optimal solution.
So now, using the Bellman equation, we will find value at each state of the given environment. We will start
from the block, which is next to the target block.
V(s3) = max [R(s,a) + γV(s')] = 1, since from s3 the agent reaches the diamond (reward +1) and there is no further state to move to (V(s') = 0).
V(s2) = max [R(s,a) + γV(s')] = 0.9, taking γ = 0.9, V(s') = 1, and R(s,a) = 0, because there is no reward at this state.
V(s1) = max [R(s,a) + γV(s')] = 0.81, with γ = 0.9, V(s') = 0.9, and R(s,a) = 0, because there is no reward at this state either.
V(s5) = max [R(s,a) + γV(s')] = 0.73, with γ = 0.9, V(s') = 0.81, and R(s,a) = 0.
V(s9) = max [R(s,a) + γV(s')] = 0.66, with γ = 0.9, V(s') = 0.73, and R(s,a) = 0.
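The same backward calculation can be sketched in a few lines of Python; the state chain s9 → s5 → s1 → s2 → s3 → goal, the +1 reward at the goal, and γ = 0.9 follow the example above.

```python
# Bellman backup along the path s9 -> s5 -> s1 -> s2 -> s3 -> goal, with gamma = 0.9.
gamma = 0.9
reward_at_goal = 1.0

values = {}
values["s3"] = reward_at_goal            # V(s3): moving right from s3 reaches the diamond
for prev, nxt in [("s2", "s3"), ("s1", "s2"), ("s5", "s1"), ("s9", "s5")]:
    values[prev] = 0 + gamma * values[nxt]   # R(s, a) = 0 away from the goal

for state in ["s3", "s2", "s1", "s5", "s9"]:
    print(state, round(values[state], 2))
# prints: s3 1.0, s2 0.9, s1 0.81, s5 0.73, s9 0.66
```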
Now, the agent has three options for its move: if it moves to the blue box, it will feel a bump, and if it moves to the fire pit, it will get the -1 reward. But since we are considering only positive rewards, it will move upwards only. The complete block values can be calculated using this formula. Consider the image below:
There are mainly two types of reinforcement:
o Positive Reinforcement
o Negative Reinforcement
Positive Reinforcement:
The positive reinforcement learning means adding something to increase the tendency that expected behavior
would occur again. It impacts positively on the behavior of the agent and increases the strength of the
behavior.
This type of reinforcement can sustain the changes for a long time, but too much positive reinforcement may
lead to an overload of states that can reduce the consequences.
Negative Reinforcement:
The negative reinforcement learning is opposite to the positive reinforcement as it increases the tendency that
the specific behavior will occur again by avoiding the negative condition.
It can be more effective than the positive reinforcement depending on situation and behavior, but it provides
reinforcement only to meet minimum behavior.
The Markov Decision Process, or MDP, is used to formalize reinforcement learning problems. If the environment is completely observable, then its dynamics can be modeled as a Markov process. In MDP, the
agent constantly interacts with the environment and performs actions; at each action, the environment responds
and generates a new state.
MDP is used to describe the environment for RL, and almost all RL problems can be formalized using an MDP.
MDP uses Markov property, and to better understand the MDP, we need to learn about it.
Markov Property:
It says that "If the agent is present in the current state S1, performs an action a1 and move to the state s2, then
the state transition from s1 to s2 only depends on the current state and future action and states do not depend
on past actions, rewards, or states."
Or, in other words, as per Markov Property, the current state transition does not depend on any past action or
state. Hence, MDP is an RL problem that satisfies the Markov property. Such as in a Chess game, the players
only focus on the current state and do not need to remember past actions or states.
Finite MDP:
A finite MDP is when there are finite states, finite rewards, and finite actions. In RL, we consider only the
finite MDP.
Markov Process:
Markov Process is a memoryless process with a sequence of random states S1, S2, ..... , St that uses the Markov
Property. Markov process is also known as Markov chain, which is a tuple (S, P) on state S and transition
function P. These two components (S and P) can define the dynamics of the system.
Reinforcement learning algorithms are mainly used in AI applications and gaming applications. The main used
algorithms are:
o Q-Learning:
o Q-learning is an off-policy RL algorithm used for temporal difference learning. Temporal difference learning methods compare temporally successive predictions.
o It learns the value function Q(s, a), which indicates how good it is to take action a in a particular state s.
o The flowchart below explains the working of Q-learning:
Q-Learning Explanation:
o Q-learning is a popular model-free reinforcement learning algorithm based on the Bellman equation.
o The main objective of Q-learning is to learn a policy that informs the agent what actions should be taken to maximize the reward under given circumstances.
o It is an off-policy RL that attempts to find the best action to take at a current state.
o The goal of the agent in Q-learning is to maximize the value of Q.
o The value of Q-learning can be derived from the Bellman equation:
V(s) = max_a [ R(s, a) + γ Σ_s' P(s' | s, a) V(s') ]
In this equation we have various components: the reward, the discount factor (γ), the transition probability, and the end state s'. But no Q-value appears yet, so first consider the image below:
In the above image, we can see an agent that has three value options, V(s1), V(s2), V(s3). Since this is an MDP, the agent only cares about the current state and the next state. The agent can go in any direction (up, left, or right), so it needs to decide where to go for the optimal path. Here the agent moves according to the transition probabilities and changes state. But if we want specific moves, we need to reformulate things in terms of Q-values. Consider the image below:
Q represents the quality of the actions at each state. So instead of using a value for each state, we use a pair of state and action, i.e. Q(s, a). The Q-value specifies which action is more lucrative than the others, and according to the best Q-value the agent takes its next move. The Bellman equation can be used to derive the Q-value.
When performing an action, the agent receives a reward R(s, a) and ends up in a certain state, so the Q-value equation is:
Q(s, a) = R(s, a) + γ max_a' Q(s', a')
The Q stands for quality in Q-learning, which means it specifies the quality of an action taken by the agent.
Q-table:
A Q-table or matrix is created while performing the Q-learning. The table follows the state and action pair, i.e.,
[s, a], and initializes the values to zero. After each action, the table is updated, and the q-values are stored
within the table.
The RL agent uses this Q-table as a reference table to select the best action based on the q-values.
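A bare-bones sketch of the tabular Q-learning update (Q-table initialized to zero and updated after each action using the Bellman-style rule above) is shown below; the learning rate, exploration rate, and action set are assumptions of the sketch, not values fixed by the text.

```python
import random
from collections import defaultdict

# Tabular Q-learning sketch: Q[(s, a)] starts at 0 and is updated after each step.
Q = defaultdict(float)
alpha, gamma, epsilon = 0.1, 0.9, 0.1          # assumed learning parameters
actions = ["up", "down", "left", "right"]

def choose_action(state):
    """Epsilon-greedy: mostly exploit the best known Q-value, sometimes explore."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def q_update(state, action, reward, next_state):
    """Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
```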
(Figure: the 4x3 grid-world environment, with terminal states +1 and -1 and learned utility values for the other states.)
• Clearly, the passive learning task is similar to the policy evaluation task.
• The main difference is that the passive learning agent does not know
o Neither the transition model T(s, a, s'), which specifies the probability of reaching state s' from state s after doing action a;
o Nor the reward function R(s), which specifies the reward for each state.
• The agent executes a set of trials in the environment using its policy π.
• In each trial, the agent starts in state (1,1) and experiences a sequence of state transitions until
it reaches one of the terminal states, (4,2) or (4,3).
• Its percepts supply both the current state and the reward received in that state.
• Typical trials might look like this:
• A simple method for direct utility estimation was invented in the area of adaptive control theory by Widrow and Hoff (1960).
• The idea is that the utility of a state is the expected total reward from that state onward, and
each trial provides a sample of this value for each state visited.
• Example:- The first trial in the set of three given earlier provides a sample total reward of
0.72 for state (1,1), two samples of 0.76 and 0.84 for (1,2), two samples of 0.80 and 0.88 for
(1,3) and so on.
• Thus, at the end of each sequence, the algorithm calculates the observed reward-to-go for each state and updates the estimated utility for that state accordingly.
• In the limit of infinitely many trials, the sample average for each state will converge to the true expectation in the utility function.
• It is clear that direct utility estimation is just an instance of supervised learning.
• This means that reinforcement learning has been reduced to a standard inductive learning problem.
Adaptive Dynamic Programming (ADP):
The passive ADP agent learns the transition model from observed transitions and solves the resulting Markov decision process. The pseudocode fragment given here, cleaned up, is:
function PASSIVE-ADP-AGENT(percept) returns an action
  inputs: percept, a percept indicating the current state s' and reward signal r'
  static: π, a fixed policy
          mdp, an MDP with model T, rewards R, discount γ
          U, a table of utilities, initially empty
          Nsa, Nsas', tables of frequencies for state-action pairs and transitions, initially zero
          s, a, the previous state and action, initially null
  if s' is new then U[s'] ← r'; R[s'] ← r'
  if s is not null then
      increment Nsa[s,a] and Nsas'[s,a,s']
      for each t such that Nsas'[s,a,t] is nonzero do
          T[s,a,t] ← Nsas'[s,a,t] / Nsa[s,a]
  U ← VALUE-DETERMINATION(π, U, mdp)
  if TERMINAL?[s'] then s,a ← null else s,a ← s', π[s']
  return a
Advantages:-
It converges quite quickly.
Reason:- The model usually changes only slightly with each observation, the value iteration process can
use the previous utility estimates as initial values.
The process of learning the model itself is easy
Reason:- The environment is fully observable. This means that a supervised learning task exists where the input is a state-action pair and the output is the resulting state.
It provides a standard against which other reinforcement learning algorithms can be measured.
Disadvantage:-
It is intractable for large state spaces
Temporal-Difference (TD) Learning:-
The core update of the PASSIVE-TD-AGENT, executed after each observed transition from s to s' with reward r, is:
  increment Ns[s]
  U[s] ← U[s] + α(Ns[s]) (r + γ U[s'] − U[s])
  if TERMINAL?[s'] then s,a,r ← null else s,a,r ← s', π[s'], r'
  return a
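The same update can be sketched in Python; the decaying learning-rate schedule below is an illustrative choice, not one prescribed by the text.

from collections import defaultdict

U = defaultdict(float)     # utility estimates
N = defaultdict(int)       # visit counts per state
gamma = 0.9

def alpha(n):
    """Decaying learning rate; this particular schedule is an illustrative choice."""
    return 1.0 / n

def td_update(s, r, s_next):
    """TD(0): U(s) <- U(s) + alpha(N(s)) * (r + gamma * U(s') - U(s))."""
    N[s] += 1
    U[s] += alpha(N[s]) * (r + gamma * U[s_next] - U[s])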
• Advantages:-
o It is much simpler
o It requires much less computation per observation
• Disadvantages:-
o It does not learn quite as fast as the ADP agent
o It shows much higher variability
• The following table summarizes the differences between the ADP and TD approaches:
  ADP: learns an explicit model (transition probabilities and rewards) and solves for the utilities; more computation per observation; converges quickly, with low variability.
  TD: model-free, adjusting utilities directly from each observed transition; much less computation per observation; converges more slowly, with higher variability.
• A passive learning agent has a fixed policy that determines its behavior.
• “An active agent must decide what actions to do”
• Consider an ADP agent and how it must be modified to handle this new
freedom.
• The following are the required modifications:-
o First the agent will need to learn a complete model with outcome probabilities for all
actions. The simple learning mechanism used by PASSIVE-ADP-AGENT will do just
fine for this.
o Next, take into account the fact that the agent has a choice of actions. The utilities it
needs to learn are those defined by the optimal policy.
U(s) = R(s) + γ max_a Σ_s' T(s, a, s') U(s')
o These equations can be solved to obtain the utility function U using the value iteration
or policy iteration algorithms.
o Having obtained a utility function U that is optimal for the learned model, the agent can
extract an optimal action by one-step look ahead to maximize the expected utility;
o Alternatively, if it uses policy iteration, the optimal policy is already available, so it
should simply execute the action the optimal policy recommends.
Exploration:-
• A greedy agent is an agent that executes an action recommended by the optimal policy for the
learned model.
• The following figure shows the suboptimal policy to which this agent converges in this particular sequence of trials.
[Figure: 4x3 grid world with terminal states +1 and -1, showing the suboptimal policy learned by the greedy agent.]
• The agent does not learn the true utilities or the true optimal policy! What happens is that, in the 39th trial, it finds a policy that reaches the +1 reward along the lower route via (2,1), (3,1), (3,2), and (3,3).
• After experimenting with minor variations, from the 276th trial onward it sticks to that policy, never learning the utilities of the other states and never finding the optimal route via (1,2), (1,3), and (2,3).
• How can choosing the optimal action lead to suboptimal results?
• The fact is that the learned model is not the same as the true environment; what is optimal in
the learned model can therefore be suboptimal in the true environment.
• Unfortunately, the agent does not know what the true environment is, so it cannot compute
the optimal action for the true environment.
• Hence pure exploitation is not enough: what the greedy agent has overlooked is that actions do more than provide rewards according to the current learned model; they also contribute to learning the true model by affecting the percepts that are received.
• An agent therefore must make a trade-off between exploitation to maximize its reward and
exploration to maximize its long-term well being.
GLIE Scheme:-
• To come up with a reasonable scheme that will eventually lead to optimal behavior by the
agent, a GLIE (greedy in the limit of infinite exploration) scheme can be used.
• A GLIE Scheme must try each action in each state an unbounded number of times to avoid
having a finite probability that an optimal action is missed because of an unusually bad series
of outcomes.
• An ADP agent using such a scheme will eventually learn the true environment model.
• A GLIE Scheme must also eventually become greedy, so that the agent's actions become
optimal with respect to the learned (and hence the true) model.
• There are several GLIE schemes; one of the simplest is the following:
o The agent chooses a random action a fraction 1/t of the time and follows the greedy policy otherwise.
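A minimal sketch of this 1/t scheme (illustrative only; the caller is assumed to supply the greedy action):

import random

def glie_choose(t, actions, greedy_action):
    """Pick a random action a fraction 1/t of the time, otherwise follow the greedy policy."""
    if random.random() < 1.0 / t:
        return random.choice(actions)
    return greedy_action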
Exploration function:-
• Let U+(s) denote the optimistic estimate of the utility of the state s, and let N(a, s) be the
number of times action a has been tried in state s.
• Suppose that value iteration is used in an ADP learning agent; then rewrite the update
equation to incorporate the optimistic estimate.
• The following equation does this,
U+(s) ← R(s) + γ max_a f( Σ_s' T(s, a, s') U+(s'), N(a, s) )
• Here f(u, n) is called the exploration function.
• It determines how greed (preference for high values of u) is traded off against curiosity (preference for actions that have not been tried often, i.e., low values of n).
• The use of U+ means that benefits of exploration are propagated back from the edges of
unexplored regions, so that actions that lead toward unexplored regions are weighted more
highly, rather than just actions that are themselves unfamiliar.
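One common concrete choice for f is to return an optimistic reward estimate R+ whenever the state-action pair has been tried fewer than Ne times, and the current estimate u otherwise; the constants below are illustrative assumptions, not values given in the text.

R_PLUS = 2.0   # optimistic estimate of the best possible reward (assumption)
N_E = 5        # try each state-action pair at least this many times (assumption)

def exploration_f(u, n):
    """f(u, n): stay optimistic while a state-action pair is under-explored, then trust the estimate."""
    return R_PLUS if n < N_E else u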
• Reason:-
o Suppose the agent takes a step that normally leads to a good destination, but because
of non determinism in the environment the agent ends up in a disastrous state.
o The TD update rule will take this as seriously as if the outcome had been the normal
result of the action, whereas the agent should not worry about it too much since the
outcome was a fluke.
o It can be shown that the TD algorithm will converge to the same values as ADP as the
number of training sequences tends to infinity.
• When the utility function is represented by a parameterized approximation Ûθ(s), the error on the j-th trial can be written as Ej(s) = (Ûθ(s) − uj(s))² / 2, where uj(s) is the observed reward-to-go. The rate of change of the error with respect to each parameter θi is ∂Ej/∂θi, so to move the parameter in the direction of decreasing error we update
θi ← θi − α ∂Ej(s)/∂θi = θi + α (uj(s) − Ûθ(s)) ∂Ûθ(s)/∂θi
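A minimal sketch of this update for a linear approximator Ûθ = θ0·f0 + θ1·f1 + θ2·f2; the feature choice and learning rate are illustrative assumptions.

def update_parameters(theta, features, observed_return, alpha=0.05):
    """Widrow-Hoff style update: theta_i <- theta_i + alpha * (u - U_hat) * dU_hat/dtheta_i.

    For a linear approximator U_hat = sum_i theta_i * f_i, the partial derivative
    with respect to theta_i is simply the feature f_i.
    """
    u_hat = sum(t * f for t, f in zip(theta, features))
    error = observed_return - u_hat
    return [t + alpha * error * f for t, f in zip(theta, features)]

# Example: features (1, x, y) for grid position (2, 3) with an observed return of 0.8.
theta = update_parameters([0.0, 0.0, 0.0], [1, 2, 3], 0.8)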
1. RL in Marketing
Marketing is all about promoting and then selling the products or services of your own brand or someone
else's. In the process of marketing, finding the right audience, one that yields larger returns on the investment you
or your company is making, is a challenge in itself.
This is one of the reasons companies are investing heavily in managing various marketing campaigns digitally.
Through real-time bidding, which is well supported by the fundamental capabilities of RL, companies small and
large can expect: –
• more display ad impressions in real-time.
• increased ROI, profit margins.
• predicting the choices, reactions, and behavior of customers towards your products/services.
2. RL in Broadcast Journalism
Through different types of Reinforcement Learning, attracting likes and views along with tracking the
reader’s behavior is much simpler. Besides, recommending news that suits the frequently-changing
preferences of readers and other online users can possibly be achieved since journalists can now be equipped
with an RL-based system that keeps an eye on intuitive news content as well as the headlines. Take a look at
other advantages too which Reinforcement Learning is offering to readers all around the world.
• News producers are now able to receive the feedback of their users instantaneously.
• Increased communication, as users are more expressive now.
• No space for disinformation, hatred.
3. RL in Healthcare
Healthcare is an important part of our lives and through DTRs (a sequence-based use-case of RL), doctors
can discover the treatment type, appropriate doses of drugs, and timings for taking such doses. Curious to
know how this is possible? DTRs are equipped with: –
• a sequence of rules which confirm the current health status of a patient.
• Then, they optimally propose treatments that can diagnose diseases like diabetes, HIV, Cancer, and
mental illness too.
If required, these DTRs (i.e. Dynamic Treatment Regimes) can reduce or remove the delayed impact of
treatments through their multi-objective healthcare optimization solutions.
4. RL in Robotics
Robotics without any doubt facilitates training a robot in such a way that a robot can perform tasks – just
like a human being can. But still, there is a bigger challenge the robotics industry is facing today – Robots
aren’t able to use common sense while making various moral, social decisions. Here, a combination of Deep
Learning and Reinforcement Learning i.e. Deep Reinforcement Learning comes to the rescue to enable the
robots with, “Learn How To Learn” model. With this, the robots can now: –
• manipulate their decisions by grasping well various objects visible to them.
• solve complicated tasks which even humans fail to do as robots now know what and how to learn from
different levels of abstractions of the types of datasets available to them.
5. RL in Gaming
Gaming is something nowadays without which you, me, or a huge chunk of people can’t live.
With games optimization through Reinforcement Learning algorithms, we may expect better
performances of our favorite games related to adventure, action, or mystery.
To prove it right, the AlphaGo example can be considered. AlphaGo is a computer program that defeated a
professional player of Go (a challenging classical game) in October 2015 and went on to become the strongest Go
player. The trick AlphaGo used was Reinforcement Learning, which kept making it stronger as it was constantly
exposed to unexpected gaming challenges. Like AlphaGo, there are many other games. You can also optimize your
favorite games by appropriately applying prediction models which learn how to win even in complex situations
through RL-enabled strategies.
6. RL in Image Processing
Image Processing is another important method of enhancing the current version of an image to extract some
useful information from it. And there are some steps associated like:
• Capturing the image with machines like scanners.
• Analyzing and manipulating it.
• Using the output image obtained after analysis for representation, description-purposes.
Here, ML models like Deep Neural Networks (whose framework is Reinforcement Learning) can be
leveraged for simplifying this trending image processing method. With Deep Neural Networks, you can
either enhance the quality of a specific image or hide the info. of that image. Later, use it for any of your
computer vision tasks.
7. RL in Manufacturing
Manufacturing is all about producing goods that can satisfy our basic needs and essential wants. Cobot
Manufacturers (or Manufacturers of Collaborative Robots that can perform various manufacturing tasks
with a workforce of more than 100 people) are helping a lot of businesses with their own RL solutions for
packaging and quality testing. Undoubtedly, their use is making the process of manufacturing quality
products faster, which helps avoid negative customer feedback. And the fewer the negative feedbacks, the
better the product's performance and sales margins.
POLICY SEARCH
Policy search is a subfield in reinforcement learning which focuses on finding good parameters for a given
policy parametrization. It is well suited for robotics as it can cope with high-dimensional state and action
spaces, one of the main challenges in robot learning.
1. Policy search methods are a family of systematic approaches for continuous (or large) action and state
spaces.
2. With policy search, expert knowledge is easily embedded in initial policies (by demonstration,
imitation).
3. Policy search is often preferred over other RL methods in practical applications (e.g., robotics).
Policy gradient
• We consider a family of randomized policies µθ(s, a) = Pr(a | s), parameterized by θ (a deterministic policy is a special case).
• The return of a trajectory ξ is R(ξ) = r0 + γ r1 + γ^2 r2 + . . ., and the objective J(θ) is the expected return under the policy µθ.
• An update should guarantee performance improvement: J(θµ') ≥ J(θµ), i.e., the new policy µ' is at least as good as µ.
• A simple finite-difference approach collects data D = {(δθi, δJi)} (sampled gradients) by perturbing the parameters, θ + δθ, applying the new policy µ(θ + δθ), and recording δJi = J(θ + δθ) − J(θ).
• Gradient derivation (the likelihood-ratio trick):
  ∇θ J(θ) = ∫ ∇θ p(ξ | µθ) R(ξ) dξ = E[ ∇θ log p(ξ | µθ) R(ξ) ]
Natural policy gradient
• The natural gradient rescales the ordinary gradient by the inverse of a metric G(θ) defined on the distribution over trajectories p(ξ | µθ):
  ∇̃θ J(θ) = G^(-1)(θ) ∇θ J(θ)
• The steepest-ascent step maximizes J(θ + δθ) ≈ J(θ) + δθᵀ ∇θ J(θ) subject to a fixed step length measured in that metric, ⟨δθ, δθ⟩_{p(ξ|µθ)} = ε.
• Taking the derivative of the resulting constrained problem gives δθ ∝ G^(-1)(θ) ∇θ J(θ).
• The metric can be estimated from M sampled trajectories:
  G(θ) ≈ (1/M) Σ_{i=1}^{M} [∇θ log p(ξi | µθ) R(ξi)] [∇θ log p(ξi | µθ) R(ξi)]ᵀ
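As a concrete illustration of the likelihood-ratio estimator above, here is a minimal Monte-Carlo (REINFORCE-style) sketch; the trajectory data structure and the availability of per-step gradients of log µθ(a | s) are assumptions made for the example.

import numpy as np

def policy_gradient_estimate(trajectories):
    """Monte-Carlo estimate of grad J(theta) = E[grad log p(xi|theta) * R(xi)].

    Each trajectory is a dict with:
      'grad_log_probs': list of gradient vectors d/dtheta log mu(a_t | s_t)
      'rewards':        list of rewards r_t
    """
    grads = []
    for traj in trajectories:
        ret = sum(traj['rewards'])                             # return R(xi), undiscounted here
        grad_log_p = np.sum(traj['grad_log_probs'], axis=0)    # grad log p(xi | theta)
        grads.append(grad_log_p * ret)
    return np.mean(grads, axis=0)

def gradient_ascent_step(theta, trajectories, lr=0.01):
    """theta <- theta + lr * estimated gradient (theta is a NumPy vector)."""
    return theta + lr * policy_gradient_estimate(trajectories)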
Actor-Critic methods
UNIT-IV
SYLLABUS:
Unit-IV: Natural Language for Communication: Phrase structure grammars, Syntactic Analysis, Augmented
Grammars and semantic Interpretation, Machine Translation, Speech Recognition
Perception: Image Formation, Early Image Processing Operations, Object Recognition by appearance,
Reconstructing the 3D World, Object Recognition from Structural information, Using Vision.
Computers don’t speak languages the way humans do. They communicate in machine code or machine
language, while we speak English, Dutch, French or some other human language. Most of us don’t
understand the millions of zeros and ones computers communicate in. And in turn, computers don’t
understand human language unless they are programmed to do so. That’s where natural language
processing (NLP) comes in.
Language is a method of communication with the help of which we can speak, read and write. Natural
Language Processing (NLP) is a subfield of Computer Science and Artificial Intelligence (AI) that enables
computers to understand and process human language.
Linguists study how phrases and sentences can be formed with words; their tools are intuitions about well-formedness and meaning, and mathematical models of structure.
Psycholinguists study how human beings identify the structure of sentences, how the meaning of words is identified, and when understanding takes place; their tools are experimental techniques, mainly for measuring the performance of human beings, and statistical analysis of observations.
Components of Language
The language of study is divided into the interrelated components, which are conventional as well as
arbitrary divisions of linguistic investigation. The explanation of these components is as follows −
Phonology
The very first component of language is phonology. It is the study of the speech sounds of a particular
language. The origin of the word can be traced to Greek language, where ‘phone’ means sound or voice.
Phonetics, a subdivision of phonology is the study of the speech sounds of human language from the
perspective of their production, perception or their physical properties. IPA (International Phonetic
Alphabet) is a tool that represents human sounds in a regular way while studying phonology. In IPA, every
written symbol represents one and only one speech sound and vice-versa.
Morphology
It is the second component of language. It is the study of the structure and classification of the words in a
particular language. The origin of the word is from Greek language, where the word ‘morphe’ means ‘form’.
Morphology considers the principles of formation of words in a language. In other words, how sounds
combine into meaningful units like prefixes, suffixes and roots. It also considers how words can be grouped
into parts of speech.
Lexeme
In linguistics, the abstract unit of morphological analysis that corresponds to a set of forms taken by a single
word is called lexeme. The way in which a lexeme is used in a sentence is determined by its grammatical
category. Lexeme can be individual word or multiword. For example, the word talk is an example of an
individual word lexeme, which may have many grammatical variants like talks, talked and talking. Multiword
lexeme can be made up of more than one orthographic word. For example, speak up, pull through, etc. are
the examples of multiword lexemes.
Syntax
It is the third component of language. It is the study of the order and arrangement of the words into larger
units. The word can be traced to Greek language, where the word suntassein means ‘to put in order’. It
studies the type of sentences and their structure, of clauses, of phrases.
Semantics
It is the fourth component of language. It is the study of how meaning is conveyed. The meaning can be
related to the outside world or can be related to the grammar of the sentence. The word can be traced to
Greek language, where the word semainein means ‘to signify’, ‘show’, ‘signal’.
Pragmatics:
It is the fifth component of language. It is the study of the functions of the language and its use in context.
The origin of the word can be traced to Greek language where the word ‘pragma’ means ‘deed’, ‘affair’
• Reparandum and repair − The repeated segment of words in between the sentence is called
reparandum. In the same segment, the changed word is called repair. Consider the following
example to understand this −
Does ABC airlines offer any one-way flights uh one-way fares for 5000 rupees?
In the above sentence, one-way flights is the reparandum and one-way fares is the repair.
Word Fragments
Sometimes we speak the sentences with smaller fragments of words. For example, wwha-what is the
time? Here the words w-wha are word fragments.
Descriptive Grammar
The set of rules, where linguistics and grammarians formulate the speaker’s grammar is called descriptive
grammar.
Prescriptive Grammar
It is a very different sense of grammar, which attempts to maintain a standard of correctness in the
language. This category has little to do with the actual working of the language.
Grammatical Categories
In recent years, AI has evolved rapidly, and with that, NLP got more sophisticated, too. Many of us already
use NLP daily without realizing it. You’ve probably used at least one of the following tools:
• Spell checker.
• Autocomplete.
• Spam filters.
• Voice text messaging.
GENERATIVE CAPACITY
Grammatical formalisms can be classified by their generative capacity: the set of languages they can represent.
Chomsky (1957) describes four classes of grammatical formalisms that differ only in the form of the rewrite rules. The
classes can be arranged in a hierarchy, where each class can be used to describe all the languages that can be described
by a less powerful class, as well as some additional languages. Here we list the hierarchy, most powerful class first:
Recursively enumerable grammars use unrestricted rules: both sides of the rewrite rules can have any number of
terminal and nonterminal symbols, as in the rule A B C → D E. These grammars are equivalent to Turing machines in
their expressive power.
Context-sensitive grammars are restricted only in that the right-hand side must contain at least as many symbols as
the left-hand side. The name “context-sensitive” comes from the fact that a rule such as A X B → A Y B says that
an X can be rewritten as a Y in the context of a preceding A and a following B. Context-sensitive grammars can represent
languages such as a^n b^n c^n (a sequence of n copies of a followed by the same number of bs and then cs).
In context-free grammars (or CFGs), the left-hand side consists of a single nonterminal symbol. Thus, each rule
licenses rewriting the nonterminal as the right-hand side in any context. CFGs are popular for natural-language and
programming-language grammars, although it is now widely accepted that at least some natural languages have
constructions that are not context-free (Pullum, 1991). Context-free grammars can represent a^n b^n, but not a^n b^n c^n.
Regular grammars are the most restricted class. Every rule has a single nonterminal on the left-hand side and a terminal
symbol optionally followed by a nonterminal on the right-hand side. Regular grammars are equivalent in power to
finite-state machines. They are poorly suited for programming languages, because they cannot represent constructs such
as balanced opening and closing parentheses (a variation of the a^n b^n language). The closest they can come is representing
a*b*, a sequence of any number of as followed by any number of bs.
The grammars higher up in the hierarchy have more expressive power, but the algorithms for dealing with them are
less efficient. Up to the 1980s, linguists focused on context-free and context-sensitive languages.
There have been many competing language models based on the idea of phrase structure; we will describe a popular
model called the probabilistic context-free grammar, or PCFG.
A grammar is a collection of rules that defines a language as a set of allowable strings of words.
“Probabilistic” means that the grammar assigns a probability to every string, and “context-free” means that any rule's left-hand side is a single nonterminal that can be rewritten in any context. Here is a PCFG rule:
VP → Verb [0.70]
| VP NP [0.30] .
Here VP (verb phrase) and NP (noun phrase) are non-terminal symbols. The grammar also refers to actual words,
which are called terminal symbols. This rule is saying that with probability 0.70 a verb phrase consists solely of a verb, and with probability 0.30 it consists of a VP followed by an NP.
We now define a grammar for a tiny fragment of English that is suitable for communication between agents exploring
the wumpus world. We call this language E0. Later sections improve on E0 to make it slightly closer to real English. We
are unlikely ever to devise a complete grammar for English, if only because no two persons would agree entirely on
what constitutes valid English.
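To make the idea concrete, here is a minimal sketch of a PCFG represented as a Python dictionary and sampled by expanding nonterminals according to the rule probabilities; the toy grammar below (beyond the VP rule shown above) is an illustrative assumption and is not the book's E0 grammar.

import random

# Each nonterminal maps to a list of (probability, right-hand side) pairs.
# Lowercase strings are terminal words; capitalized strings are nonterminals.
PCFG = {
    "S":    [(1.0, ["NP", "VP"])],
    "NP":   [(1.0, ["Det", "Noun"])],
    "VP":   [(0.70, ["Verb"]), (0.30, ["VP", "NP"])],
    "Det":  [(1.0, ["the"])],
    "Noun": [(0.5, ["bird"]), (0.5, ["grains"])],
    "Verb": [(0.6, ["pecks"]), (0.4, ["sleeps"])],
}

def generate(symbol="S"):
    """Expand a symbol by sampling rules according to their probabilities."""
    if symbol not in PCFG:                       # terminal symbol: emit it
        return [symbol]
    probs = [p for p, _ in PCFG[symbol]]
    _, rhs = random.choices(PCFG[symbol], weights=probs, k=1)[0]
    words = []
    for s in rhs:
        words.extend(generate(s))
    return words

# The probability of a derivation is the product of the probabilities of the rules used.
print(" ".join(generate()))   # e.g. "the bird pecks"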
There are five basic phases of NLP, some of which you might recognize from school.
Lexical Analysis − The first phase of NLP is the Lexical Analysis. This phase scans the source code as a stream
of characters and converts it into meaningful lexemes. Lexical analysis divides the whole text into
paragraphs, sentences, and words. It involves identifying and analyzing the structure of words. Lexicon of a
language means the collection of words and phrases in a language.
Syntactic Analysis (Parsing) − Syntactic Analysis is used to check grammar and word arrangements, and shows
the relationship among the words. It involves analysis of the words in the sentence for grammar and arranging the
words in a manner that shows the relationship among them. A sentence such as “The school goes to
boy” is rejected by an English syntactic analyzer.
Semantic Analysis − Semantic analysis is concerned with the meaning representation. It mainly focuses on
the literal meaning of words, phrases, and sentences.
Discourse Integration − Discourse Integration depends upon the sentences that precede it and also invokes
the meaning of the sentences that follow it.
• The meaning of any sentence depends upon the meaning of the sentence just before it. In addition,
it also brings about the meaning of immediately succeeding sentence.
Pragmatic Analysis − Pragmatic is the fifth and last phase of NLP. It helps you to discover the intended effect
by applying a set of rules that characterize cooperative dialogues.
CONCEPT OF GRAMMAR
Grammar is very essential and important to describe the syntactic structure of well-formed programs. In
the literary sense, they denote syntactical rules for conversation in natural languages. Linguists have
attempted to define grammars since the inception of natural languages like English, Hindi, etc.
The theory of formal languages is also applicable in the fields of Computer Science mainly in programming
languages and data structure. For example, in ‘C’ language, the precise grammar rules state how functions
are made from lists and statements.
A mathematical model of grammar was given by Noam Chomsky in 1956, which is effective for writing
computer languages.
• P denotes the Production rules for Terminals as well as Non-terminals. Each rule has the form α → β,
where α and β are strings over VN ∪ Σ, and at least one symbol of α belongs to VN.
Example
Before giving an example of constituency grammar, we need to know the fundamental points about
constituency grammar and constituency relation.
• All the related frameworks view the sentence structure in terms of constituency relation.
• The constituency relation is derived from the subject-predicate division of Latin as well as Greek
grammar.
• The basic clause structure is understood in terms of noun phrase NP and verb phrase VP.
We can write the sentence “This tree is illustrating the constituency relation” as follows −
Dependency Grammar
It is opposite to the constituency grammar and based on dependency relation. It was introduced by Lucien
Tesniere. Dependency grammar (DG) is opposite to the constituency grammar because it lacks phrasal
nodes.
Example
Before giving an example of Dependency grammar, we need to know the fundamental points about
Dependency grammar and Dependency relation.
• In DG, the linguistic units, i.e., words are connected to each other by directed links.
• The verb becomes the center of the clause structure.
• Every other syntactic unit is connected to the verb by a directed link. These syntactic units
are called dependencies.
We can write the sentence “This tree is illustrating the dependency relation” as follows;
Definition of CFG
CFG consists of finite set of grammar rules with the following four components −
Set of Non-terminals
It is denoted by V. The non-terminals are syntactic variables that denote the sets of strings, which further
help defining the language, generated by the grammar.
Set of Terminals
It is also called tokens and defined by Σ. Strings are formed with the basic symbols of terminals.
Set of Productions
It is denoted by P. The set defines how the terminals and non-terminals can be combined. Every
production(P) consists of non-terminals, an arrow, and terminals (the sequence of terminals). Non-
terminals are called the left side of the production and terminals are called the right side of the production.
Start Symbol
The production begins from the start symbol. It is denoted by symbol S. Non-terminal symbol is always
designated as start symbol.
Context-Free Grammar
It is a grammar whose rules have a single nonterminal symbol on the left-hand side of each rewrite rule. Let us
create a grammar to parse a sentence −
“The bird pecks the grains”
Articles (DET) − a | an | the
Nouns − bird | birds | grain | grains
Noun Phrase (NP) − Article + Noun | Article + Adjective + Noun
= DET N | DET ADJ N
Verbs − pecks | pecking | pecked
Verb Phrase (VP) − NP V | V NP
Adjectives (ADJ) − beautiful | small | chirping
The parse tree breaks down the sentence into structured parts so that the computer can easily understand
and process it. In order for the parsing algorithm to construct this parse tree, a set of rewrite rules, which
describe what tree structures are legal, need to be constructed.
These rules say that a certain symbol may be expanded in the tree by a sequence of other symbols.
According to the first rewrite rule, if there are two strings, a Noun Phrase (NP) and a Verb Phrase (VP), then
the string formed by the NP followed by the VP is a sentence. The rewrite rules for the sentence are as follows-
S → NP VP
VP → V NP
Lexicon −
DET → a | the
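Assuming the NLTK library is available, the rewrite rules and the word lists given earlier in this section can be encoded and used to parse the example sentence; this is a minimal sketch, with the lexicon entries not shown above filled in from those word lists.

import nltk

# Toy grammar for "The bird pecks the grains", built from the rules in this section.
grammar = nltk.CFG.fromstring("""
    S   -> NP VP
    VP  -> V NP
    NP  -> DET N | DET ADJ N
    DET -> 'the' | 'a'
    ADJ -> 'small' | 'chirping'
    N   -> 'bird' | 'grains'
    V   -> 'pecks' | 'pecked'
""")

parser = nltk.ChartParser(grammar)
sentence = "the bird pecks the grains".split()
for tree in parser.parse(sentence):
    tree.pretty_print()    # prints the parse tree S -> NP VP -> ...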
Demerits −
• They are not highly precise. For example, “The grains peck the bird” is syntactically correct
according to the parser; even though it makes no sense, the parser accepts it as a correct sentence.
• To bring out high precision, multiple sets of grammar need to be prepared. It may require a
completely different sets of rules for parsing singular and plural variations, passive sentences, etc.,
which can lead to creation of huge set of rules that are unmanageable.
Top-Down Parser
Here, the parser starts with the S symbol and attempts to rewrite it into a sequence of terminal symbols
that matches the classes of the words in the input sentence until it consists entirely of terminal symbols.
These are then checked with the input sentence to see if it matched. If not, the process is started over again
with a different set of rules. This is repeated until a specific rule is found which describes the structure of
the sentence.
We already know that lexical analysis also deals with the meaning of words, so how is semantic
analysis different from lexical analysis? Lexical analysis is based on smaller tokens, whereas
semantic analysis focuses on larger chunks. That is why semantic analysis can be divided into the following
two parts −
MACHINE TRANSLATION
All translation systems must model the source and target languages, but systems vary in the type of models
they use. Some systems attempt to analyze the source language text all the way into an interlingua
knowledge representation and then generate sentences in the target language from that representation.
This is difficult because it involves three unsolved problems: creating a complete knowledge representation
of everything; parsing into that representation and generating sentences from that representation.
Other systems are based on a transfer model. They keep a database of translation rules (or examples), and
whenever the rule (or example) matches, they translate directly. Transfer can occur at the lexical,
syntactic, or semantic level.
Using corpus methods, more complex translations can be produced, with better handling of differences in
linguistic typology, phrase recognition, and translation of idioms, as well as the isolation of anomalies.
Current systems are not yet able to perform like a human translator, but this may become possible in the
future.
In simple language, we can say that machine translation works by using computer software to translate
the text from one source language to another target language.
The term ‘machine translation’ (MT) refers to computerized systems responsible for producing translations
with or without human assistance. It excludes computer-based translation tools that support translators by
providing access to online dictionaries, remote terminology databanks, transmission and reception oftexts,
etc.
Before the AI technology era, computer programs for the automatic translation of text from one language
to another were developed. In recent years, AI has been tasked with making the automatic or machine
translation of human languages’ fluidity and versatility of scripts, dialects, and variations. Machine
translation is challenging given the inherent ambiguity and flexibility of human language.
1. Statistical Machine Translation or SMT
Presently, SMT is great for basic translation, but its greatest drawback is that it does not factor in
context, which means translations can often be wrong; in other words, do not expect great quality
translation. There are several types of statistical machine translation models: hierarchical phrase-based
translation, syntax-based translation, phrase-based translation, and word-based translation.
2. Rule-based Machine Translation or RBMT
RBMT translates text on the basis of grammatical rules. It performs a grammatical analysis of
the source language and the target language to generate the translated sentence. However, RBMT
requires extensive editing, and its heavy reliance on dictionaries means that proficiency is
achieved only after a significant period.
If you specifically train the machine to your requirements, machine translation provides an ideal blend of quick
and cost-effective translation, as it is less expensive than using a human translator. With a specially trained
machine, MT can capture the context of full sentences before translating them, which gives high-quality,
human-sounding output. Another benefit of machine translation is its capability to learn important words
and reuse them wherever they might fit.
• Depending on how much content needs to be translated, machine translation can deliver translated content
very quickly, whereas human translators will take more time. The time spent finding, vetting, and managing
a group of translators should also be considered.
• Many translation software providers can offer machine translation at practically zero cost, making
it a reasonable option for organizations that cannot afford the cost of professional translations.
• Machine translation is the instant conversion of text from one language to another using artificial
intelligence, whereas human translation involves actual brainpower, in the form of one or more translators
translating the text manually.
• Text translation
Automated text translation is widely used in a variety of sentence-level and text-level translation
applications. Sentence-level applications include the translation of search-and-retrieval queries
and of the output of image optical character recognition (OCR). Text-level
applications include the translation of plain documents of all kinds, and the
translation of documents with structured data.
Structured data mainly includes the presentation format of the text content, object-type behaviour,
and other data such as fonts, colours, tables, forms, hyperlinks, etc. At present, the
translation objects of machine translation systems are mostly at the sentence level.
Most importantly, a sentence can fully express a topic, which naturally forms an
expression unit, and the meaning of each word in the sentence can be determined to a large degree
from the limited context within the sentence.
Also, the methods for obtaining data at sentence-level granularity from the training corpus
are more effective than those based on other morphological levels, for example words, phrases, and
text passages. Finally, sentence-level translation can naturally be extended to support
translation at other morphological levels.
More generally, many task settings amount to a transformation between sequence objects, and the
language in the machine translation task is just one such sequence-object type. Therefore, when the
notions of source language and target language are extended from natural languages to other sequence-object
types, machine translation methods and techniques can be applied to solve many similar transformation
tasks.
Speech translation : Speech recognition is the task of identifying a sequence of words uttered by a speaker,
given the acoustic signal. It has become one of the mainstream applications of AI—millions of people interact
with speech recognition systems every day to navigate voice mail systems, search the Web from mobile
phones, and other applications. Speech is an attractive option when hands-free operation is necessary, as when
operating machinery.
Speech recognition is difficult because the sounds made by a speaker are ambiguous and, well, noisy. As a
well-known example, the phrase “recognize speech” sounds almost the same as “wreck a nice beach” when
spoken quickly.
With the rapid advancement of mobile applications, voice input has become a convenient mode of
human-computer interaction, and speech translation has become an important application scenario. The
basic pipeline of speech translation is: source-language speech → source-language text → target-language
text → target-language speech.
In this pipeline, automatic text translation from source-language text to target-language text is an important
intermediate module. In addition, the front end and back end also require automatic speech
recognition (ASR) and text-to-speech (TTS).
The quality of a speech recognition system depends on the quality of all of its components: the language model, the
word-pronunciation models, the phone models, and the signal processing algorithms used to extract spectral features
from the acoustic signal.
SPEECH RECOGNITION
Speech recognition refers to a computer interpreting the words spoken by a person and converting them
to a format that is understandable by a machine. Depending on the end-goal, it is then converted to text or
voice or another required format.
Speech recognition AI applications have seen significant growth in numbers in recent times as businesses are
increasingly adopting digital assistants and automated support to streamline their services. Voice assistants,
smart home devices, search engines, etc are a few examples where speech recognition has seen prominence.
As per Research and Markets, the global market for speech recognition is estimated to grow at a CAGR of
17.2% and reach $26.8 billion by 2025.
Speech recognition is fast overcoming the challenges of poor recording equipment and noise cancellation,
variations in people’s voices, accents, dialects, semantics, contexts, etc using artificial intelligence and
machine learning. This also includes the challenges of understanding human disposition and the varying human
language elements like colloquialisms, acronyms, etc. The technology can now provide 95% accuracy, as
compared to traditional models of speech recognition, which is on par with regular human communication.
Furthermore, it is now an acceptable format of communication given the large companies that endorse it
and regularly employ speech recognition in their operations. It is estimated that a majority of search engines
will adopt voice technology as an integral aspect of their search mechanism.
Speech recognition and AI play an integral role in NLP models in improving the accuracy and efficiency of
human language recognition.
Use Cases of Speech Recognition
Let’s explore the uses of speech recognition applications in different fields:
1. Voice-based speech recognition software is now used to initiate purchases, send emails, transcribe
meetings, doctor appointments, and court proceedings, etc.
2. Virtual assistants or digital assistants and smart home devices use voice recognition software to answer
questions, provide weather news, play music, check traffic, place an order, and so on.
3. Companies like Venmo and PayPal allow customers to make transactions using voice assistants. Several
banks in North America and Canada also provide online banking using voice-based software.
4. Ecommerce is significantly powered by voice-based assistants and allows users to make purchases quickly
and seamlessly.
5. Speech recognition is poised to impact transportation services and streamline scheduling, routing, and
navigating across cities.
6. Podcasts, meetings, and journalist interviews can be transcribed using voice recognition. It is also used to
provide accurate subtitles to a video.
7. There has been a huge impact on security through voice biometry where the technology analyses the
varying frequencies, tone and pitch of an individual’s voice to create a voice profile. An example of this is
Switzerland’s telecom company Swisscom which has enabled voice authentication technology in its call
centres to prevent security breaches.
CHAPTER-2: PERCEPTION
For example: Human beings have sensory receptors such as touch, taste, smell, sight and hearing. So, the
information received from these receptors is transmitted to human brain to organize the received
information.
Perception in Artificial Intelligence is the process of interpreting vision, sounds, smell, and touch. Perception
helps to build machines or robots that react like humans. Perception is a process to interpret, acquire, select,
and then organize the sensory information from the physical world to make actions like humans. The main
difference between AI and robot is that the robot makes actions in the real world.
According to the received information, action is taken by interacting with the environment to manipulate
and navigate the objects.
Perception provides agents with information about the world they inhabit by interpreting the response of
sensors. A sensor measures some aspect of the environment in a form that can be used as input by an agent
program. The sensor could be as simple as a switch, which gives one bit telling whether it is on or off, or as
complex as the eye. A variety of sensory modalities are available to artificial agents.
Perception and action are very important concepts in the field of Robotics. The following figures show the
complete autonomous robot.
In the above figure, the optical axis is perpendicular to the image plane, and the image plane
is generally placed in front of the optical center.
So, let P be a point in the scene with coordinates (X, Y, Z) and P' be its image on the image plane
with coordinates (x, y, z).
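The figure and the projection equations themselves do not survive here; under the standard pinhole-camera model with focal length f (stated as an assumption, not quoted from the text), the image coordinates are x = f·X/Z and y = f·Y/Z. A minimal sketch:

def project(X, Y, Z, f=1.0):
    """Perspective projection of scene point (X, Y, Z) onto the image plane.

    With the optical center at the origin and focal length f, the image
    coordinates are x = f*X/Z and y = f*Y/Z (depth information is lost).
    """
    return (f * X / Z, f * Y / Z)

# Example: a point 4 units in front of the camera.
print(project(2.0, 1.0, 4.0))   # (0.5, 0.25)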
A. Segmentation
The proposed image processing consists of converting the image to grayscale, obtaining a lighter
image format. Then, the edges of the image are detected by the derivative method; afterwards the image is
dilated and eroded to close the detected edges. Finally, the borders are filled, producing a mask that
identifies the position of the object inside the image. Each step is described in detail below:
2) Image Edge Detection. The image obtained in the previous step can be represented as a
discrete 2D function, defined over the coordinates of each pixel m and n. The discrete value of
the function evaluated at a specific point is known as the brightness or pixel intensity. An
edge is defined as a tone change between pixels; when the change exceeds a threshold value, it is
considered an edge. Different methods to identify edges have been proposed; one of them computes the
intensity gradient of each pixel using a convolution mask, then the gradient magnitude is calculated and,
finally, a threshold is applied [14].
The most commonly used edge detection techniques employ local operators [15], using discrete approximations of
the first and second derivatives of the grayscale image. Below, the proposed operator, which
is based on the first derivative of the image, is described.
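A minimal NumPy sketch of such a first-derivative (gradient-magnitude) edge detector; the Sobel masks and the threshold value are illustrative assumptions rather than the operator proposed in the cited work.

import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def detect_edges(gray, threshold=100.0):
    """Convolve with Sobel masks, compute the gradient magnitude, then threshold."""
    h, w = gray.shape
    edges = np.zeros((h, w), dtype=bool)
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            patch = gray[i - 1:i + 2, j - 1:j + 2]
            gx = np.sum(patch * SOBEL_X)   # horizontal intensity gradient
            gy = np.sum(patch * SOBEL_Y)   # vertical intensity gradient
            edges[i, j] = np.hypot(gx, gy) > threshold
    return edges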
There are a number of techniques available for extracting 3D information from the visual
stimulus, such as motion, binocular stereopsis, texture, shading, and contour. Each of these
techniques operates on background assumptions about the physical scene to provide
an interpretation.
One of the principal uses of vision is to provide information for manipulating objects—picking them up,
grasping, twirling, and so on—as well as navigating in a scene while avoiding obstacles. The capability to use
vision for these purposes is present in the most primitive of animal visual systems. Perhaps the evolutionary
origin of the vision sense can be traced back to the presence of a photosensitive spot on one end of an
organism that enabled it to orient itself toward (or away from) the light. Flies use vision based on optic flow
to control their landing responses. Mobile robots moving around in an environment need to know where the
obstacles are, where free space corridors are available, and so on.
******
UNIT-V
SYLLABUS:
Philosophical foundations: Weak AI, Strong AI, Ethics and Risks of AI, Agent
Components, Agent Architectures, Are we going in the right direction, What if AI
does succeed.
AI Unit-5.3: ROBOT:
Robots are physical agents that perform tasks by manipulating the physical
world.
Effectors have a single purpose: to assert physical forces on the environment.
Robots are also equipped with sensors, which allow them to perceive their
environment.
1. MANIPULATOR:
The first category is the manipulator, or robot arm. Manipulators are anchored to their
workplace, for example on a factory assembly line or on the International Space Station.
2. MOBILE ROBOT:
The second category is the mobile robot. Mobile robots move about their
environment using wheels, legs, or similar mechanisms. They have been put to
use delivering food in hospitals, moving containers at loading docks, and similar
tasks. Other types of mobile robots include unmanned air vehicles,
Autonomous underwater vehicles etc..,
3. MOBILE MANIPULATOR:
The third type of robot combines mobility with manipulation, and is often called
a mobile manipulator. Humanoid robots mimic the human torso.
ROBOT HARDWARE:
The robot hardware mainly depends on 1.sensors and 2.effectors
1.sensors:
Sensors are the perceptual interface between robot and environment.
PASSIVE SENSOR: Passive sensors, such as cameras, are true observers of the
environment: they capture signals that are generated by other sources in the
environment.
ACTIVE SENSOR: Active sensors, such as sonar, send energy into the
environment. They rely on the fact that this energy is reflected back to the sensor.
Range finders are sensors that measure the distance to nearby objects. In
the early days of robotics, robots were commonly equipped with sonar sensors.
Sonar sensors emit directional sound waves, which are reflected by objects, with
some of the sound making it back into the sensor.
Stereo vision relies on multiple cameras to image the environment from
slightly different viewpoints, analyzing the resulting parallax in these images to
compute the range of surrounding objects.
Other common range sensors include radar, which is often the sensor of
choice for UAVs. Radar sensors can measure distances of multiple kilometers.
On the other extreme end of range sensing are tactile sensors such as whiskers,
bump panels, and touch-sensitive skin.
Other important aspects of robot state are measured by force sensors and
torque sensors. These are indispensable when robots handle fragile objects or
objects whose exact shape and location is unknown.
EFFECTORS:
Effectors are the means by which robots move and change the shape of their
bodies. To understand the design of effectors we use the concept of degree of
freedom.
We count one degree of freedom for each independent direction in which a robot,
or one of its effectors, can move. For example, a rigid mobile robot such as an
AUV has six degrees of freedom, three for its (x, y, z) location in space and three
for its angular orientation, known as yaw, roll, and pitch. These six degrees define
the kinematic state or pose of the robot. The dynamic state of a robot includes
these six plus an additional six dimensions for the rate of change of each
kinematic dimension, that is, their velocities.
For nonrigid bodies, there are additional degrees of freedom within the
robot itself. For example, the elbow of a human arm possesses two degrees of
freedom. It can flex the upper arm towards or away, and can rotate right or left.
The wrist has three degrees of freedom. It can move up and down, side to side,
and can also rotate. Robot joints also have one, two, or three degrees of freedom
each. Six degrees of freedom are required to place an object, such as a hand, at a
particular point in a particular orientation.
The manipulator in Fig 4(a) has exactly six degrees of freedom, created by
five revolute joints that generate rotational motion and one prismatic joint that
generates sliding motion.
For mobile robots, the DOFs are not necessarily the same as the number
of actuated elements.
Consider, for example, your average car: it can move forward or backward, and
it can turn, giving it two DOFs. In contrast, a car’s kinematic configuration is
three-dimensional: on an open flat surface, one can easily maneuver a car to any
(x, y) point, in any orientation. (See Figure 25.4(b).) Thus, the car has three
effective degrees of freedom but only two controllable degrees of freedom. We say
a robot is nonholonomic if it has more effective DOFs than controllable DOFs
and holonomic if the two numbers are the same.
Sensors and effectors alone do not make a robot. A complete robot also needs a
source of power to drive its effectors. The electric motor is the most popular
mechanism for both manipulator actuation and locomotion, but pneumatic
actuation using compressed gas and Hydraulic actuation using pressurized
fluids also have their application niches.
ROBOTIC PERCEPTION:
Perception is the process by which robots map sensor measurements into
internal representations of the environment. Perception is difficult because
sensors are noisy, and the environment is partially observable, unpredictable, and
often dynamic.
For robotics problems, we include the robot’s own past actions as observed
variables in the model. Figure 25.7 shows the notation used in this
chapter: Xt is the state of the environment (including the robot) at time t, Zt is
the observation received at time t, and At is the action taken after the observation
is received.
We would like to compute the new belief state, P(Xt+1 | z1:t+1, a1:t), from the
current belief state P(Xt | z1:t, a1:t−1) and the new observation zt+1.
Thus, we modify the recursive filtering equation (15.5 on page 572) to use
integration rather than summation:
P(Xt+1 | z1:t+1, a1:t) = α P(zt+1 | Xt+1) ∫ P(Xt+1 | xt, at) P(xt | z1:t, a1:t−1) dxt .   (25.1)
This equation states that the posterior over the state variables X at time t + 1 is
calculated recursively from the corresponding estimate one time step earlier. This
calculation involves the previous action at and the current sensor measurement
zt+1. The probability P(Xt+1 | xt, at) is called the transition model or motion
model, and P(zt+1 | X t+1) is the sensor model.
Knowledge about where things are is at the core of any successful physical
interaction with the environment.
To keep things simple, let us consider a mobile robot that moves slowly in a flat
2D world. Let us also assume the robot is given an exact map of the environment.
The pose of such a mobile robot is defined by its two Cartesian coordinates with
values x and y and its heading with value θ, as illustrated in Figure 25.8(a). If we
arrange those three values in a vector, then any particular state is given by Xt = (xt, yt, θt)T.
In the kinematic approximation, each action consists of the "instantaneous"
specification of two velocities—a translational velocity vt and a rotational
velocity ωt. For small time intervals Δt, a crude deterministic model of the
motion of such robots is given by
X̂t+1 = f(Xt, vt, ωt) = Xt + (vt Δt cos θt, vt Δt sin θt, ωt Δt)T
Again, noise distorts our measurements. To keep things simple, one might assume
Gaussian noise with covariance Σz, giving us the sensor model
P(zt | xt) = N(ẑt, Σz).
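A minimal Python sketch of the deterministic motion model and the Gaussian range-sensor model above; the time step, velocities, and noise level σ are illustrative assumptions.

import math
import random

def motion_model(x, y, theta, v, omega, dt):
    """Deterministic kinematic update: pose changes by (v*dt*cos(theta), v*dt*sin(theta), omega*dt)."""
    return (x + v * dt * math.cos(theta),
            y + v * dt * math.sin(theta),
            theta + omega * dt)

def noisy_range_reading(true_range, sigma=0.1):
    """Sensor model: the measured range is the true range plus Gaussian noise."""
    return random.gauss(true_range, sigma)

# One simulated time step of 0.1 s at 1 m/s while turning slowly.
pose = motion_model(0.0, 0.0, 0.0, v=1.0, omega=0.2, dt=0.1)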
This problem is important for many robot applications, and it has been studied
extensively under the name simultaneous localization and mapping,
abbreviated as SLAM.
SLAM problems are solved using many different probabilistic
techniques, including the extended Kalman filter
Methods that make robots collect their own training data are called Self
Supervised.
PLANNING TO MOVE:
There are two main approaches: cell decomposition and skeletonization. Each
reduces the continuous path-planning problem to a discrete graph-search
problem.
1 Configuration space
We will start with a simple representation for a simple robot motion problem. It
has two joints that move independently. The robot's configuration can be
described by a four-dimensional coordinate: (xe, ye) for the location of the elbow
relative to the environment and (xg, yg) for the location of the gripper. They
constitute what is known as the workspace representation.
4 Skeletonization methods
The second major family of path-planning algorithms is based on the idea of
skeletonization.
These algorithms reduce the robot’s free space to a one-dimensional
representation, for which the planning problem is easier. This lower- dimensional
representation is called a skeleton of the configuration space.
Voronoi graph of the free space—the set of all points that are equidistant
to two or more obstacles. To do path planning with a Voronoi graph, the robot
first changes its present configuration to a point on the Voronoi graph. It is easy
to show that this can always be achieved by a straight-line motion in configuration
space. Second, the robot follows the Voronoi graph until it reaches the point
nearest to the target configuration. Finally, the robot leaves the Voronoi graph and
moves to the target. Again, this final step involves straight- line motion in
configuration space.
1 Subsumption architecture
The subsumption architecture is a framework for assembling reactive
controllers out of finite state machines. Nodes in these machines may contain tests
for certain sensor variables, in which case the execution trace of a finite state
machine is conditioned on the outcome of such a test. The resulting machines are
referred to as augmented finite state machines, or AFSMs, where the
augmentation refers to the use of clocks.
2 Three-layer architecture
Hybrid architectures combine reaction with deliberation. The most popular
hybrid architecture is the three-layer architecture, which consists of a reactive
layer, an executive layer, and a deliberative layer.
Data enters this pipeline at the sensor interface layer. The perception
layer then updates the robot’s internal models of the environment based on this
data. Next, these models are handed to the planning and control layer. Those
are then communicated back to the vehicle through the vehicle interface layer.
The key to the pipeline architecture is that this all happens in parallel. While
the perception layer processes the most recent sensor data, the control layer bases
its choices on slightly older data. In this way, the pipeline architecture is similar
to the human brain. We don’t switch off our motion controllers when we digest
new sensor data. Instead, we perceive, plan, and act all at the same time. Processes
in the pipeline architecture run asynchronously, and all computation is data-
driven. The resulting system is robust, and it is fast.
APPLICTION DOMAINS:
Industry and Agriculture. Traditionally, robots have been fielded in areas that
require difficult human labour, yet are structured enough to be amenable to
robotic automation. The best example is the assembly line, where manipulators
routinely perform tasks such as assembly, part placement, material handling,
welding, and painting. In many of these tasks, robots have become more cost-
effective than human workers.
Transportation. Robotic transportation has many facets: from autonomous
helicopters that deliver payloads to hard-to-reach locations, to automatic
wheelchairs that transport people who are unable to control wheelchairs by
themselves, to autonomous straddle carriers that outperform skilled human
drivers when transporting containers from ships to trucks on loading docks.
Robotic cars. Most of us use cars every day. Many of us make cell phone calls while
driving. Some of us even text. The sad result: more than a million people die
every year in traffic accidents. Robotic cars like BOSS and STANLEY offer
hope: Not only will they make driving much safer, but they will also free us from
the need to pay attention to the road during our daily commute.
Health care. Robots are increasingly used to assist surgeons with instrument
placement when operating on organs as intricate as brains, eyes, and hearts.
Robots have become indispensable tools in a range of surgical procedures, such
as hip replacements, thanks to their high precision. In pilot studies, robotic
devices have been found to reduce the danger of lesions when performing
colonoscopy.
Exploration. Robots have gone where no one has gone before, including the
surface of Mars. Robotic arms assist astronauts in deploying and retrieving
satellites and in building the International Space Station. Robots also help explore
under the sea. They are routinely used to acquire maps of sunken ships.
Whether AI is impossible depends on how it is defined. We defined AI as the quest for the best agent
program on a given architecture. With this formulation, AI is by definition possible: for any digital
architecture with k bits of program storage there are exactly 2^k agent programs, and all we have to do
to find the best one is enumerate and test them all. This might not be feasible for large k, but
philosophers deal with the theoretical, not the practical.
Our definition of AI works well for the engineering problem of finding a good agent, given an
architecture. Therefore, we’re tempted to end this section right now, answering the title question in the
affirmative. But philosophers are interested in the problem of comparing two architectures—human and
machine. Furthermore, they have traditionally posed the question not in terms of maximizing expected
utility but rather as, “Can machines think?”
Alan Turing, in his famous paper “Computing Machinery and Intelligence” (1950), suggested
that instead of asking whether machines can think, we should ask whether machines can pass a
behavioral intelligence test, which has come to be called the Turing Test. The test is for a program to
have a conversation (via online typed messages) with an interrogator for five minutes. The interrogator
then has to guess if the conversation is with a program or a person; the program passes the test if it
fools the interrogator 30% of the time.
The argument from disability
The “argument from disability” makes the claim that “a machine can never do X.” As examples of X,
Turing lists the following:
Be kind, resourceful, beautiful, friendly, have initiative, have a sense of humor, tell right from wrong,
make mistakes, fall in love, enjoy strawberries and cream, make someone fall in love with it, learn from
experience, use words properly, be the subject of its own thought, have as much diversity of behavior as
man, do something really new
It is clear that computers can do many things as well as or better than humans, including things
that people believe require great human insight and understanding. This does not mean, of course, that
computers use insight and understanding in performing these tasks (those are not part of behavior, and
we address such questions elsewhere), but the point is that one's first guess about the mental processes
required to produce a given behavior is often wrong. It is also true, of course, that there are many tasks
at which computers do not yet excel (to put it mildly), including Turing's task of carrying on an open-
ended conversation.
The same is true for careful mathematical thought; a famous example is the four-color map problem.
The argument from informality
One of the most influential and persistent criticisms of AI as an enterprise was raised by Turing as the
“argument from informality of behavior.” Essentially, this is the claim that human behavior is far too
complex to be captured by any simple set of rules and that, because computers can do no more than follow
a set of rules, they cannot generate behavior as intelligent as that of humans. The inability to capture
everything in a set of logical rules is called the qualification problem in AI. Dreyfus and Dreyfus, the
best-known proponents of this critique, raise several specific objections concerning learning systems, each
of which can be answered:
1. Good generalization from examples cannot be achieved without background knowledge, and, they
claim, no one has any idea how to incorporate background knowledge into the neural network
learning process. In fact, there are techniques for using prior knowledge in learning
algorithms. Those techniques, however, rely on the availability of knowledge in explicit form,
something that Dreyfus and Dreyfus strenuously deny. In our view, this is a good reason for a
serious redesign of current models of neural processing so that they can take advantage of
previously learned knowledge in the way that other learning algorithms do.
2. Neural network learning is a form of supervised learning, requiring the prior identification of
relevant inputs and correct outputs. Therefore, they claim, it cannot operate autonomously
without the help of a human trainer. In fact, learning without a teacher can be accomplished by
unsupervised learning and reinforcement learning.
3. Learning algorithms do not perform well with many features, and if we pick a subset of features,
“there is no known way of adding new features should the current set prove inadequate to account
for the learned facts.” In fact, new methods such as support vector machines handle large feature
sets very well. With the introduction of large Web-based data sets, many applications in areas
such as language processing (Sha and Pereira, 2003) and computer vision (Viola and Jones, 2002a)
routinely handle millions of features.
4. The brain is able to direct its sensors to seek relevant information and to process it to extract aspects
relevant to the current situation. But, Dreyfus and Dreyfus claim, “Currently, no details of this
mechanism are understood or even hypothesized in a way that could guide AI research.” In fact, the
field of active vision, underpinned by the theory of information value, is concerned with
exactly the problem of directing sensors, and already some robots have incorporated the
theoretical results obtained.
This relates to the argument from consciousness, voiced by the neurologist Geoffrey Jefferson, who held
that a machine must feel emotions and know what it has done before we could agree that it really thinks;
Turing's reply is that in ordinary life we extend to other people the polite convention that they think.
Turing argues that Jefferson would be willing to extend the polite convention to machines if
only he had experience with ones that act intelligently. He cites the following dialog, which has
become such a part of AI's oral tradition that we simply have to include it:
HUMAN: In the first line of your sonnet which reads “shall I compare thee to a summer’s day,”
would not a “spring day” do as well or better?
MACHINE: It wouldn’t scan.
HUMAN: How about “a winter’s day.” That would scan all right.
MACHINE: Yes, but nobody wants to be compared to a winter’s day.
HUMAN: Would you say Mr. Pickwick reminded you of Christmas?
MACHINE: In a way.
HUMAN: Yet Christmas is a winter’s day, and I do not think Mr. Pickwick would mind the
comparison.
MACHINE: I don’t think you’re serious. By a winter’s day one means a typical winter’s day,
rather than a special one like Christmas.
Consider the brain prosthesis thought experiment, in which the neurons in a human brain are gradually
replaced by electronic devices with identical input–output behavior, while the subject's external behavior
remains unchanged throughout. Faced with this scenario, there are three possible conclusions:
1. The causal mechanisms of consciousness that generate these kinds of outputs in normal brains are still
operating in the electronic version, which is therefore conscious.
2. The conscious mental events in the normal brain have no causal connection to behavior, and are
missing from the electronic brain, which is therefore not conscious.
3. The experiment is impossible, and therefore speculation about it is meaningless.
Biological naturalism and the Chinese Room
A strong challenge to functionalism has been mounted by John Searle’s (1980) biological naturalism,
according to which mental states are high-level emergent features that are caused by low-level physical
processes in the neurons, and it is the (unspecified) properties of the neurons that matter. Thus, mental
states cannot be duplicated just on the basis of some program having the same functional structure with
the same input–output behavior; we would require that the program be running on an architecture with the
same causal power as neurons. To support his view, Searle describes a hypothetical system that is clearly
running a program and passes the Turing Test, but that equally clearly (according to Searle) does not
understand anything of its inputs and outputs. His conclusion is that running the appropriate program (i.e.,
having the right outputs) is not a sufficient condition for being a mind.
The system in question is the Chinese Room: a human who understands only English, equipped with a rule
book written in English and various stacks of paper. The person receives slips of paper bearing Chinese
symbols, follows the rules in the book to manipulate them, and hands back slips bearing Chinese symbols in
reply. So far, so good. But from the outside, we see a system that is taking input in the form of Chinese
sentences and generating answers in Chinese that are as “intelligent” as those in the conversation imagined
by Turing. Searle then argues: the person in the room does not understand Chinese (given). The rule book
and the stacks of paper, being just pieces of paper, do not understand Chinese. Therefore, there is no
understanding of Chinese. Hence, according to Searle, running the right program does not necessarily
generate understanding.
The real claim made by Searle rests upon the following four axioms:
1. Computer programs are formal (syntactic).
2. Human minds have mental contents (semantics).
3. Syntax by itself is neither constitutive of nor sufficient for semantics.
4. Brains cause minds.
From the first three axioms Searle concludes that programs are not sufficient for minds. In other words, an
agent running a program might be a mind, but it is not necessarily a mind just by virtue of running the
program. From the fourth axiom he concludes “Any other system capable of causing minds would have to
have causal powers (at least) equivalent to those of brains.” From there he infers that any artificial brain
would have to duplicate the causal powers of brains, not just run a particular program, and that human
brains do not produce mental phenomena solely by virtue of running a program.
Consider, for example, the inverted spectrum thought experiment, in which the subjective experience of
person X when seeing red objects is the same experience that the rest of us experience when seeing green
objects, and vice versa; nothing in a purely functional account of the mind seems able to rule this out or to
explain why experiences feel the way they do.
This explanatory gap has led some philosophers to conclude that humans are simply incapable of
forming a proper understanding of their own consciousness. Others, notably Daniel Dennett (1991), avoid
the gap by denying the existence of qualia, attributing them to a philosophical confusion.
THE ETHICS AND RISKS OF DEVELOPING ARTIFICIAL INTELLIGENCE
So far, we have concentrated on whether we can develop AI, but we must also consider whether we should.
If the effects of AI technology are more likely to be negative than positive, then it would be the moral
responsibility of workers in the field to redirect their research. Many new technologies have had
unintended negative side effects: nuclear fission brought Chernobyl and the threat of global destruction;
the internal combustion engine brought air pollution, global warming, and the paving-over of paradise. In
a sense, automobiles are robots that have conquered the world by making themselves indispensable.
AI, however, seems to pose some fresh problems beyond that of, say, building bridges that don’t fall
down:
• People might lose their jobs to automation.
• People might have too much (or too little) leisure time.
• People might lose their sense of being unique.
• AI systems might be used toward undesirable ends.
• The use of AI systems might result in a loss of accountability.
• The success of AI might mean the end of the human race.
People might lose their jobs to automation. The modern industrial economy has become dependent on
computers in general, and selected AI programs in particular. For example, much of the economy, especially
in the United States, depends on the availability of consumer credit. Credit card applications, charge
approvals, and fraud detection are now done by AI programs. One could say that thousands of workers
have been displaced by these AI programs, but in fact if you took away the AI programs these jobs would
not exist, because human labor would add an unacceptable cost to the transactions.
People might lose their sense of being unique. In Computer Power and Human Reason, Weizenbaum
(1976), the author of the ELIZA program, points out some of the potential threats that AI poses to society.
One of Weizenbaum's principal arguments is that AI research makes possible the idea that humans are
automata, an idea that results in a loss of autonomy or even of humanity.
AI systems might be used toward undesirable ends. Advanced technologies have often been used by the
powerful to suppress their rivals. As the number theorist G. H. Hardy wrote (Hardy, 1940), “A science is
said to be useful if its development tends to accentuate the existing inequalities in the distribution of wealth,
or more directly promotes the destruction of human life.” This holds for all sciences, AI being no exception.
Autonomous AI systems are now commonplace on the battlefield; the U.S. military deployed over 5,000
autonomous aircraft and 12,000 autonomous ground vehicles in Iraq (Singer, 2009).
The use of AI systems might result in a loss of accountability. In the litigious atmosphere that prevails in
the United States, legal liability becomes an important issue. When a physician relies on the judgment of a
medical expert system for a diagnosis, who is at fault if the diagnosis is wrong? Fortunately, due in part to the
growing influence of decision-theoretic methods in medicine, it is now accepted that negligence cannot
be shown if the physician performs medical procedures that have high expected utility, even if the actual
result is catastrophic for the patient.
The success of AI might mean the end of the human race. Almost any technology has the potential to
cause harm in the wrong hands, but with AI and robotics, we have the new problem that the wrong hands
might belong to the technology itself. Countless science fiction stories have warned about robots or robot–
human cyborgs running amok.
If ultraintelligent machines are a possibility, we humans would do well to make sure that we design
their predecessors in such a way that they design themselves to treat us well. Science fiction writer Isaac
Asimov (1942) was the first to address this issue, with his three laws of robotics:
1. A robot may not injure a human being or, through inaction, allow a human being to come to harm.
2. A robot must obey orders given to it by human beings, except where such orders would conflict with
the First Law.
3. A robot must protect its own existence as long as such protection does not conflict with the First or
Second Law.
AGENT COMPONENTS
Interaction with the environment through sensors and actuators: For much of the history of AI, this
has been a glaring weak point. With a few honorable exceptions, AI systems were built in such a way
that humans had to supply the inputs and interpret the outputs, while robotic systems focused on low-level
tasks in which high-level reasoning was largely absent.
Keeping track of the state of the world: Filtering algorithms allow an agent to maintain its beliefs about the
world in uncertain environments. Current filtering and perception algorithms can be combined to do a
reasonable job of reporting low-level predicates such as “the cup is on the table.” Detecting higher-level
actions, such as “Dr. Russell is having a cup of tea with Dr. Norvig while discussing plans for next week,”
is more difficult. Currently it can be done only with the help of annotated examples.
Projecting, evaluating, and selecting future courses of action: The basic knowledge-representation
requirements here are the same as for keeping track of the world; the primary difficulty is coping with
courses of action, such as having a conversation or a cup of tea, that consist eventually of thousands or
millions of primitive steps for a real agent. It is only by imposing hierarchical structure on behavior that
we humans cope at all; hierarchical representations make it possible to handle problems of this scale, and
work in hierarchical reinforcement learning has succeeded in combining some of these ideas with the
techniques for decision making under uncertainty. As yet, algorithms for the partially observable case
(POMDPs) use the same atomic state representation as the basic search algorithms. It has proven very
difficult to decompose preferences over complex states in the same way that Bayes nets decompose beliefs
over complex states. One reason may be that preferences over states are really compiled from preferences
over state histories, which are described by reward functions.
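As a rough, illustrative sketch of why hierarchy helps (the task names and the HIERARCHY table below are invented for this example), a high-level action can be stored as a short list of subtasks and expanded recursively into primitive steps only when needed, so the agent never has to reason directly over millions of primitive actions at once.

    # Hypothetical task hierarchy: each high-level task maps to its subtasks.
    HIERARCHY = {
        "have_tea":   ["boil_water", "brew", "drink"],
        "boil_water": ["fill_kettle", "switch_on", "wait"],
    }

    def primitive_steps(task):
        # Recursively expand a task into primitive steps using the hierarchy.
        if task not in HIERARCHY:          # already a primitive action
            return [task]
        steps = []
        for subtask in HIERARCHY[task]:
            steps.extend(primitive_steps(subtask))
        return steps

    print(primitive_steps("have_tea"))
    # -> ['fill_kettle', 'switch_on', 'wait', 'brew', 'drink']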
Learning: Learning in an agent can be formulated as inductive learning (supervised, unsupervised, or
reinforcement-based) of the functions that constitute the various components of the agent. Very powerful
logical and statistical techniques have been developed that can cope with quite large problems, reaching or
exceeding human capabilities in many tasks, as long as we are dealing with a predefined vocabulary of
features and concepts.
AGENT ARCHITECTURES
It is natural to ask, “Which of the agent architectures should an agent use?” The answer is, “All of them!”
We have seen that reflex responses are needed for situations in which time is of the essence, whereas
knowledge-based deliberation allows the agent to plan ahead. A complete agent must be able to do both,
using a hybrid architecture. One important property of hybrid architectures is that the boundaries between
different decision components are not fixed. For example, compilation continually converts declarative
information at the deliberative level into more efficient representations, eventually reaching the reflex level.
For example, a taxi-driving agent that sees an accident ahead must decide in a split second either to brake
or to take evasive action. It should also spend that split second thinking about the most important questions,
such as whether the lanes to the left and right are clear and whether there is a large truck close behind, rather
than worrying about wear and tear on the tires or where to pick up the next passenger. These issues are
usually studied under the heading of real-time AI.
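A minimal sketch of the hybrid idea, assuming an agent with a fast reflex table and a slower deliberative planner (the class and rule names below are invented for illustration, not taken from any particular system): the agent answers from the reflex layer when a rule matches and falls back to deliberation otherwise. Compilation would then correspond to adding new entries to the reflex table over time.

    class HybridAgent:
        # Illustrative hybrid architecture: reflex layer first, deliberation as fallback.
        def __init__(self, reflex_rules, deliberate):
            self.reflex_rules = reflex_rules   # dict: percept -> action (fast path)
            self.deliberate = deliberate       # function: percept -> action (slow path)

        def act(self, percept):
            # Time-critical situations are handled by the reflex layer.
            if percept in self.reflex_rules:
                return self.reflex_rules[percept]
            # Otherwise plan ahead with knowledge-based deliberation.
            return self.deliberate(percept)

    # Toy usage for a taxi-like agent.
    rules = {"obstacle_ahead": "brake"}
    agent = HybridAgent(rules, deliberate=lambda percept: "plan_route")
    print(agent.act("obstacle_ahead"))   # -> brake      (reflex)
    print(agent.act("clear_road"))       # -> plan_route (deliberation)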
Fig: Compilation serves to convert deliberative decision making into more efficient, reflexive mechanisms.
Clearly, there is a pressing need for general methods of controlling deliberation, rather than specific
recipes for what to think about in each situation. The first useful idea is to employ anytime algorithms,
whose output quality improves gradually over time, so that the agent has a reasonable decision ready
whenever it is interrupted.
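A minimal sketch of an anytime algorithm, assuming some improvement step is available (the refine function below is hypothetical): the agent always holds a best-so-far answer and simply returns it when the deadline arrives.

    import time

    def anytime_improve(initial, refine, deadline_seconds):
        # Keep improving a solution until the deadline; an answer is always ready.
        # `refine` maps a solution to a better one, or returns None when no
        # further improvement is possible.
        best = initial
        stop_at = time.monotonic() + deadline_seconds
        while time.monotonic() < stop_at:
            candidate = refine(best)
            if candidate is None:
                break
            best = candidate
        return best

    # Toy usage: "solutions" are numbers that creep toward a target value.
    result = anytime_improve(0.0,
                             refine=lambda x: x + 0.1 if x < 1.0 else None,
                             deadline_seconds=0.01)
    print(result)   # prints the best value reached before the deadline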
The second technique for controlling deliberation is decision-theoretic metareasoning (Russell and
Wefald, 1989, 1991; Horvitz, 1989; Horvitz and Breese, 1996). This method applies the theory of
information value to the selection of individual computations. The value of a computation depends on both
its cost (in terms of delaying action) and its benefits (in terms of improved decision quality). Metareasoning
techniques can be used to design better search algorithms and to guarantee that the algorithms have the
anytime property. Metareasoning is expensive, of course, and compilation methods can be applied so that
the overhead is small compared to the costs of the computations being controlled. Metalevel reinforcement
learning may provide another way to acquire effective policies for controlling deliberation.
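The core of this metareasoning idea can be reduced to a single comparison, sketched below with made-up numbers: a computation is worth performing only if its expected improvement in decision quality exceeds the cost of the delay it causes.

    def worth_computing(expected_gain, compute_seconds, cost_per_second):
        # Decision-theoretic metareasoning in one line: compute only if the
        # expected benefit exceeds the cost of the resulting delay.
        return expected_gain > compute_seconds * cost_per_second

    # Toy usage: a 0.2-second lookahead expected to improve the decision by
    # 5 units of utility, when each second of delay costs 10 units.
    print(worth_computing(expected_gain=5.0,
                          compute_seconds=0.2,
                          cost_per_second=10.0))   # -> True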
Metareasoning is one specific example of a reflective architecture, that is, an architecture that enables
deliberation about the computational entities and actions occurring within the architecture itself. A theoretical
foundation for reflective architectures can be built by defining a joint state space composed from the
environment state and the computational state of the agent itself.
ARE WE GOING IN THE RIGHT DIRECTION?
The preceding section listed many advances and many opportunities for further progress. But where is this
all leading? Dreyfus (1992) gives the analogy of trying to get to the moon by climbing a tree; one can
report steady progress, all the way to the top of the tree. In this section, we consider whether AI's current
path is more like a tree climb or a rocket trip.
Perfect rationality. A perfectly rational agent acts at every instant in such a way as to maximize its
expected utility, given the information it has acquired from the environment. We have seen that the
calculations necessary to achieve perfect rationality in most environments are too time consuming, so
perfect rationality is not a realistic goal.
Calculative rationality. This is the notion of rationality that we have used implicitly in designing logical
and decision-theoretic agents, and most of theoretical AI research has focused on this property. A
calculatively rational agent eventually returns what would have been the rational choice at the beginning of
its deliberation. This is an interesting property for a system to exhibit, but in most environments, the right
answer at the wrong time is of no value. In practice, AI system designers are forced to compromise on
decision quality to obtain reasonable overall performance; unfortunately, the theoretical basis of
calculative rationality does not provide a well-founded way to make such compromises.
Bounded rationality. Herbert Simon (1957) rejected the notion of perfect (or even approximately perfect)
rationality and replaced it with bounded rationality, a descriptive theory of decision making by real agents.
Bounded optimality (BO). A bounded optimal agent behaves as well as possible, given its computational
resources. That is, the expected utility of the agent program for a bounded optimal agent is at least as high
as the expected utility of any other agent program running on the same machine.
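In schematic notation (the symbols below follow the usual formulation of bounded optimality and are introduced here only for illustration), the bounded optimal program is the one, among all programs runnable on the machine, with the highest expected utility:

    \ell_{opt} = \operatorname*{argmax}_{\ell \in \mathcal{L}_M} V(\ell, \mathbf{E}, U)

where \mathcal{L}_M is the set of agent programs that can run on machine M, \mathbf{E} is the class of environments, U is the utility function, and V(\ell, \mathbf{E}, U) is the expected utility obtained by running program \ell.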
WHAT IF AI DOES SUCCEED?
In David Lodge's Small World (1984), a novel about the academic world of literary criticism, the
protagonist causes consternation by asking a panel of eminent but contradictory literary theorists the
following question: “What if you were right?” None of the theorists seems to have considered this question
before, perhaps because debating unfalsifiable theories is an end in itself. Similar confusion can be evoked
by asking AI researchers, “What if you succeed?”
We can expect that medium-level successes in AI would affect all kinds of people in their daily lives.
So far, computerized communication networks, such as cell phones and the Internet, have had this kind of
pervasive effect on society, but AI has not. AI has been at work behind the scenes, for example in
automatically approving or denying credit card transactions for every purchase made on the Web, but has
not been visible to the average consumer. We can imagine that truly useful personal assistants for the office
or the home would have a large positive impact on people's lives, although they might cause some economic
dislocation in the short term. Automated assistants for driving could prevent accidents, saving tens of
thousands of lives per year. A technological capability at this level might also be applied to the development
of autonomous weapons, which many view as undesirable. Some of the biggest societal problems we face
today, such as the harnessing of genomic information for treating disease, the efficient management of
energy resources, and the verification of treaties concerning nuclear weapons, are being addressed with the
help of AI technologies.
Finally, it seems likely that a large-scale success in AI, the creation of human-level intelligence and
beyond, would change the lives of a majority of humankind. The very nature of our work and play would
be altered, as would our view of intelligence, consciousness, and the future destiny of the human race. AI
systems at this level of capability could threaten human autonomy, freedom, and even survival. For these
reasons, we cannot divorce AI research from its ethical consequences.
In conclusion, we see that AI has made great progress in its short history, but the final sentence of Alan
Turing’s (1950) essay on Computing Machinery and Intelligence is still valid today:
We can see only a short distance ahead, but we can see that much remains to be done.
PART –A
1. Briefly explain Robot Perception.
2. Distinguish between Weak AI and Strong AI.
3. Explain about Agent Architectures.
4. What if AI does succeed?
PART –B