
AI UNIT-1 & 2

SYLLABUS & CONTENTS

UNIT-I
Foundations of AI: What is AI, History of AI, Strong and Weak AI, The State of the Art.
Intelligent Agents: Agents and Environments, Good Behavior: The Concept of Rationality, The Nature of Environments, The Structure of Agents.

UNIT-II
Solving Problems by Searching: Problem Solving Agents, Example Problems, Searching for Solutions, Uninformed Search Strategies, Informed (Heuristic) Search Strategies, Heuristic Functions.

UNIT- I
FOUNDATIONS OF AI
WHAT IS AI?
Artificial Intelligence (AI) is a branch of science that deals with helping machines
find solutions to complex problems in a more human-like fashion. This generally involves
borrowing characteristics from human intelligence and applying them as algorithms in a
computer-friendly way. A more or less flexible or efficient approach can be taken depending
on the requirements established, which influences how artificial the intelligent behaviour
appears. AI is generally associated with Computer Science, but it has many important links
with other fields such as Maths, Psychology, Cognition, Biology and Philosophy, among many
others. Our ability to combine knowledge from all these fields will ultimately benefit our
progress in the quest of creating an intelligent artificial being.
AI currently encompasses a huge variety of subfields, from general-purpose areas such
as perception and logical reasoning, to specific tasks such as playing chess, proving
mathematical theorems, writing poetry, and diagnosing diseases. Often, scientists in other fields
move gradually into artificial intelligence, where they find the tools and vocabulary to
systematize and automate the intellectual tasks on which they have been working all their lives.
Similarly, workers in AI can choose to apply their methods to any area of human intellectual
endeavour. In this sense, it is truly a universal field.

HISTORY OF AI
The origin of artificial intelligence lies in the earliest days of machine computations.
During the 1940s and 1950s, AI began to grow with the emergence of the modern computer.
Among the first researchers to attempt to build intelligent programs were Newell and Simon.
Their first well known program, logic theorist, was a program that proved statements using the
accepted rules of logic and a problem-solving program of their own design. By the late fifties,
programs existed that could do a passable job of translating technical documents and it was seen
as only a matter of extra databases and more computing power to apply the techniques to less
formal, more ambiguous texts. Most problem-solving work revolved around the work of
Newell, Shaw and Simon on the General Problem Solver (GPS). Unfortunately, the GPS did
not fulfil its promise, and not simply because of a lack of computing capacity. In the
1970s the most important concept of AI was developed, known as the Expert System, which
represents the knowledge of an expert as a set of rules. The application area of expert systems is very large.
The 1980s saw the development of neural networks as a method of learning from examples.

Prof. Peter Jackson (University of Edinburgh) classified the history of AI into three
periods as:
1. Classical
2. Romantic
3. Modern

1. Classical Period:
It started around 1950. In 1956, the concept of Artificial Intelligence came into existence.
During this period, the main research work carried out included game playing, theorem proving
and the state space approach for solving problems.

2. Romantic Period:
It started in the mid-1960s and continued until the mid-1970s. During this period people
were interested in making machines understand, which usually meant understanding
natural language. During this period the knowledge representation technique known as the
semantic net was developed.

3. Modern Period:
It started around 1970 and continues to the present day. This period addresses more complex
problems and includes research on both the theoretical and practical aspects of Artificial
Intelligence. It saw the birth of concepts such as expert systems, artificial neurons and
pattern recognition. Research on advanced concepts in pattern recognition and neural networks
is still going on.

Components of AI
There are three types of components in AI

1) Hardware Components of AI
a) Pattern Matching
b) Logic Representation
c) Symbolic Processing
d) Numeric Processing
e) Problem Solving
f) Heuristic Search
g) Natural Language processing
h) Knowledge Representation
i) Expert System
j) Neural Network
k) Learning
l) Planning
m) Semantic Network

2) Software Components
a) Machine Language
b) Assembly language
c) High level Language
d) LISP Language
e) Fourth generation Language
f) Object Oriented Language
g) Distributed Language
h) Natural Language
i) Particular Problem Solving Language

3) Architectural Components
a) Uniprocessor
b) Multiprocessor
c) Special Purpose Processor
d) Array Processor
e) Vector Processor
f) Parallel Processor
g) Distributed Processor

Definition of Artificial intelligence


1. AI is the study of how to make computers do things which at the moment people do better.
This is ephemeral, as it refers to the current state of computer science, and it excludes a major
area: problems that cannot be solved well either by computers or by people at the moment.

2. AI is a field of study that encompasses computational techniques for performing tasks that
apparently require intelligence when performed by humans.

3. AI is the branch of computer science that is concerned with the automation of intelligent
behaviour. AI is based upon the principles of computer science, namely the data structures used in
knowledge representation, the algorithms needed to apply that knowledge and the languages
and programming techniques used in their implementation.

4. AI is the field of study that seeks to explain and emulate intelligent behaviour in terms of
computational processes.

5. AI is about generating representations and procedures that automatically or autonomously solve problems heretofore solved by humans.

6. AI is the part of computer science concerned with designing intelligent computer systems, that is, computer systems that exhibit the characteristics we associate with intelligence in human behaviour, such as understanding language, learning, reasoning and solving problems.

7. AI is the study of mental faculties through the use of computational models.

8. AI is the study of the computations that make it possible to perceive, reason, and act.

9. AI is the exciting new effort to make computers think: machines with minds, in the full and literal sense.

10. AI is concerned with developing computer systems that can store knowledge and
effectively use the knowledge to help solve problems and accomplish tasks. This brief
statement sounds a lot like one of the commonly accepted goals in the education of humans.
We want students to learn (gain knowledge) and to learn to use this knowledge to help solve
problems and accomplish tasks.

STRONG AND WEAK AI


There are two conceptual schools of thought about AI, namely Weak AI and Strong AI.
Strong AI is very optimistic about the fact that a machine is capable of solving a
complex problem like an intelligent human. Its proponents claim that a computer can be much more
efficient at solving problems than some human experts. According to strong AI, the computer
is not merely a tool in the study of mind; rather, the appropriately programmed computer is
really a mind. Strong AI is the supposition that some forms of artificial intelligence can truly
reason and solve problems. The term strong AI was originally coined by John Searle.
In contrast, weak AI is not so enthusiastic about the outcomes of AI and simply
says that some thinking-like features can be added to computers to make them more useful
tools. It says that computers cannot be made intelligent equal to human beings unless
constructed significantly differently. Its proponents claim that computers may be similar to
human experts but never equal. Generally, weak AI refers to the use of software to study or
accomplish specific problem-solving tasks that do not encompass the full range of human
cognitive abilities. An example of weak AI would be a chess program. Weak AI programs
cannot be called "intelligent" because they cannot really think.

THE STATE OF THE ART
The increasingly advanced technology provides researchers with new tools that are capable of
achieving important goals, and these tools are great starting points in and of themselves. Among
the achievements of recent years, the following are some specific domains:
• Machine learning;
• Reinforcement learning;
• Deep learning;
• Natural language processing.

Machine Learning (ML)


Machine learning is a subcategory of AI that often uses statistical techniques to give machines
the ability to learn from data without being explicitly programmed to do so. This process is
known as 'training' a 'model' using a learning 'algorithm', which progressively improves
performance on a specific activity. The successes achieved in this field have encouraged
researchers to push harder on the accelerator.
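As a hypothetical illustration of "training a model with a learning algorithm" (not part of the original text; it assumes the scikit-learn library is available):

# Minimal sketch: fit a simple classifier to labelled data, then predict on new data.
from sklearn.linear_model import LogisticRegression

X_train = [[0.1, 1.2], [0.8, 0.4], [1.0, 1.1], [0.2, 0.3]]   # toy feature vectors
y_train = [0, 1, 1, 0]                                       # their class labels

model = LogisticRegression()          # the 'model'
model.fit(X_train, y_train)           # 'training' with a learning algorithm
print(model.predict([[0.9, 0.5]]))    # the trained model applied to new data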
Reinforcement Learning (RL)
Reinforcement learning is an area of ML concerned with software agents that learn goal-
oriented behavior (called a 'policy') by trial and error in environments that provide rewards in
response to the agents' actions toward achieving the objectives.
This field is perhaps the one that has most captured the attention of researchers in the last
decade.

Deep Learning
Also within ML, deep learning takes inspiration from the activity of neurons within the brain
to learn how to recognize complex patterns through learned data. This is thanks to the use of
algorithms, mainly statistical calculations. The word 'deep' refers to the large number of

layers of neurons that the ML model learns simultaneously, which helps acquire rich representations of
data to obtain performance gains.
Natural Language Processing (NLP)
Natural language processing is the mechanism by which machines acquire the ability to
analyze, understand, and manipulate textual data. 2019 was a great year for NLP, with Google
AI's BERT and Transformer, the Allen Institute's ELMo, OpenAI's Transformer, Ruder and
Howard's ULMFiT, and finally, Microsoft's MT-DNN. All of these have shown that pre-trained
language models can substantially improve performance on a wide variety of NLP tasks.

INTELLIGENT AGENTS
AGENTS
An AI system is composed of an agent and its environment. The agents act in their
environment. The environment may contain other agents.

An agent is anything that can perceive its environment through sensors and acts upon that
environment through effectors.
• A human agent has sensory organs such as eyes, ears, nose, tongue and skin parallel
to the sensors, and other organs such as hands, legs, mouth, for effectors.
• A robotic agent has cameras and infrared range finders for sensors, and
various motors and actuators for effectors.
• A software agent has encoded bit strings as its programs and actions.

Figure 2.1 Agents interact with environments through sensors and effectors.

Agent Terminology
• Performance Measure of Agent - It is the criteria, which determines how
successful an agent is.
• Behavior of Agent - It is the action that agent performs after any given sequence of
percepts.
• Percept - It is agent's perceptual inputs at a given instance.
• Percept Sequence - It is the history of all that an agent has perceived till date.
• Agent Function - It is a map from the percept sequence to an action.
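
To make the idea of an agent function concrete, here is a minimal, hypothetical Python sketch (not from the original text) of a table-driven agent function for a two-square vacuum world, mapping percept sequences to actions:

# A tiny lookup table from percept sequences to actions (two-square vacuum world).
PERCEPT_TABLE = {
    (("A", "Clean"),): "Right",
    (("A", "Dirty"),): "Suck",
    (("B", "Clean"),): "Left",
    (("B", "Dirty"),): "Suck",
}

def table_driven_agent_function(percept_sequence):
    # The agent function maps the whole percept sequence seen so far to an action.
    return PERCEPT_TABLE.get(tuple(percept_sequence), "NoOp")

print(table_driven_agent_function([("A", "Dirty")]))   # -> Suck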

How Agents Should Act


A rational agent is one that does the right thing. Obviously, this is better than doing the
wrong thing, but what does it mean? As a first approximation, we will say that the right action
is the one that will cause the agent to be most successful. That leaves us with the problem of
deciding how and when to evaluate the agent's success.
We use the term performance measure for the "how": the criteria that determine how
successful an agent is. Obviously, there is not one fixed measure suitable for all agents. We
could ask the agent for a subjective opinion of how happy it is with its own performance, but
some agents would be unable to answer, and others would delude themselves. (Human agents
in particular are notorious for "sour grapes", saying they did not really want something after
they are unsuccessful at getting it.) Therefore, we will insist on an objective performance
measure imposed by some authority. In other words, we as outside observers establish a
standard of what it means to be successful in an environment and use it to measure the
performance of agents.
As an example, consider the case of an agent that is supposed to vacuum a dirty floor.
A plausible performance measure would be the amount of dirt cleaned up in a single eight-
hour shift. A more sophisticated performance measure would factor in the amount of electricity
consumed and the amount of noise generated as well. A third performance measure might give
highest marks to an agent that not only cleans the floor quietly and efficiently, but also finds
time to go windsurfing at the weekend.
The when of evaluating performance is also important. If we measured how much dirt
the agent had cleaned up in the first hour of the day, we would be rewarding those agents that
start fast (even if they do little or no work later on), and punishing those that work consistently.
Thus, we want to measure performance over the long run, be it an eight-hour shift or a lifetime.
We need to be careful to distinguish between rationality and omniscience. An
omniscient agent knows the actual outcome of its actions, and can act accordingly; but
omniscience is impossible in reality.

In summary, what is rational at any given time depends on four things:


• The performance measure that defines degree of success.
• Everything that the agent has perceived so far. We will call this complete perceptual
history the percept sequence.
• What the agent knows about the environment.
• The actions that the agent can perform.

This leads to a definition of an ideal rational agent: For each possible percept sequence, an
ideal rational agent should do whatever action is expected to maximize its performance
measure, on the basis of the evidence provided by the percept sequence and whatever built-in
knowledge the agent has.

THE CONCEPT OF RATIONALITY


Rationality is nothing but status of being reasonable, sensible, and having good sense of
judgment.
Rationality is concerned with expected actions and results depending upon what the agent has
perceived. Performing actions with the aim of obtaining useful information is an important
part of rationality.

What is Ideal Rational Agent?


An ideal rational agent is the one, which is capable of doing expected actions to maximize
its performance measure, on the basis of -
• Its percept sequence
• Its built-in knowledge base

Rationality of an agent depends on the following -


• The performance measures, which determine the degree of success.
• Agent's Percept Sequence till now.
• The agent's prior knowledge about the environment.
• The actions that the agent can carry out.
A rational agent always performs the right action, where the right action means the action that
causes the agent to be most successful given the percept sequence. The problem the agent
solves is characterized by Performance Measure, Environment, Actuators, and Sensors
(PEAS).

ENVIRONMENTS
In this section, we will see how to couple an agent to an environment. In all cases,
however, the nature of the connection between them is the same: actions are done by the agent
on the environment, which in turn provides percepts to the agent. First, we will describe the
different types of environments and how they affect the design of agents. Then we will describe
environment programs that can be used as testbeds for agent programs.

Properties of environments
Environments come in several flavors. The principal distinctions to be made are as
follows:

Accessible vs. inaccessible


If an agent's sensory apparatus gives it access to the complete state of the environment, then
we say that the environment is accessible to that agent. An environment is effectively
accessible if the sensors detect all aspects that are relevant to the choice of action. An
accessible environment is convenient because the agent need not maintain any internal state to
keep track of the world.

Deterministic vs. nondeterministic.


If the next state of the environment is completely determined by the current state and the
actions selected by the agents, then we say the environment is deterministic. In principle, an
agent need not worry about uncertainty in an accessible, deterministic environment. If the

environment is inaccessible, however, then it may appear to be nondeterministic. This is


particularly true if the environment is complex, making it hard to keep track of all the
inaccessible aspects. Thus, it is often better to think of an environment as deterministic or
nondeterministic from the point of view of the agent.

Episodic vs. nonepisodic


In an episodic environment, the agent's experience is divided into "episodes." Each episode
consists of the agent perceiving and then acting. The quality of its action depends just on the
episode itself, because subsequent episodes do not depend on what actions occur in previous
episodes. Episodic environments are much simpler because the agent does not need to think
ahead.

Static vs. dynamic


If the environment can change while an agent is deliberating, then we say the environment is
dynamic for that agent; otherwise it is static. Static environments are easy to deal with because
the agent need not keep looking at the world while it is deciding on an action, nor need it worry
about the passage of time. If the environment does not change with the passage of time but the
agent's performance score does, then we say the environment is semidynamic.

Discrete vs. continuous


If there are a limited number of distinct, clearly defined percepts and actions we say that the
environment is discrete. Chess is discrete: there are a fixed number of possible moves on each
turn. Taxi driving is continuous: the speed and location of the taxi and the other vehicles
sweep through a range of continuous values.
We will see that different environment types require somewhat different agent
programs to deal with them effectively. It will turn out, as you might expect, that the hardest
case is inaccessible, nonepisodic, dynamic, and continuous.

Environment programs
The generic environment program in Figure 2.14 illustrates the basic relationship
between agents and environments. In this book, we will find it convenient for many of the
examples and exercises to use an environment simulator that follows this program structure.
The simulator takes one or more agents as input and arranges to repeatedly give each agent the
right percepts and receive back an action. The simulator then updates the environment based
on the actions, and possibly other dynamic processes in the environment that are not considered
to be agents (rain, for example). The environment is therefore defined by the initial state and
the update function. Of course, an agent that works in a simulator ought also to work in a real
environment that provides the same kinds of percepts and accepts the same kinds of actions.

procedure RUN-ENVIRONMENT(state, UPDATE-FN, agents, termination)

   inputs: state, the initial state of the environment
           UPDATE-FN, the function that updates the environment
           agents, a set of agents
           termination, a predicate to test when we are done

   repeat
      for each agent in agents do
         PERCEPT[agent] <- GET-PERCEPT(agent, state)
      end
      for each agent in agents do
         ACTION[agent] <- PROGRAM[agent](PERCEPT[agent])
      end
      state <- UPDATE-FN(state, ACTIONS, agents)
   until termination(state)

Figure 2.14 The basic environment simulator program.
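
As a rough Python sketch of the same simulator loop (not part of the original text; the agent objects and the update function are hypothetical placeholders):

def run_environment(state, update_fn, agents, termination):
    # Repeatedly give each agent its percept, collect actions, and update the environment.
    while not termination(state):
        percepts = {agent: agent.get_percept(state) for agent in agents}
        actions = {agent: agent.program(percepts[agent]) for agent in agents}
        state = update_fn(state, actions)
    return state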

THE NATURE OF ENVIRONMENTS


Some programs operate in the entirely artificial environment confined to keyboard
input, database, computer file systems and character output on a screen.
In contrast, some software agents (software robots or softbots) exist in rich, unlimited
softbots domains. The simulator has a very detailed, complex environment. The software
agent needs to choose from a long array of actions in real time. A softbot designed to scan the
online preferences of the customer and show interesting items to the customer works in the
real as well as an artificial environment.
The most famous artificial environment is the Turing Test environment, in which
one real and other artificial agents are tested on equal ground. This is a very challenging
environment as it is highly difficult for a software agent to perform as well as a human.

Turing Test
• The success of an intelligent behavior of a system can be measured with Turing Test.
• Two persons and a machine to be evaluated participate in the test. Out of the two
persons, one plays the role of the tester. Each of them sits in different rooms. The
tester is unaware of who is the machine and who is the human. He interrogates them by typing
questions and sending them to both intelligences, from which he receives typed
responses.
• This test aims at fooling the tester. If the tester fails to determine machine's response
from the human response, then the machine is said to be intelligent.

THE STRUCTURE OF AGENTS


Agent's structure can be viewed as -
• Agent = Architecture + Agent Program
• Architecture = the machinery that the agent executes on.
• Agent Program = an implementation of an agent function.

So far, we have talked about agents by describing their behavior: the action that is
performed after any given sequence of percepts. Now, we will have to bite the bullet and talk
about how the insides work.
The job of AI is to design the agent program: a function that implements the agent
mapping from percepts to actions. We assume this program will run on some sort of computing
device, which we will call the architecture. Obviously, the program we choose has to be one
that the architecture will accept and run. The architecture might be a plain computer, or it
might include special-purpose hardware for certain tasks, such as processing camera images or
filtering audio input. It might also include software that provides a degree of insulation between
the raw computer and the agent program, so that we can program at a higher level. In general,
the architecture makes the percepts from the sensors available to the program, runs the program,
and feeds the program's action choices to the effectors as they are generated.
The relationship among agents, architectures, and programs can be summed up as
follows:
agent = architecture + program

Before we design an agent program, we must have a pretty good idea of the possible
percepts and actions, what goals or performance measure the agent is supposed to achieve, and
what sort of environment it will operate in. These come in a wide variety. Figure 2.3 shows the
basic elements for a selection of agent types.
It may come as a surprise to some readers that we include in our list of agent types
programs that seem to operate in the entirely artificial environment defined by keyboard input
and character output on a screen. "Surely," one might say, "this is not a real environment, is
it?" In fact, what matters is not the distinction between "real" and "artificial" environments, but
the complexity of the relationship among the behavior of the agent, the percept sequence
generated by the environment, and the goals that the agent is supposed to achieve. Some "real"
environments are actually quite simple. For example, a robot designed to inspect parts as they
come by on a conveyer belt can make use of a number of simplifying assumptions: that the
lighting is always just so, that the only thing on the conveyer belt will be parts of a certain kind,
and that there are only two actions-accept the part or mark it as a reject.

Agent Type: Medical diagnosis system
  Percepts: Symptoms, findings, patient's answers
  Actions: Questions, tests, treatments
  Goals: Healthy patient, minimize costs
  Environment: Patient, hospital

Agent Type: Satellite image analysis system
  Percepts: Pixels of varying intensity, color
  Actions: Print a categorization of scene
  Goals: Correct categorization
  Environment: Images from orbiting satellite

Agent Type: Part-picking robot
  Percepts: Pixels of varying intensity
  Actions: Pick up parts and sort into bins
  Goals: Place parts in correct bins
  Environment: Conveyor belt with parts

Agent Type: Refinery controller
  Percepts: Temperature, pressure readings
  Actions: Open, close valves; adjust temperature
  Goals: Maximize purity, yield, safety
  Environment: Refinery

Agent Type: Interactive English tutor
  Percepts: Typed words
  Actions: Print exercises, suggestions, corrections
  Goals: Maximize student's score on test
  Environment: Set of students

Figure 2.3 Examples of agent types and their PAGE descriptions.

Agent Programs
Intelligent Agents will all have the same skeleton, namely, accepting percepts from an
environment and generating actions. The early versions of agent programs will have a very
simple form (Figure 2.4). Each will use some internal data structures that will be updated as
new percepts arrive. These data structures are operated on by the agent's decision-making
procedures to generate an action choice, which is then passed to the architecture to be executed.
There are two things to note about this skeleton program. First, even though we defined
the agent mapping as a function from percept sequences to actions, the agent program receives
only a single percept as its input. It is up to the agent to build up the percept sequence in
memory, if it so desires. In some environments, it is possible to be quite successful without
storing the percept sequence, and in complex domains, it is infeasible to store the complete
sequence.

function SKELETON-AGENT(percept) returns action

   static: memory, the agent's memory of the world

   memory <- UPDATE-MEMORY(memory, percept)
   action <- CHOOSE-BEST-ACTION(memory)
   memory <- UPDATE-MEMORY(memory, action)
   return action

Figure 2.4 A skeleton agent. On each invocation, the agent's memory is updated to reflect
the new percept, the best action is chosen, and the fact that the action was taken is also stored in
memory. The memory persists from one invocation to the next.
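
A hypothetical Python rendering of this skeleton (not from the original text); choose_best_action stands in for the problem-specific decision procedure:

class SkeletonAgent:
    # The memory persists from one invocation (step) to the next.
    def __init__(self):
        self.memory = []

    def step(self, percept):
        self.memory.append(("percept", percept))   # UPDATE-MEMORY with the new percept
        action = self.choose_best_action()         # CHOOSE-BEST-ACTION from memory
        self.memory.append(("action", action))     # also remember the action taken
        return action

    def choose_best_action(self):
        # Placeholder decision procedure; a real agent would inspect self.memory here.
        return "NoOp"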

Second, the goal or performance measure is not part of the skeleton program. This is
because the performance measure is applied externally to judge the behavior of the agent, and

it is often possible to achieve high performance without explicit knowledge of the performance measure (see, e.g., the square-root agent).

Example
At this point, it will be helpful to consider a particular environment, so that our
discussion can become more concrete. Mainly because of its familiarity, and because it
involves a broad range of skills, we will look at the job of designing an automated taxi driver.
We must first think about the percepts, actions, goals and environment for the taxi.
They are summarized in Figure 2.6 and discussed in turn.
Agent Type: Taxi driver
  Percepts: Cameras, speedometer, GPS, sonar, microphone
  Actions: Steer, accelerate, brake, talk to passenger
  Goals: Safe, fast, legal, comfortable trip, maximize profits
  Environment: Roads, other traffic, pedestrians, customers

Figure 2.6 The taxi driver agent type.

The taxi will need to know where it is, what else is on the road, and how fast it is going.
This information can be obtained from the percepts provided by one or more controllable TV
cameras, the speedometer, and odometer. To control the vehicle properly, especially on curves,
it should have an accelerometer; it will also need to know the mechanical state of the vehicle,
so it will need the usual array of engine and electrical system sensors. It might have instruments
that are not available to the average human driver: a satellite global positioning system (GPS)
to give it accurate position information with respect to an electronic map; or infrared or sonar
sensors to detect distances to other cars and obstacles. Finally, it will need a microphone or
keyboard for the passengers to tell it their destination.
The actions available to a taxi driver will be more or less the same ones available to a
human driver: control over the engine through the gas pedal and control over steering and
braking. In addition, it will need output to a screen or voice synthesizer to talk back to the
passengers, and perhaps some way to communicate with other vehicles.
What performance measure would we like our automated driver to aspire to? Desirable
qualities include getting to the correct destination; minimizing fuel consumption and wear and
tear; minimizing the trip time and/or cost; minimizing violations of traffic laws and
disturbances to other drivers; maximizing safety and passenger comfort; maximizing profits.
Obviously, some of these goals conflict, so there will be trade-offs involved.
Finally, were this a real project, we would need to decide what kind of driving
environment the taxi will face. Should it operate on local roads, or also on freeways? Will it
be in Southern California, where snow is seldom a problem, or in Alaska, where it seldom is
not? Will it always be driving on the right, or might we want it to be flexible enough to drive
on the left in case we want to operate taxis in Britain or Japan? Obviously, the more restricted
the environment, the easier the design problem.

We will consider four types of agent program:

• Simple reflex agents


• Model based reflex agents (Agents that keep track of the world)
• Goal-based agents
• Utility-based agents

1. Simple Reflex Agents


• They choose actions only based on the current percept.
• They are rational only if a correct decision can be made on the basis of the current
percept alone.
• Their environment is completely observable.
Condition-Action Rule - It is a rule that maps a state (condition) to an action.

[Figure: A simple reflex agent. Sensors report "What is the world like now?"; condition-action rules answer "What action do I need to do?"; the chosen action is sent to the effectors.]
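
A minimal, hypothetical sketch of a simple reflex agent for a two-square vacuum world (not from the original text), with the condition-action rules written directly as an if/else chain over the current percept:

def simple_reflex_vacuum_agent(percept):
    # The action depends only on the current percept, not on any history.
    location, status = percept
    if status == "Dirty":
        return "Suck"        # rule: if the current square is dirty, clean it
    if location == "A":
        return "Right"       # rule: if clean and at square A, move right
    return "Left"            # rule: if clean and at square B, move left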

2. Model Based Reflex Agents


They use a model of the world to choose their actions. They maintain an internal state.
Model - knowledge about "how the things happen in the world".
Internal State - It is a representation of unobserved aspects of current state depending on
percept history.
Updating the state requires the information about -
• How the world evolves.
• How the agent's actions affect the world.
[Figure: A model-based reflex agent. Sensors feed the internal state, which is updated using knowledge of how the world evolves and what the agent's actions do; condition-action rules then decide what action to do, and the effectors carry it out.]
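
A hypothetical sketch (not from the original text) of a model-based reflex agent: it keeps an internal state, updates it from the percept using a model of the world, and then applies condition-action rules to the state:

class ModelBasedReflexAgent:
    def __init__(self, update_state, rules):
        self.state = {}                    # internal state: unobserved aspects of the world
        self.update_state = update_state   # model: how the world evolves / what actions do
        self.rules = rules                 # condition-action rules: (predicate, action) pairs
        self.last_action = None

    def step(self, percept):
        # Revise the internal state using the model, the last action and the new percept.
        self.state = self.update_state(self.state, self.last_action, percept)
        # Pick the first rule whose condition matches the current state.
        for condition, action in self.rules:
            if condition(self.state):
                self.last_action = action
                return action
        self.last_action = "NoOp"
        return "NoOp"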

3. Goal Based Agents


They choose their actions in order to achieve goals.
Goal-based approach is more flexible than reflex agent since the knowledge supporting a
decision is explicitly modeled, thereby allowing for modifications.
Goal - It is the description of desirable situations.

[Figure: A goal-based agent, which combines knowledge of what its actions do with a description of its goals to choose actions.]

4. Utility Based Agents


They choose actions based on a preference (utility) for each state.
Goals are inadequate when -
• There are conflicting goals, out of which only few can be achieved.
• Goals have some uncertainty of being achieved and you need to weigh likelihood of
success against the importance of a goal.
[Figure: A utility-based agent. From the current state it considers what happens if it does action A, how happy it would be in the resulting state (its utility), and then decides what action to do.]

*******

UNIT- II
SOLVING PROBLEMS BY SEARCHING
In the previous chapter, we saw that simple reflex agents are unable to plan ahead. They
are limited in what they can do because their actions are determined only by the current percept.
Furthermore, they have no knowledge of what their actions do nor of what they are trying to
achieve.
In this chapter, we describe one kind of goal-based agent called a problem-solving
agent. Problem-solving agents decide what to do by finding sequences of actions that lead to
desirable states. We discuss informally how the agent can formulate an appropriate view of the
problem it faces. The problem type that results from the formulation process will depend on
the knowledge available to the agent: principally, whether it knows the current state and the
outcomes of actions. We then define more precisely the elements that constitute a "problem"
and its "solution," and give several examples to illustrate these definitions. Given precise
definitions of problems, it is relatively straightforward to construct a search process for finding
solutions.

PROBLEM SOLVING AGENTS


Intelligent agents are supposed to act in such a way that the environment goes through
a sequence of states that maximizes the performance measure. In its full generality, this
specification is difficult to translate into a successful agent design. As we mentioned in the previous
chapter, the task is somewhat simplified if the agent can adopt a goal and aim to satisfy it. Let
us first look at how and why an agent might do this.
Goal formulation, based on the current situation, is the first step in problem solving. As
well as formulating a goal, the agent may wish to decide on some other factors that affect the
desirability of different ways of achieving the goal.
Problem formulation is the process of deciding what actions and states to consider, and
follows goal formulation. We will discuss this process in more detail. For now, let us assume
that the agent will consider actions at the level of driving from one major town to another. The
states it will consider therefore correspond to being in a particular town.
In general, then, an agent with several immediate options of unknown value can decide
what to do by first examining different possible sequences of actions that lead to states of
known value, and then choosing the best one. This process of looking for such a sequence is
called search.
A search algorithm takes a problem as input and returns a solution in the form of an
action sequence. Once a solution is found, the actions it recommends can be carried out. This
is called the execution phase.
Thus, we have a simple "formulate, search, execute" design for the agent, as shown in
Figure 3.1. After formulating a goal and a problem to solve, the agent calls a search procedure
to solve it. It then uses the solution to guide its actions, doing whatever the solution recommends
as the next thing to do, and then removing that step from the sequence. Once the solution has
been executed, the agent will find a new goal.
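
A hypothetical sketch (not from the original text) of this "formulate, search, execute" loop; formulate_goal, formulate_problem and search are placeholders for problem-specific code:

class SimpleProblemSolvingAgent:
    def __init__(self, formulate_goal, formulate_problem, search):
        self.formulate_goal = formulate_goal
        self.formulate_problem = formulate_problem
        self.search = search
        self.seq = []       # remaining action sequence (the current solution)
        self.state = None   # the agent's view of the current state

    def step(self, percept):
        self.state = percept   # assume the percept reveals the current state
        if not self.seq:
            goal = self.formulate_goal(self.state)
            problem = self.formulate_problem(self.state, goal)
            self.seq = self.search(problem) or []   # search may fail and return nothing
        if not self.seq:
            return "NoOp"
        return self.seq.pop(0)   # do the next recommended action, then remove it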

EXAMPLE PROBLEMS
The range of task environments that can be characterized by well-defined problems is
vast. We can distinguish between so-called, toy problems, which are intended to illustrate or
exercise various problem-solving methods, and so-called real-world problems, which tend to
be more difficult and whose solutions people actually care about. In this section, we will give
examples of both. By nature, toy problems can be given a concise, exact description. This
means that they can be easily used by different researchers to compare the performance of
algorithms. Real-world problems, on the other hand, tend not to have a single agreed-upon
description, but we will attempt to give the general flavor of their formulations.

1. Toy Problems
The 8-puzzle
The 8-puzzle, an instance of which is shown in Figure 3.4, consists of a 3x3 board with
eight numbered tiles and a blank space. A tile adjacent to the blank space can slide into the
space. The object is to reach the configuration shown on the right of the figure. One important
trick is to notice that rather than use operators such as "move the 3 tile into the blank space," it
is more sensible to have operators such as "the blank space changes places with the tile to its
left." This is because there are fewer of the latter kind of operator.
This leads us to the following formulation:
• States: a state description specifies the location of each of the eight tiles in one of the
nine squares. For efficiency, it is useful to include the location of the blank.
• Operators: blank moves left, right, up, or down.
• Goal test: state matches the goal configuration shown in Figure 3.4.
• Path cost: each step costs 1, so the path cost is just the length of the path.
The 8-puzzle belongs to the family of sliding-block puzzles. This general class is known to
be NP-complete, so one does not expect to find methods significantly better than the search
algorithms described in this chapter and the next. The 8-puzzle and its larger cousin, the 15-
puzzle, are the standard test problems for new search algorithms in AI.
Start State                                   Goal State

Figure 3.4 A typical instance of the 8-puzzle.
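
A small, hypothetical sketch (not from the original text) of the formulation above: a state is a 3x3 tuple of tiles with 0 for the blank, and the operators move the blank left, right, up or down:

# Hypothetical 8-puzzle state and successor function (0 marks the blank).
GOAL = ((1, 2, 3), (4, 5, 6), (7, 8, 0))

def successors(state):
    # Locate the blank, then generate each state reachable by one blank move.
    grid = [list(row) for row in state]
    r, c = next((i, j) for i in range(3) for j in range(3) if grid[i][j] == 0)
    for op, (dr, dc) in {"Left": (0, -1), "Right": (0, 1), "Up": (-1, 0), "Down": (1, 0)}.items():
        nr, nc = r + dr, c + dc
        if 0 <= nr < 3 and 0 <= nc < 3:
            new = [row[:] for row in grid]
            new[r][c], new[nr][nc] = new[nr][nc], new[r][c]   # blank swaps with the tile
            yield op, tuple(tuple(row) for row in new)        # each step has path cost 1

def goal_test(state):
    return state == GOAL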

The 8-queens problem


The eight queens puzzle is the problem of placing eight chess queens on an 8x8 chessboard so
that no two queens threaten each other; thus, a solution requires that no two queens share the
same row, column, or diagonal.

The 8-queens problem can be defined as follows: Place 8 queens on an (8 by 8) chess board
such that none of the queens attacks any of the others. A configuration of 8 queens on the board
is shown in figure 1, but this does not represent a solution as the queen in the first column is on
the same diagonal as the queen in the last column.
Figure 1: Almost a solution of the 8-queens problem

Although efficient special-purpose algorithms exist for this problem and the whole n
queens family, it remains an interesting test problem for search algorithms. There are two main
kinds of formulation. The incremental formulation involves placing queens one by one,
whereas the complete-state formulation starts with all 8 queens on the board and moves them
around. In either case, the path cost is of no interest because only the final state counts;
algorithms are thus compared only on search cost. Thus, we have the following goal test and
path cost:
• Goal test: 8 queens on board, none attacked.
• Path cost: zero.
There are also different possible states and operators. Consider the following simple-minded
formulation:
• States: any arrangement of 0 to 8 queens on the board.
• Operators: add a queen to any square.
In this formulation, we have 64^8 possible sequences to investigate. A more sensible choice
would use the fact that placing a queen where it is already attacked cannot work, because
subsequent placings of other queens will not undo the attack. So we might try the following:
• States: arrangements of 0 to 8 queens with none attacked.
• Operators: place a queen in the left-most empty column such that it is not attacked by
any other queen.
It is easy to see that the actions given can generate only states with no attacks; but
sometimes no actions will be possible. For example, after making the first seven choices (left-
to-right) in Figure 1, there is no action available in this formulation. The search process must
try another choice. A quick calculation shows that there are only 2057 possible sequences to
investigate. The right formulation makes a big difference to the size of the search space. Similar
considerations apply for a complete-state formulation. For example, we could set the problem
up as follows:
• States: arrangements of 8 queens, one in each column.
• Operators: move any attacked queen to another square in the same column.
This formulation would allow the algorithm to find a solution eventually, but it would be better
to move to an unattacked square if possible.
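
A hypothetical sketch (not from the original text) of the improved incremental formulation: a state is a tuple of queen rows, one entry per already-filled column, and the operator places a queen in the left-most empty column on a square that is not attacked:

def not_attacked(state, row):
    # A queen in the next (left-most empty) column must not share a row or diagonal.
    col = len(state)
    for c, r in enumerate(state):
        if r == row or abs(r - row) == abs(c - col):
            return False
    return True

def queen_successors(state):
    # Place a queen in the left-most empty column on any unattacked square.
    for row in range(8):
        if not_attacked(state, row):
            yield state + (row,)

def queens_goal_test(state):
    return len(state) == 8   # 8 queens placed; none attack each other by construction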

2. Real-world problems
Route finding
We have already seen how route finding is defined in terms of specified locations and
transitions along links between them. Route-finding algorithms are used in a variety of
applications, such as routing in computer networks, automated travel advisory systems, and
airline travel planning systems. The last application is somewhat more complicated, because
airline travel has a very complex path cost, in terms of money, seat quality, time of day, type
of airplane, frequent-flyer mileage awards, and so on. Furthermore, the actions in the problem
do not have completely known outcomes: flights can be late or overbooked, connections can
be missed, and fog or emergency maintenance can cause delays.

Touring and travelling salesperson problems


Consider the problem, "Visit every city in Figure 3.3 at least once, starting and ending
in Bucharest." This seems very similar to route finding, because the operators still correspond
to trips between adjacent cities. But for this problem, the state space must record more
information. In addition to the agent's location, each state must keep track of the set of cities
the agent has visited. So the initial state would be "In Bucharest; visited {Bucharest}," a typical
intermediate state would be "In Vaslui; visited {Bucharest,Urziceni,Vaslui}," and the goal test
would check if the agent is in Bucharest and that all 20 cities have been visited. The travelling
salesperson problem (TSP) is a famous touring problem in which each city must be visited
exactly once. The aim is to find the shortest tour. The problem is NP-hard (Karp, 1972), but an
enormous amount of effort has been expended to improve the capabilities of TSP algorithms.
In addition to planning trips for travelling salespersons, these algorithms have been used for
tasks such as planning movements of automatic circuit board drills.

VLSI Layout
The design of silicon chips is one of the most complex engineering design tasks
currently undertaken, and we can give only a brief sketch here. A typical VLSI chip can have
as many as a million gates, and the positioning and connections of every gate are crucial to the
successful operation of the chip. Computer-aided design tools are used in every phase of the
process. Two of the most difficult tasks are cell layout and channel routing. These come after
the components and connections of the circuit have been fixed; the purpose is to lay out the
circuit on the chip so as to minimize area and connection lengths, thereby maximizing speed.
In cell layout, the primitive components of the circuit are grouped into cells, each of which
performs some recognized function. Each cell has a fixed footprint (size and shape) and
requires a certain number of connections to each of the other cells. The aim is to place the cells
on the chip so that they do not overlap and so that there is room for the connecting wires to be
placed between the cells. Channel routing finds a specific route for each wire using the gaps
between the cells. These search problems are extremely complex, but definitely worth solving.

Robot navigation
Robot navigation is a generalization of the route-finding problem described earlier.
Rather than a discrete set of routes, a robot can move in a continuous space with (in principle)
an infinite set of possible actions and states. For a simple, circular robot moving on a flat

surface, the space is essentially two-dimensional. When the robot has arms and legs that must
also be controlled, the search space becomes many-dimensional. Advanced techniques are
required just to make the search space finite.

SEARCHING FOR SOLUTIONS


We have seen how to define a problem, and how to recognize a solution. The
remaining part, finding a solution, is done by a search through the state space. The idea is
to maintain and extend a set of partial solution sequences. In this section, we show how to
generate these sequences and how to keep track of them using suitable data structures.

Generating action sequences


To solve the route-finding problem from Arad to Bucharest, for example, we start off
with just the initial state, Arad. The first step is to test if this is a goal state. Clearly it is not,
but it is important to check so that we can solve trick problems like "starting in Arad, get to
Arad." Because this is not a goal state, we need to consider some other states. This is done by
applying the operators to the current state, thereby generating a new set of states. The process
is called expanding the state. In this case, we get three new states, "in Sibiu," "in Timisoara,"
and "in Zerind," because there is a direct one-step route from Arad to these three cities. If there
were only one possibility; we would just take it and continue. But whenever there are multiple
possibilities, we must make a choice about which one to consider further.
This is the essence of search: choosing one option and putting the others aside for later,
in case the first choice does not lead to a solution. Suppose we choose Zerind. We check to
see if it is a goal state (it is not), and then expand it to get "in Arad" and "in Oradea." We can
then choose any of these two, or go back and choose Sibiu or Timisoara. We continue
choosing, testing, and expanding until a solution is found, or until there are no more states to
be expanded. The choice of which state to expand first is determined by the search strategy.
It is helpful to think of the search process as building up a search tree that is
superimposed over the state space. The root of the search tree is a search node corresponding
to the initial state. The leaf nodes of the tree correspond to states that do not have successors
in the tree, either because they have not been expanded yet, or because they were expanded,
but generated the empty set. At each step, the search algorithm chooses one leaf node to expand.
Figure 3.8 shows some of the expansions in the search tree for route finding from Arad to
Bucharest. The general search algorithm is described informally in Figure 3.9.
It is important to distinguish between the state space and the search tree. For the route
finding problem, there are only 20 states in the state space, one for each city. But there are an
infinite number of paths in this state space, so the search tree has an infinite number of nodes.
For example, in Figure 3.8, the branch Arad-Sibiu-Arad continues Arad-Sibiu-Arad-Sibiu-
Arad, and so on, indefinitely. Obviously, a good search algorithm avoids following such paths.

Data structures for search trees


There are many ways to represent nodes, but in this chapter, we will assume a node is
a data structure with five components:

• the state in the state space to which the node corresponds;


• the node in the search tree that generated this node (this is called the parent
node);
• the operator that was applied to generate the node;
• the number of nodes on the path from the root to this node (the depth of the
node);
• the path cost of the path from the initial state to the node.

The node data type is thus:


datatype node
components: STATE, PARENT-NODE, OPERATOR, DEPTH, PATH-COST

It is important to remember the distinction between nodes and states. A node is a


bookkeeping data structure used to represent the search tree for a particular problem instance
as generated by a particular algorithm. A state represents a configuration (or set of
configurations) of the world. Thus, nodes have depths and parents, whereas states do not.
(Furthermore, it is quite possible for two different nodes to contain the same state, if that state
is generated via two different sequences of actions.) The EXPAND function is responsible for
calculating each of the components of the nodes it generates.
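
A hypothetical Python rendering (not from the original text) of this node data type, together with an EXPAND-style helper that fills in each component for every successor:

from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class Node:
    state: Any                        # the state this node corresponds to
    parent: Optional["Node"] = None   # the node that generated this node
    operator: Any = None              # the operator applied to generate it
    depth: int = 0                    # number of steps from the root
    path_cost: float = 0.0            # cost of the path from the initial state

def expand(node, successors, step_cost=lambda state, op, result: 1):
    # Build one child node per (operator, resulting state) pair.
    return [Node(state=result, parent=node, operator=op,
                 depth=node.depth + 1,
                 path_cost=node.path_cost + step_cost(node.state, op, result))
            for op, result in successors(node.state)]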

SEARCH STRATEGIES
Search Algorithm Terminologies
Search:
Searching is a step by step procedure to solve a search-problem in a given search space.
A search problem can have three main factors:
1. Search Space: Search space represents a set of possible solutions, which a
system may have.
2. Start State: It is the state from which the agent begins the search.
3. Goal test: It is a function which observes the current state and returns whether
the goal state is achieved or not.
• Search tree: A tree representation of search problem is called Search tree. The
root of the search tree is the root node which is corresponding to the initial state.
• Actions: It gives the description of all the available actions to the agent.
• Transition model: A description of what each action does can be represented as a
transition model.
• Path Cost: It is a function which assigns a numeric cost to each path.
• Solution: It is an action sequence which leads from the start node to the goal
node.
• Optimal Solution: A solution that has the lowest cost among all solutions.

Properties of Search Algorithms:


Following are the four essential properties of search algorithms to compare the efficiency of
these algorithms:
Completeness: A search algorithm is said to be complete if it guarantees to return a solution
whenever at least one solution exists for the input.
Optimality: If the solution found by an algorithm is guaranteed to be the best solution (lowest
path cost) among all solutions, then it is said to be an optimal solution.
Time Complexity: Time complexity is a measure of the time an algorithm takes to complete its task.
Space Complexity: It is the maximum storage space required at any point during the search.

Types of search algorithms


Based on the search problems we can classify the search algorithms into uninformed (Blind
search) search and informed search (Heuristic search) algorithms.
Uninformed/Blind Search:
• Breadth-first search
• Uniform cost search
• Depth-first search
• Depth-limited search
• Iterative deepening depth-first search
• Bidirectional search

Informed Search:
• Best First Search
• A* search

1. UNINFORMED SEARCH (Blind Search):


The term means that these strategies have no information about the number of steps or the path
cost from the current state to the goal; all they can do is distinguish a goal state from a non-goal
state. Uninformed search applies a way in which the search tree is searched without any
information about the search space like initial state operators and test for the goal, so it is also
called blind search. It examines each node of the tree until it achieves the goal node.

It can be divided into five main types:


• Breadth-first search
• Uniform cost search
• Depth-first search
• Iterative deepening depth-first search
• Bidirectional Search

1. Breadth-first Search:
o Breadth-first search is the most common search strategy for traversing a tree or graph.
This algorithm searches breadthwise in a tree or graph, so it is called breadth-first
search.
o BFS algorithm starts searching from the root node of the tree and expands all successor
nodes at the current level before moving to nodes of the next level.
o The breadth-first search algorithm is an example of a general-graph search algorithm.
o Breadth-first search is implemented using a FIFO queue data structure.

Advantages:
o BFS will provide a solution if any solution exists.
o If there is more than one solution for a given problem, then BFS will provide the
minimal solution, i.e. the one that requires the least number of steps.

Disadvantages:
o It requires lots of memory since each level of the tree must be saved into memory to
expand the next level.
o BFS needs lots of time if the solution is far away from the root node.

Example
In the below tree structure, we have shown the traversing of the tree using BFS algorithm
from the root node S to the goal node K. The BFS algorithm traverses in layers, so it will follow
the path shown by the dotted arrow, and the traversed path will be:
S ---> A ---> B ---> C ---> D ---> G ---> H ---> E ---> F ---> I ---> K

[Figure: Breadth First Search - a tree rooted at S is expanded level by level (levels 0 to 4) until the goal node K is reached.]

Time Complexity:
The time complexity of the BFS algorithm can be obtained from the number of nodes traversed by BFS
until the shallowest goal node, where d is the depth of the shallowest solution and b is the branching factor:
T(b) = 1 + b + b^2 + b^3 + ... + b^d = O(b^d)

Space Complexity:
Space complexity of the BFS algorithm is given by the memory size of the frontier, which is O(b^d).
Completeness:
BFS is complete, which means if the shallowest goal node is at some finite depth, then BFS
will find a solution.
Optimality:
BFS is optimal if path cost is a non-decreasing function of the depth of the node.
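
A compact, hypothetical BFS sketch (not from the original text) over a graph given as an adjacency dictionary, using the FIFO queue described above:

# Breadth-first search using a FIFO queue of paths.
from collections import deque

def breadth_first_search(start, goal, neighbors):
    frontier = deque([[start]])       # queue of paths; each path ends at its frontier node
    explored = {start}
    while frontier:
        path = frontier.popleft()     # FIFO: shallowest path first
        node = path[-1]
        if node == goal:
            return path
        for nxt in neighbors.get(node, []):
            if nxt not in explored:
                explored.add(nxt)
                frontier.append(path + [nxt])
    return None                       # no solution

# Example: graph = {"S": ["A", "B"], "A": ["C"], "B": ["D"], "D": ["K"]}
# breadth_first_search("S", "K", graph) -> ['S', 'B', 'D', 'K']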

2. Depth-first Search
o Depth-first search is a recursive algorithm for traversing a tree or graph data structure.
o It is called the depth-first search because it starts from the root node and follows each
path to its greatest depth node before moving to the next path.
o DFS uses a stack data structure for its implementation.
o The process of the DFS algorithm is similar to the BFS algorithm.

Advantage:
o DFS requires very little memory as it only needs to store a stack of the nodes on the
path from the root node to the current node.
o It takes less time to reach the goal node than the BFS algorithm (if it traverses along
the right path).

Disadvantage:
o There is the possibility that many states keep re-occurring, and there is no guarantee
of finding the solution.
o The DFS algorithm goes deep down in its search and sometimes it may go into an infinite
loop.

Example
In the below search tree, we have shown the flow of depth-first search, and it will follow the
order: root node ---> left node ---> right node.
[Figure: Depth First Search - a tree rooted at S explored deep along one branch (levels 0 to 3) before backtracking.]

It will start searching from root node S, and traverse A, then B, then D and E, after traversing
E, it will backtrack the tree as E has no other successor and still goal node is not found. After
backtracking it will traverse node C and then G, and here it will terminate as it found goal
node.

Completeness: DFS search algorithm is complete within finite state space as it will expand
every node within a limited search tree.
Time Complexity: The time complexity of DFS is equivalent to the number of nodes traversed by the
algorithm. It is given by:
T(b) = 1 + b + b^2 + b^3 + ... + b^m = O(b^m)
where m is the maximum depth of any node, which can be much larger than d (the shallowest
solution depth).
Space Complexity: The DFS algorithm needs to store only a single path from the root node, hence
the space complexity of DFS is equivalent to the size of the fringe set, which is O(bm).
Optimal: The DFS search algorithm is non-optimal, as it may take a large number of steps or incur a
high cost to reach the goal node.
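
For comparison, a hypothetical depth-first search sketch (not from the original text) that uses an explicit stack (LIFO) in place of BFS's FIFO queue:

# Depth-first search using an explicit stack of paths.
def depth_first_search(start, goal, neighbors):
    stack = [[start]]                 # stack of paths
    while stack:
        path = stack.pop()            # LIFO: deepest path first
        node = path[-1]
        if node == goal:
            return path
        for nxt in neighbors.get(node, []):
            if nxt not in path:       # avoid cycling along the current path
                stack.append(path + [nxt])
    return None                       # no solution found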

3. Depth-Limited Search Algorithm:


A depth-limited search algorithm is similar to depth-first search with a predetermined limit.
Depth-limited search overcomes the drawback of the infinite path in depth-first search. In
this algorithm, a node at the depth limit is treated as if it has no further successor nodes.
Depth-limited search can be terminated with two Conditions of failure:
o Standard failure value: It indicates that problem does not have any solution.
o Cutoff failure value: It defines no solution for the problem within a given depth limit.

Advantages:
Depth-limited search is Memory efficient.

Disadvantages:
o Depth-limited search also has a disadvantage of incompleteness.
o It may not be optimal if the problem has more than one solution.
Example

[Figure: Depth-limited search tree. The search proceeds depth-first down to the depth limit; the nodes below the limit (E, F, G, H) are treated as having no successors and are not expanded.]

Completeness: The DLS search algorithm is complete if the solution is above the depth limit.
Time Complexity: The time complexity of the DLS algorithm is O(b^ℓ), where ℓ is the depth limit.
Space Complexity: The space complexity of the DLS algorithm is O(bℓ).
Optimal: Depth-limited search can be viewed as a special case of DFS, and it is also not
optimal, even if ℓ > d.
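The sketch below illustrates the two failure values described above; the graph representation is again an assumed dictionary of successor lists.

def depth_limited_search(graph, node, goal, limit):
    # Returns a path, 'cutoff' (no solution within the limit), or None (standard failure).
    if node == goal:
        return [node]
    if limit == 0:
        return 'cutoff'                       # the depth limit has been reached
    cutoff_occurred = False
    for successor in graph.get(node, []):
        result = depth_limited_search(graph, successor, goal, limit - 1)
        if result == 'cutoff':
            cutoff_occurred = True
        elif result is not None:
            return [node] + result
    return 'cutoff' if cutoff_occurred else None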

4. Uniform-cost Search Algorithm:


Uniform-cost search is a searching algorithm used for traversing a weighted tree or graph. This
algorithm comes into play when a different cost is available for each edge. The primary goal of
the uniform-cost search is to find a path to the goal node which has the lowest cumulative cost.
Uniform-cost search expands nodes according to their path costs from the root node. It can be
used to solve any graph/tree where the optimal cost is in demand. A uniform-cost search
algorithm is implemented by the priority queue. It gives maximum priority to the lowest
cumulative cost. Uniform cost search is equivalent to BFS algorithm if the path cost of all edges
is the same.

Advantages:
o Uniform cost search is optimal because at every state the path with the least cost is
chosen.

Disadvantages:
o It does not care about the number of steps involved in the search and is only concerned
with path cost, due to which this algorithm may get stuck in an infinite loop.
Example
[Figure: Uniform-cost search on a weighted tree rooted at S. Each edge is labelled with its cost, and the search reaches the goal node G along the path with the lowest cumulative cost.]

Completeness:
Uniform-cost search is complete, such as if there is a solution, UCS will find it.
Time Complexity:
Let C* be the cost of the optimal solution, and let ε be the minimum cost of each step toward the goal node.
Then the number of steps is C*/ε + 1 (we add +1 because we start from state 0 and end at C*/ε).
Hence, the worst-case time complexity of uniform-cost search is O(b^(1 + ⌊C*/ε⌋)).
Space Complexity:
By the same logic, the worst-case space complexity of uniform-cost search is O(b^(1 + ⌊C*/ε⌋)).
Optimal:
Uniform-cost search is always optimal as it only selects a path with the lowest path cost.
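A minimal uniform-cost search sketch using Python's heapq as the priority queue is shown below; the weighted-graph representation (node -> list of (successor, edge cost) pairs) is an assumption for illustration.

import heapq

def uniform_cost_search(graph, start, goal):
    # Priority queue ordered by cumulative path cost g(n): lowest cost popped first.
    frontier = [(0, start, [start])]
    explored = set()
    while frontier:
        cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return cost, path
        if node in explored:
            continue
        explored.add(node)
        for successor, edge_cost in graph.get(node, []):
            heapq.heappush(frontier, (cost + edge_cost, successor, path + [successor]))
    return None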

5. Iterative deepening depth-first Search:


■ The iterative deepening algorithm is a combination of DFS and BFS algorithms. This
search algorithm finds out the best depth limit and does it by gradually increasing the
limit until a goal is found.
■ This algorithm performs depth-first search up to a certain "depth limit", and it keeps
increasing the depth limit after each iteration until the goal node is found.
■ This Search algorithm combines the benefits of Breadth-first search's fast search and
depth-first search's memory efficiency.
■ The iterative deepening search algorithm is a useful uninformed search when the search space is
large and the depth of the goal node is unknown.

Advantages:
o It combines the benefits of BFS and DFS search algorithm in terms of fast search and
memory efficiency.

Disadvantages:
o The main drawback of IDDFS is that it repeats all the work of the previous phase.

Example
Following tree structure is showing the iterative deepening depth-first search.
IDDFS algorithm performs various iterations until it does not find the goal node.
The iteration performed by the algorithm is given as:
Iterative deepening depth-first search

[Figure: Tree for iterative deepening — A (Level 0); B, C (Level 1); D, E, F, G (Level 2); H, I, K (Level 3).]

1st Iteration: A
2nd Iteration: A, B, C
3rd Iteration: A, B, D, E, C, F, G
4th Iteration: A, B, D, H, I, E, C, F, K, G
In the fourth iteration, the algorithm will find the goal node.

Completeness:
This algorithm is complete if the branching factor is finite.
Time Complexity:
Let b be the branching factor and d the depth of the shallowest goal; then the worst-case time complexity
is O(b^d).
Space Complexity:
The space complexity of IDDFS is O(bd).
Optimal:
IDDFS algorithm is optimal if path cost is a non- decreasing function of the depth of the
node.
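The following is a minimal self-contained sketch of iterative deepening, repeating the depth-limited idea from the previous section; the maximum depth of 50 is an arbitrary assumed bound.

def depth_limited(graph, node, goal, limit):
    # Plain depth-first search that stops expanding below the given limit.
    if node == goal:
        return [node]
    if limit == 0:
        return None
    for successor in graph.get(node, []):
        result = depth_limited(graph, successor, goal, limit - 1)
        if result is not None:
            return [node] + result
    return None

def iterative_deepening_search(graph, start, goal, max_depth=50):
    for limit in range(max_depth + 1):        # gradually increase the depth limit
        result = depth_limited(graph, start, goal, limit)
        if result is not None:
            return result                      # path from start to goal
    return None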

6. Bidirectional Search Algorithm


The bidirectional search algorithm runs two simultaneous searches, one from the initial state, called the
forward search, and the other from the goal node, called the backward search, to find the goal node.
Bidirectional search replaces one single search graph with two small subgraphs in which one
starts the search from an initial vertex and other starts from goal vertex. The search stops when
these two graphs intersect each other.
Bidirectional search can use search techniques such as BFS, DFS, DLS, etc.

Advantages:
o Bidirectional search is fast.
o Bidirectional search requires less memory

Disadvantages:
o Implementation of the bidirectional search tree is difficult.
o In bidirectional search, one should know the goal state in advance.

Example
In the below search tree, bidirectional search algorithm is applied. This algorithm divides one
graph/tree into two sub-graphs. It starts traversing from node 1 in the forward direction and
starts from goal node 16 in the backward direction.
The algorithm terminates at node 9 where two searches meet.

[Figure: Bidirectional search. A forward search starts from root node 1 and a backward search starts from goal node 16; the two searches meet at node 9.]

Completeness: Bidirectional Search is complete if we use BFS in both searches.


Time Complexity: The time complexity of bidirectional search using BFS is O(b^(d/2)).
Space Complexity: The space complexity of bidirectional search is O(b^(d/2)).
Optimal: Bidirectional search is Optimal.
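A minimal sketch of bidirectional BFS on an undirected graph is given below; it only reports whether the two frontiers meet, and the neighbour-list representation is an assumption for illustration.

from collections import deque

def bidirectional_search(graph, start, goal):
    if start == goal:
        return True
    forward, backward = {start}, {goal}
    fq, bq = deque([start]), deque([goal])
    while fq and bq:
        for frontier_queue, this_side, other_side in ((fq, forward, backward),
                                                      (bq, backward, forward)):
            for _ in range(len(frontier_queue)):      # expand one layer on this side
                node = frontier_queue.popleft()
                for nbr in graph.get(node, []):
                    if nbr in other_side:
                        return True                    # the two searches intersect
                    if nbr not in this_side:
                        this_side.add(nbr)
                        frontier_queue.append(nbr)
    return False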

2. INFORMED SEARCH (Heuristic Search)


■ Informed search algorithms use domain knowledge.
■ In an informed search, problem information is available which can guide the search.
Informed search strategies can find a solution more efficiently than an uninformed
search strategy.
■ Informed search is also called a Heuristic search.
■ A heuristic is a technique which is not always guaranteed to find the best solution, but is
guaranteed to find a good solution in a reasonable time.

Heuristics function:
Heuristic is a function which is used in Informed Search, and it finds the most promising path.
It takes the current state of the agent as its input and produces the estimation of how close agent
is from the goal. The heuristic method, however, might not always give the best solution, but
it is guaranteed to find a good solution in a reasonable time. A heuristic function estimates how close
a state is to the goal. It is represented by h(n), and it estimates the cost of an optimal path
between the pair of states. The value of the heuristic function is always non-negative.

Admissibility of the heuristic function is given as:

1. h(n) <= h*(n)

Here h(n) is the heuristic cost (the estimate), and h*(n) is the actual cost of the optimal path from n to the goal.

Hence the heuristic cost should be less than or equal to the actual cost, i.e. the heuristic should never overestimate.
Artificial Intelligence 29

Pure Heuristic Search:


Pure heuristic search is the simplest form of heuristic search algorithms. It expands
nodes based on their heuristic value h(n). It maintains two lists, OPEN and CLOSED list. In
the CLOSED list, it places those nodes which have already expanded and in the OPEN list, it
places nodes which have yet not been expanded.
On each iteration, the node n with the lowest heuristic value is expanded, all its successors are
generated, and n is placed in the CLOSED list. The algorithm continues until a goal state is
found.
In the informed search we will discuss two main algorithms which are given below:
• Best First Search Algorithm(Greedy search)
• A* Search Algorithm

1. Best-first Search Algorithm (Greedy Search):


The greedy best-first search algorithm always selects the path which appears best at that moment.
It is the combination of depth-first search and breadth-first search algorithms. It uses the
heuristic function to guide the search. Best-first search allows us to take the advantages of both
algorithms. With the help of best-first search, at each step, we can choose the most promising
node. In the best-first search algorithm, we expand the node which is closest to the goal node,
and the closest cost is estimated by the heuristic function, i.e.
f(n) = h(n)

where h(n) = estimated cost from node n to the goal.

The greedy best first algorithm is implemented by the priority queue.

Best first search algorithm


o Step 1: Place the starting node into the OPEN list.
o Step 2: If the OPEN list is empty, Stop and return failure.
o Step 3: Remove the node n from the OPEN list which has the lowest value of h(n), and
place it in the CLOSED list.
o Step 4: Expand the node n, and generate the successors of node n.
o Step 5: Check each successor of node n, and find whether any node is a goal node or
not. If any successor node is goal node, then return success and terminate the search,
else proceed to Step 6.
o Step 6: For each successor node, algorithm checks for evaluation function f(n), and then
check if the node has been in either OPEN or CLOSED list. If the node has not been in
both list, then add it to the OPEN list.
o Step 7: Return to Step 2.

Advantages:
o Best first search can switch between BFS and DFS by gaining the advantages of both
the algorithms.
o This algorithm is more efficient than BFS and DFS algorithms.
Artificial Intelligence 30

Disadvantages:
o It can behave as an unguided depth-first search in the worst case scenario.
o It can get stuck in a loop as DFS.
o This algorithm is not optimal.

Example
Consider the below search problem, and we will traverse it using greedy best-first search. At
each iteration, each node is expanded using evaluation function f(n)=h(n) , which is given in
the below table.

[Figure: Search tree for the greedy best-first example. S has successors A and B; B has successors E and F; F has successors I and G (the goal). A table gives the heuristic value h(n) of each node.]

In this search example, we are using two lists, the OPEN and CLOSED lists.
The following are the iterations for traversing the above example.


Expand the nodes of S and put in the CLOSED list


Initialization: Open [A, B], Closed [S]
Iteration 1: Open [A], Closed [S, B]
Iteration 2: Open [E, F, A], Closed [S, B]
: Open [E, A], Closed [S, B, F]

Iteration 3: Open [I, G, E, A], Closed [S, B, F]


: Open [I, E, A], Closed [S, B, F, G]
Hence the final solution path will be: S ---> B ---> F ---> G
Time Complexity: The worst-case time complexity of greedy best-first search is O(b^m).
Space Complexity: The worst-case space complexity of greedy best-first search is O(b^m),
where m is the maximum depth of the search space.
Complete: Greedy best-first search is incomplete, even if the given state space is finite.
Optimal: The greedy best-first search algorithm is not optimal.
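A minimal sketch of greedy best-first search with an OPEN priority queue ordered by h(n) is shown below; the graph and the heuristic table h are assumed inputs for illustration.

import heapq

def greedy_best_first_search(graph, h, start, goal):
    open_list = [(h[start], start, [start])]   # ordered by the heuristic value only
    closed = set()
    while open_list:
        _, node, path = heapq.heappop(open_list)
        if node == goal:
            return path
        if node in closed:
            continue
        closed.add(node)
        for successor in graph.get(node, []):
            if successor not in closed:
                heapq.heappush(open_list, (h[successor], successor, path + [successor]))
    return None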

2. A* Search Algorithm:
A* search is the most commonly known form of best-first search. It uses the heuristic function
h(n) and the cost to reach node n from the start state, g(n). It has the combined features of UCS and
greedy best-first search, by which it solves the problem efficiently. The A* search algorithm finds
the shortest path through the search space using the heuristic function. This search algorithm
expands a smaller search tree and provides an optimal result faster. The A* algorithm is similar to UCS
except that it uses g(n) + h(n) instead of g(n).
In the A* search algorithm, we use the search heuristic as well as the cost to reach the node. Hence we
can combine both costs as follows, and this sum is called the fitness number.

f(n) = g(n) + h(n)

where g(n) = cost to reach node n from the start state,
h(n) = estimated cost to reach the goal node from node n, and
f(n) = estimated cost of the cheapest solution through n.

Algorithm of A* search:
Step 1: Place the starting node in the OPEN list.
Step 2: Check if the OPEN list is empty or not, if the list is empty then return failure and stops.
Step 3: Select the node from the OPEN list which has the smallest value of evaluation function
(g+h), if node n is goal node then return success and stop, otherwise
Step 4: Expand node n and generate all of its successors, and put n into the closed list. For each
successor n', check whether n' is already in the OPEN or CLOSED list, if not then compute
evaluation function for n' and place into Open list.
Step 5: Else, if node n' is already in OPEN or CLOSED, then it should be attached to the back
pointer which reflects the lowest g(n') value.
Step 6: Return to Step 2.

Advantages:
o The A* search algorithm performs better than the other search algorithms discussed above.
o A* search algorithm is optimal and complete.
o This algorithm can solve very complex problems.
Artificial Intelligence 32

Disadvantages:
o It does not always produce the shortest path, as it is mostly based on heuristics and
approximation.
o A* search algorithm has some complexity issues.
o The main drawback of A* is memory requirement as it keeps all generated nodes in
the memory, so it is not practical for various large-scale problems.

Example
In this example, we will traverse the given graph using the A* algorithm. The heuristic value
of all states is given in the below table so we will calculate the f(n) of each state using the
formula f(n)= g(n) + h(n), where g(n) is the cost to reach any node from start state.
Here we will use OPEN and CLOSED list.

[Figure: Graph for the A* example — start node S, goal node G, edge costs marked on each arc, and a table of heuristic values h(n) for every state.]

Solution:

[Figure: Expanded A* search tree for the example, with f(n) = g(n) + h(n) annotated at each node.]
Initialization: {(S, 5)}
Iteration 1: {(S--> A, 4), (S-->G, 10)}
Iteration 2: {(S--> A-->C, 4), (S--> A-->B, 7), (S-->G, 10)}
Iteration 3: {(S--> A-->C--->G, 6), (S--> A-->C--->D, 11), (S--> A-->B, 7), (S-->G, 10)}
Iteration 4 will give the final result: S ---> A ---> C ---> G, which provides the optimal path with
cost 6.

If the heuristic function is admissible, then A* tree search will always find the least cost path.

Time Complexity: The time complexity of the A* search algorithm depends on the heuristic
function, and the number of nodes expanded is exponential in the depth of the solution d. So the
time complexity is O(b^d), where b is the branching factor.

Space Complexity: The space complexity of the A* search algorithm is O(b^d).
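A minimal A* sketch is given below; the weighted graph (node -> list of (successor, step cost) pairs) and the heuristic table h are assumed inputs, and an admissible, consistent heuristic is assumed so that a node need not be reopened once closed.

import heapq

def a_star_search(graph, h, start, goal):
    # OPEN list ordered by f(n) = g(n) + h(n).
    open_list = [(h[start], 0, start, [start])]
    closed = set()
    while open_list:
        f, g, node, path = heapq.heappop(open_list)
        if node == goal:
            return g, path
        if node in closed:
            continue
        closed.add(node)
        for successor, step_cost in graph.get(node, []):
            if successor not in closed:
                g2 = g + step_cost
                heapq.heappush(open_list, (g2 + h[successor], g2, successor, path + [successor]))
    return None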

*******


UNIT 3

REINFORCEMENT LEARNING
Reinforcement Learning is a feedback-based Machine learning technique in which an agent learns to behave in
an environment by performing the actions and seeing the results of actions. For each good action, the agent
gets positive feedback, and for each bad action, the agent gets negative feedback or penalty.

In Reinforcement Learning, the agent learns automatically using feedback, without any labeled data,
unlike supervised learning.

Since there is no labeled data, the agent is bound to learn from its experience only.

RL solves a specific type of problem where decision making is sequential, and the goal is long-term, such
as game-playing, robotics, etc.

The agent interacts with the environment and explores it by itself. The primary goal of an agent in
reinforcement learning is to improve the performance by getting the maximum positive rewards.

The agent learns through a process of trial and error, and based on this experience, it learns to perform the task in a
better way. Hence, we can say that "Reinforcement learning is a type of machine learning method where an
intelligent agent (computer program) interacts with the environment and learns to act within it." How a
robotic dog learns the movement of its arms is an example of reinforcement learning.

It is a core part of Artificial Intelligence, and all AI agents work on the concept of reinforcement
learning. Here we do not need to pre-program the agent, as it learns from its own experience without any
human intervention.

Example: Suppose there is an AI agent present within a maze environment, and his goal is to find the diamond.
The agent interacts with the environment by performing some actions, and based on those actions, the state of
the agent gets changed, and it also receives a reward or penalty as feedback.

The agent continues doing these three things (take action, change state/remain in the same state, and get
feedback), and by doing these actions, he learns and explores the environment.

The agent learns which actions lead to positive feedback or rewards and which actions lead to negative
feedback or penalties. As a positive reward, the agent gets a positive point, and as a penalty, it gets a negative
point.
Terms used in Reinforcement Learning

o Agent(): An entity that can perceive/explore the environment and act upon it.
o Environment(): A situation in which an agent is present or surrounded by. In RL, we assume the
stochastic environment, which means it is random in nature.
o Action(): Actions are the moves taken by an agent within the environment.
o State(): State is a situation returned by the environment after each action taken by the agent.
o Reward(): A feedback returned to the agent from the environment to evaluate the action of the agent.
o Policy(): Policy is a strategy applied by the agent for the next action based on the current state.
o Value(): It is the expected long-term return with the discount factor, as opposed to the short-term
reward.
o Q-value(): It is mostly similar to the value, but it takes one additional parameter as a current action (a).

Key Features of Reinforcement Learning

o In RL, the agent is not instructed about the environment and what actions need to be taken.
o It is based on a trial-and-error process.
o The agent takes the next action and changes states according to the feedback of the previous action.
o The agent may get a delayed reward.
o The environment is stochastic, and the agent needs to explore it in order to obtain the maximum positive
rewards.

Approaches to implement Reinforcement Learning

There are mainly three ways to implement reinforcement-learning in ML, which are:

1. Value-based:
The value-based approach is about to find the optimal value function, which is the maximum value at a
state under any policy. Therefore, the agent expects the long-term return at any state(s) under policy π.
2. Policy-based:
Policy-based approach is to find the optimal policy for the maximum future rewards without using the
value function. In this approach, the agent tries to apply such a policy that the action performed in each
step helps to maximize the future reward.
The policy-based approach has mainly two types of policy:
o Deterministic: The same action is produced by the policy (π) at any state.
o Stochastic: In this policy, probability determines the produced action.
3. Model-based:

In the model-based approach, a virtual model is created for the environment, and the agent explores
that environment to learn it. There is no particular solution or algorithm for this approach because the
model representation is different for each environment.

Elements of Reinforcement Learning

There are four main elements of Reinforcement Learning, which are given below:

1. Policy
2. Reward Signal
3. Value Function
4. Model of the environment

1)Policy:

A policy can be defined as a way how an agent behaves at a given time. It maps the perceived states of the
environment to the actions taken on those states. A policy is the core element of the RL as it alone can define
the behavior of the agent. In some cases, it may be a simple function or a lookup table, whereas, for other
cases, it may involve general computation as a search process. It could be deterministic or a stochastic policy:

2) Reward Signal:

The goal of reinforcement learning is defined by the reward signal. At each state, the environment sends an
immediate signal to the learning agent, and this signal is known as a reward signal. These rewards are given
according to the good and bad actions taken by the agent. The agent's main objective is to maximize the total
number of rewards for good actions. The reward signal can change the policy, such as if an action selected by
the agent leads to low reward, then the policy may change to select other actions in the future.

3) Value Function: The value function gives information about how good the situation and action are and how
much reward an agent can expect. A reward indicates the immediate signal for each good and bad action,
whereas a value function specifies the good state and action for the future. The value function depends on the
reward as, without reward, there could be no value. The goal of estimating values is to achieve more rewards.

4) Model: The last element of reinforcement learning is the model, which mimics the behavior of the
environment. With the help of the model, one can make inferences about how the environment will behave.
Such as, if a state and an action are given, then a model can predict the next state and reward.

The model is used for planning, which means it provides a way to take a course of action by considering all
future situations before actually experiencing those situations. The approaches for solving the RL
problems with the help of the model are termed as the model-based approach. Comparatively, an
approach without using a model is called a model-free approach.
WORKING OF REINFORCEMENT LEARNING:
o Environment: It can be anything such as a room, maze, football ground, etc.
o Agent: An intelligent agent such as AI robot.

Let's take an example of a maze environment that the agent needs to explore. Consider the below
image:

In the above image, the agent is at the very first block of the maze. The maze consists of an
S6 block, which is a wall, S8, a fire pit, and S4, a diamond block.

The agent cannot cross the S6 block, as it is a solid wall. If the agent reaches the S4 block, then it gets
a +1 reward; if it reaches the fire pit, then it gets a -1 reward point. It can take four actions: move
up, move down, move left, and move right.

The agent can take any path to reach the final point, but it needs to do so in as few steps as possible.
Suppose the agent follows the path S9-S5-S1-S2-S3; then it will get the +1 reward point.

The agent will try to remember the preceding steps that it has taken to reach the final step. To
memorize the steps, it assigns 1 value to each previous step. Consider the below step:

Now, the agent has successfully stored the previous steps assigning the 1 value to each previous
block. But what will the agent do if he starts moving from the block, which has 1 value block on both
sides? Consider the below diagram:
It will be a difficult condition for the agent whether he should go up or down as each block has the same value.
So, the above approach is not suitable for the agent to reach the destination. Hence to solve the problem, we
will use the Bellman equation, which is the main concept behind reinforcement learning.

The Bellman Equation


The Bellman equation was introduced by the Mathematician Richard Ernest Bellman in the year 1953, and
hence it is called as a Bellman equation. It is associated with dynamic programming and used to calculate the
values of a decision problem at a certain point by including the values of previous states.

It is a way of calculating the value functions in dynamic programming or environment that leads to modern
reinforcement learning.

The key-elements used in Bellman equations are:

o Action performed by the agent is referred to as "a"


o State occurred by performing the action is "s."
o The reward/feedback obtained for each good and bad action is "R."
o A discount factor is Gamma "γ."

The Bellman equation can be written as:

V(s) = max [R(s,a) + γV(s`)]

Where,

V(s)= value calculated at a particular point.

R(s,a) = Reward at a particular state s by performing an action.

γ = Discount factor

V(s`) = The value of the next state.

In the above equation, we are taking the max of the complete values because the agent tries to find the
optimal solution always.
So now, using the Bellman equation, we will find value at each state of the given environment. We will start
from the block, which is next to the target block.

For 1st block:

V(s3) = max [R(s,a) + γV(s`)], here V(s')= 0 because there is no further state to move.

V(s3)= max[R(s,a)]=> V(s3)= max[1]=> V(s3)= 1.

For 2nd block:

V(s2) = max [R(s,a) + γV(s`)], here γ= 0.9(lets), V(s')= 1, and R(s, a)= 0, because there is no reward at this state.

V(s2) = max[0.9(1)] => V(s2) = max[0.9] => V(s2) = 0.9

For 3rd block:

V(s1) = max [R(s,a) + γV(s`)], here γ= 0.9(lets), V(s')= 0.9, and R(s, a)= 0, because there is no reward at this state
also.

V(s1) = max[0.9(0.9)] => V(s1) = max[0.81] => V(s1) = 0.81

For 4th block:

V(s5) = max [R(s,a) + γV(s`)], here γ= 0.9(lets), V(s')= 0.81, and R(s, a)= 0, because there is no reward at this
state also.

V(s5) = max[0.9(0.81)] => V(s5) = max[0.729] => V(s5) = 0.73

For 5th block:

V(s9) = max [R(s,a) + γV(s`)], here γ = 0.9 (say), V(s') = 0.73, and R(s, a) = 0, because there is no reward at this
state either.

V(s9) = max[0.9(0.73)] => V(s9) = max[0.657] => V(s9) = 0.66
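The same backward value propagation can be written as a few lines of Python; the state names and γ = 0.9 follow the example above, and a reward of +1 is assumed only on entering the goal block s3.

gamma = 0.9
V = {}
V['s3'] = 1.0                 # V(s3) = max[R(s,a)] = 1 (the goal reward)
V['s2'] = gamma * V['s3']     # 0.9
V['s1'] = gamma * V['s2']     # 0.81
V['s5'] = gamma * V['s1']     # 0.729 (≈ 0.73)
V['s9'] = gamma * V['s5']     # 0.6561 (≈ 0.66)
print(V)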

Consider the below image:


Now, we will move further to the 6th block, and here agent may change the route because it always tries to find
the optimal path. So now, let's consider from the block next to the fire pit.

Now, the agent has three options to move: if it moves to the blue box, then it will feel a bump; if it moves to
the fire pit, then it will get the -1 reward. But here we are considering only positive rewards, so it will move
upwards only. The complete block values will be calculated using this formula. Consider the below image:

Types of Reinforcement learning


There are mainly two types of reinforcement learning, which are:

o Positive Reinforcement
o Negative Reinforcement

Positive Reinforcement:

The positive reinforcement learning means adding something to increase the tendency that expected behavior
would occur again. It impacts positively on the behavior of the agent and increases the strength of the
behavior.

This type of reinforcement can sustain the changes for a long time, but too much positive reinforcement may
lead to an overload of states that can reduce the consequences.
Negative Reinforcement:

The negative reinforcement learning is opposite to the positive reinforcement as it increases the tendency that
the specific behavior will occur again by avoiding the negative condition.

It can be more effective than the positive reinforcement depending on situation and behavior, but it provides
reinforcement only to meet minimum behavior.

Markov Decision Process

Markov Decision Process or MDP, is used to formalize the reinforcement learning problems. If the
environment is completely observable, then its dynamic can be modeled as a Markov Process. In MDP, the
agent constantly interacts with the environment and performs actions; at each action, the environment responds
and generates a new state.

MDP is used to describe the environment for the RL, and almost all the RL problem can be formalized using
MDP.

MDP contains a tuple of four elements (S, A, Pa, Ra):

o A set of finite States S


o A set of finite Actions A
o A reward Ra received after transitioning from state S to state S' due to action a
o A transition probability Pa

MDP uses Markov property, and to better understand the MDP, we need to learn about it.

Markov Property:

It says that "if the agent is present in the current state S1, performs an action a1 and moves to the state s2, then
the state transition from s1 to s2 depends only on the current state and action; future states and rewards do not
depend on past actions, rewards, or states."

Or, in other words, as per Markov Property, the current state transition does not depend on any past action or
state. Hence, MDP is an RL problem that satisfies the Markov property. Such as in a Chess game, the players
only focus on the current state and do not need to remember past actions or states.
Finite MDP:

A finite MDP is when there are finite states, finite rewards, and finite actions. In RL, we consider only the
finite MDP.

Markov Process:

Markov Process is a memoryless process with a sequence of random states S1, S2, ..... , St that uses the Markov
Property. Markov process is also known as Markov chain, which is a tuple (S, P) on state S and transition
function P. These two components (S and P) can define the dynamics of the system.

REINFORCEMENT LEARNING ALGORITHMS

Reinforcement learning algorithms are mainly used in AI applications and gaming applications. The main used
algorithms are:

o Q-Learning:
o Q-learning is an Off policy RL algorithm, which is used for the temporal difference
Learning. The temporal difference learning methods are the way of comparing temporally
successive predictions.
o It learns the value function Q (S, a), which means how good to take action "a" at a particular
state "s."
o The below flowchart explains the working of Q- learning:

o State Action Reward State action (SARSA):


o SARSA stands for State Action Reward State action, which is an on-policy temporal
difference learning method. The on-policy control method selects the action for each state
while learning using a specific policy.
o The goal of SARSA is to calculate the Q π (s, a) for the selected current policy π and all
pairs of (s-a).
o The main difference between Q-learning and SARSA algorithms is that unlike Q-learning,
the maximum reward for the next state is not required for updating the Q-value in the
table.
o In SARSA, new action and reward are selected using the same policy, which has determined
the original action.
o SARSA is so named because it uses the quintuple Q(s, a, r, s', a'), where
s: original state
a: original action
r: reward observed while following the state
s', a': new state-action pair.
o Deep Q Neural Network (DQN):
o As the name suggests, DQN is a Q-learning using Neural networks.
o For a big state space environment, it will be a challenging and complex task to define and
update a Q-table.
o To solve such an issue, we can use a DQN algorithm. Where, instead of defining a Q-table,
neural network approximates the Q-values for each action and state.

Now, we will expand the Q-learning.

Q-Learning Explanation:
o Q-learning is a popular model-free reinforcement learning algorithm based on the Bellman equation.
o The main objective of Q-learning is to learn the policy which can inform the agent that what
actions should be taken for maximizing the reward under what circumstances.
o It is an off-policy RL that attempts to find the best action to take at a current state.
o The goal of the agent in Q-learning is to maximize the value of Q.
o The value of Q-learning can be derived from the Bellman equation. Consider the Bellman equation
given below:

In the equation, we have various components, including the reward, the discount factor (γ), the transition
probability, and the end state s'. But no Q-value is given yet, so first consider the below image:

In the above image, we can see there is an agent who has three value options, V(s1), V(s2), V(s3). As this is an
MDP, the agent only cares about the current state and the future state. The agent can go in any direction (up, left,
or right), so it needs to decide where to go for the optimal path. Here the agent will take a move on a
probability basis and change the state. But if we want some exact moves, then we need to make some
changes in terms of the Q-value. Consider the below image:
Q represents the quality of the actions at each state. So instead of using a value at each state, we will use a
pair of state and action, i.e., Q(s, a). The Q-value specifies which action is more lucrative than others, and
according to the best Q-value, the agent takes its next move. The Bellman equation can be used for deriving
the Q-value.

To perform any action, the agent will get a reward R(s, a), and it will also end up in a certain state, so the Q-
value equation will be:

Hence, we can say that, V(s) = max [Q(s, a)]

The above formula is used to estimate the Q-values in Q-Learning.

What is 'Q' in Q-learning?

The Q stands for quality in Q-learning, which means it specifies the quality of an action taken by the agent.

Q-table:

A Q-table or matrix is created while performing the Q-learning. The table follows the state and action pair, i.e.,
[s, a], and initializes the values to zero. After each action, the table is updated, and the q-values are stored
within the table.

The RL agent uses this Q-table as a reference table to select the best action based on the q-values.
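A minimal sketch of a tabular Q-learning update is shown below; the learning rate α, the discount γ, and the (state, action)-keyed Q-table are assumptions for illustration.

from collections import defaultdict

Q = defaultdict(float)                      # Q-table, initialized to zero

def q_learning_update(s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    # Q(s,a) <- Q(s,a) + alpha * [ r + gamma * max_a' Q(s',a') - Q(s,a) ]
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])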

Passive Reinforcement Learning


1. In this learning, the agent's policy is fixed and the task is to learn the utilities of states.
2. It could also involve learning a model of the environment.
3. In passive learning, the agent's policy π is fixed, i.e. in state s it always executes the action π(s).

Its goal is simply to learn the utility function U^π(s).


4. For example: - Consider the 4 x 3 world.
5. The following figure shows the policy for that world.

[Figure: Policy π for the 4 x 3 world, with terminal states +1 and -1.]

• The following figure shows the corresponding utilities

0.812 0.868 0.918 +1

0.762 0.560 -1

0.705 0.655 0.611 0.388

• Clearly, the passive learning task is similar to the policy evaluation task.
• The main difference is that the passive learning agent does not know
o either the transition model T(s, a, s'), which specifies the probability of reaching
state s' from state s after doing action a,
o or the reward function R(s), which specifies the reward for each state.
• The agent executes a set of trials in the environment using its policy π.
• In each trial, the agent starts in state (1,1) and experiences a sequence of state transitions until
it reaches one of the terminal states, (4,2) or (4,3).
• Its percepts supply both the current state and the reward received in that state.
• Typical trials might look like this:

(1,1)-.04 → (1,2)-.04 → (1,3)-.04 → (1,2)-.04 → (1,3)-.04 → (2,3)-.04 → (3,3)-.04 → (4,3)+1
(1,1)-.04 → (1,2)-.04 → (1,3)-.04 → (2,3)-.04 → (3,3)-.04 → (3,2)-.04 → (3,3)-.04 → (4,3)+1
(1,1)-.04 → (2,1)-.04 → (3,1)-.04 → (3,2)-.04 → (4,2)-1

• Note that each state percept is subscripted with the reward received.


• The objective is to use the information about rewards to learn the expected utility U^π(s)
associated with each nonterminal state s.
• The utility is defined to be the expected sum of (discounted) rewards obtained if policy π is
followed; the utility function is written as
U^π(s) = E[ Σ_{t=0..∞} γ^t R(s_t) | π, s_0 = s ]

• For the 4 x 3 world, set γ = 1.


Direct utility estimation:-

• A simple method for direct utility estimation is in the area of adaptive control theory by
Widrow and Hoff(1960).
• The idea is that the utility of a state is the expected total reward from that state onward, and
each trial provides a sample of this value for each state visited.
• Example:- The first trial in the set of three given earlier provides a sample total reward of
0.72 for state (1,1), two samples of 0.76 and 0.84 for (1,2), two samples of 0.80 and 0.88 for
(1,3) and so on.
• Thus, at the end of each sequence, the algorithm calculates the observed reward-to-go for
each state and updates the estimated utility for that state accordingly.
• In the limit of infinitely many trials, the sample average will converge to the true
expectation in the utility function.
• It is clear that direct utility estimation is just an instance of supervised learning.
• This means that reinforcement learning have been reduced to a standard inductive learning
problem.

Advantage:-

• Direct utility estimation succeeds in reducing the reinforcement learning problem to an


inductive learning problem.
• Disadvantages:-
a. It misses a very important source of information, namely, the fact that the utilities of
states are not independent.
Reason: the utility of each state equals its own reward plus the expected utility
of its successor states. That is, the utility values obey the Bellman equations for a
fixed policy:
U^π(s) = R(s) + γ Σ_{s'} T(s, π(s), s') U^π(s')
b. It misses opportunities for learning.
Reason: it ignores the connections between states.
c. The algorithm often converges very slowly.
Reason: more broadly, direct utility estimation can be viewed as searching in a
hypothesis space for U that is much larger than it needs to be, in that it includes
many functions that violate the Bellman equations.

ADAPTIVE DYNAMIC PROGRAMMING:-


• Agent must learn how states are connected.
• Adaptive Dynamic Programming agent works by learning the transition model of the
environment as it goes along and solving the corresponding Markov Decision process using a
dynamic programming method.
• For a passive learning agent, it plugs the learned transition model T(s, π(s), s') and the observed rewards R(s)
into the Bellman equation to calculate the utilities of the states.
• The process of learning the model itself is easy, because the environment is fully observable
i.e. we have a supervised learning task where the input is a state-action pair and the output is
the resulting state.
• We can also represent the transition model as a table of probabilities.
• The following algorithm shows the passive ADP agent,

Function PASSIVE-ADP-AGENT(percept) returns an action
  Inputs: percept, a percept indicating the current state s' and reward signal r'
  Static: π, a fixed policy
          mdp, an MDP with model T, rewards R, discount γ
          U, a table of utilities, initially empty
          Nsa, a table of frequencies for state-action pairs, initially zero
          Nsas', a table of frequencies for state-action-state triples, initially zero
          s, a, the previous state and action, initially null

  If s' is new then do U[s'] ← r' ; R[s'] ← r'
  If s is not null then do
      increment Nsa[s,a] and Nsas'[s,a,s']
      for each t such that Nsas'[s,a,t] is nonzero do
          T[s,a,t] ← Nsas'[s,a,t] / Nsa[s,a]
  U ← VALUE-DETERMINATION(π, U, mdp)
  If TERMINAL?[s'] then s,a ← null else s,a ← s', π[s']
  return a

• Its performance on the 4 * 3 world is shown in the following figure.


• The following figure shows the root-mean square error in the estimate for U(1,1), averaged
over 20 runs of 100 trials each.

Advantages:-
o It can converge quite quickly.
  Reason: the model usually changes only slightly with each observation, so the value iteration process can
  use the previous utility estimates as initial values.
o The process of learning the model itself is easy.
  Reason: the environment is fully observable. This means that we have a supervised learning task where the
  input is a state-action pair and the output is the resulting state.
o It provides a standard against which other reinforcement learning algorithms can be measured.
Disadvantage:-
o It is intractable for large state spaces.

Temporal Difference Learning:-


• In order to approximate the constraint equation on U^π(s), use the observed transitions to adjust
the values of the observed states so that they agree with the constraint equation.
• When a transition occurs from state s to state s', we apply the following update to U^π(s):
  U^π(s) ← U^π(s) + α ( R(s) + γ U^π(s') − U^π(s) )
• where α is the learning rate parameter.
• The above equation is called the temporal-difference, or TD, equation.
• The following algorithm shows the passive reinforcement learning agent using temporal
differences:

Function PASSIVE-TD-AGENT(percept) returns an action

Inputs:percept,a percept indicating the current state s’ and reward signal r’


Static:π,a fixed policy

U,a table of utilities,initially empty


Ns,a table of frequencies for states,initially zero
S,a,r,the previous state,action,and reward,initially null
If s’ is new then U[s’]←r’

If s is not null then do

Increment Ns[s]
U[s]←U[s] + α(Ns[s])(r + γU[s’] - U[s])
If TERMINAL?[s’]then s,a,r←null else s,a,r←s’,π[s’],r’

return a
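The core update inside the passive TD agent can be sketched in a few lines of Python; the utility table U, the learning rate α, and the discount γ are assumed inputs for illustration.

def td_update(U, s, reward_s, s_next, alpha=0.1, gamma=1.0):
    # U(s) <- U(s) + alpha * ( R(s) + gamma * U(s') - U(s) )
    U[s] = U[s] + alpha * (reward_s + gamma * U[s_next] - U[s])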

• Advantages:-
o It is much simpler
o It requires much less computation perobservation
• Disadvantages:-
o It does not learn quite as fast as the ADP agent
o It shows much higher variability
• The following table shows the difference between ADP and TD approach,

ADP Approach vs. TD Approach:
o ADP adjusts the state to agree with all of the successors that might occur, weighted by their
  probabilities; TD adjusts a state to agree only with its observed successor.
o ADP makes as many adjustments as it needs to restore consistency between the utility estimates
  U and the environment model T; TD makes a single adjustment per observed transition.
• The following points shows the relationship between ADP and TD approach,
o Both try to make local adjustments to the utility estimates in order to make each state
“agree” with its successors.
o Each adjustment made by ADP could be seen, from the TD point of view, as a result
of a “pseudo-experience” generated by simulating the current environment model.
o It is possible to extend the TD approach to use an environment model to generate several
"pseudo-experiences" — transitions that the TD agent can imagine might happen, given its
current model.
o For each observed transition, the TD agent can generate a large number of imaginary
transitions. In this way the resulting utility estimates will approximate more and more
closely those of ADP, of course at the expense of increased computation time.

Active Reinforcement learning:-

• A passive learning agent has a fixed policy that determines its behavior.
• “An active agent must decide what actions to do”
• An ADP agent can be taken an considered how it must be modified to handle this new
freedom.
• The following are the required modifications:-
o First the agent will need to learn a complete model with outcome probabilities for all
actions. The simple learning mechanism used by PASSIVE-ADP-AGENT will do just
fine for this.
o Next, take into account the fact that the agent has a choice of actions. The utilities it
needs to learn are those defined by the optimal policy:
U(s) = R(s) + γ max_a Σ_{s'} T(s, a, s') U(s')

o These equations can be solved to obtain the utility function U using he value iteration
or policy iteration algorithms.
o Having obtained a utility function U that is optimal for the learned model, the agent can
extract an optimal action by one-step look ahead to maximize the expected utility;
o Alternatively, if it uses policy iteration, the optimal policy is already available, so it
should simply execute the action the optimal policy recommends.

Exploration:-
• Greedy agent is an agent that executes an action recommended by the optimal policy for the
learned model.
• The following figure shows the suboptimal policy to which this agent converges in this
particular sequence of trials.

[Figure: Suboptimal policy learned by the greedy agent in the 4 x 3 world, with terminal states +1 and -1.]

• The agent does not learn the true utilities or the true optimal policy! What happens is that, in
the 39th trial, it finds a policy that reaches the +1 reward along the lower route via (2,1),
(3,1), (3,2), and (3,3).

• After experimenting with minor variations, from the 276th trial onward it sticks to that policy,
never learning the utilities of the other states and never finding the optimal route via
(1,2), (1,3) and (2,3).
• How can choosing the optimal action lead to suboptimal results?
• The fact is that the learned model is not the same as the true environment; what is optimal in
the learned model can therefore be suboptimal in the true environment.
• Unfortunately, the agent does not know what the true environment is, so it cannot compute
the optimal action for the true environment.
• This is addressed by means of exploration.
• The greedy agent overlooks the fact that actions do more than provide rewards according to the
current learned model; they also contribute to learning the true model by affecting the
percepts that are received.
• An agent therefore must make a trade-off between exploitation, to maximize its reward, and
exploration, to maximize its long-term well-being.

• Pure exploitation risks getting stuck in a rut.

• Pure exploration to improve one's knowledge is of no use if one never puts that knowledge
into practice.

GLIE Scheme:-
• To come up with a reasonable scheme that will eventually lead to optimal behavior by the
agent a GLIE Scheme can be used.
• A GLIE Scheme must try each action in each state an unbounded number of times to avoid
having a finite probability that an optimal action is missed because of an unusually bad series
of outcomes.

• An ADP agent using such a scheme will eventually learn the true environment model.
• A GLIE Scheme must also eventually become greedy, so that the agents actions become
optimal with respect to the learned (and hence the true)model.
• There are several GLIE Scheme as follows,
o The agent can choose a random action a fraction 1/t of the time and follow the
greedy policy otherwise.

Advantage:- This method eventually converges to an optimal policy


Disadvantage:- It can be extremely slow
o Another approach is to give some weight to actions that the agent has not tried very
often, while tending to avoid actions that are believed to be of low utility. This can be
implemented by altering the constraint equation so that it assigns a higher utility
estimate to relatively unexplored state-action pairs.
• Essentially, this amounts to an optimistic prior over the possible environments and causes the
agent to behave initially as if there were wonderful rewards scattered all over the place.

Exploration function:-
• Let U+(s) denote the optimistic estimate of the utility of the state s, and let N(a, s) be the
number of times action a has been tried in state s.
• Suppose that value iteration is used in an ADP learning agent; then rewrite the update
equation to incorporate the optimistic estimate.
• The following equation does this:
  U+(s) ← R(s) + γ max_a f( Σ_{s'} T(s, a, s') U+(s'), N(a, s) )
• Here f(u, n) is called the exploration function.
• It determines how greed is traded off against curiosity.

• The function f(u, n) should be increasing in u and decreasing in n.


• A simple definition is
  f(u, n) = R+   if n < Nc
            u    otherwise
  where R+ is an optimistic estimate of the best possible reward obtainable in any state and Nc is a
  fixed parameter.
• The fact that U+ rather than U appears on the right-hand side of the above equation is very
important.
• If U, the more pessimistic utility estimate, were used, then the agent would soon become
unwilling to explore further afield.

• The use of U+ means that benefits of exploration are propagated back from the edges of
unexplored regions, so that actions that lead toward unexplored regions are weighted more
highly, rather than just actions that are themselves unfamiliar.
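A minimal sketch of the exploration function f(u, n) described above is given below; the particular values of R+ and Nc are assumptions for illustration only.

def exploration_function(u, n, R_plus=2.0, N_c=5):
    # Optimistic while the action has been tried fewer than N_c times,
    # otherwise fall back to the normal utility estimate u.
    return R_plus if n < N_c else u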

Learning an action value function:-


• To construct an active temporal difference learning agent, it needs a change in the passive TD
approach.
• The most obvious change that can be made in the passive case is that the agent is no longer
equipped with a fixed policy, so if it learns a utility function U, it will need to learn a model
in order to be able to choose an action based on U via one step look ahead.
• The update rule of passive TD remains unchanged. This might seem odd.

• Reason:-
o Suppose the agent takes a step that normally leads to a good destination, but because
of non determinism in the environment the agent ends up in a disastrous state.
o The TD update rule will take this as seriously as if the outcome had been the normal
result of the action, where the agent should not worry about it too much since the
outcome was a fluke.
o It can be shown that the TD algorithm will converge to the same values as ADP as the
number of training sequences tends to infinity.

Generalization in Reinforcement Learning:-


• The utility function and Q-functions learned by the agents are represented in tabular form
with one output value for each input tuple.
• This approach works well for small state spaces.
• Example: the game of chess, where the state space is of the order of 10^50 states. Visiting all
the states to learn the game is tedious.
• One way to handle such problems is to use FUNCTION APPROXIMATION.
• Function approximation is nothing but using any sort of representation for the function
other than the table.
• For example, the evaluation function for chess is represented as a weighted linear function
of a set of features or basis functions f1, ..., fn:
  Û_θ(s) = θ1 f1(s) + θ2 f2(s) + ... + θn fn(s)
• Reinforcement learning can learn values for the parameters θ1, ..., θn
• such that the evaluation function Û_θ approximates the true utility function.
• As in all inductive learning, there is a tradeoff between the size of the hypothesis space and
the time it takes to learn thefunction.
• For reinforcement learning, it makes more sense to use an online learning algorithm that
updates the parameter after each trial.
• Suppose we run a trial and the total reward obtained starting at (1, 1) is 0.4.
• This suggests that U (1,1) , currently 0.8 is too large and must be reduced.
• The parameter should be adjusted to achieve this. This is done similar to neural network
learning where we have an error function which computes the gradient with respect to the
parameters.
• If u_j(s) is the observed total reward from state s onward in the j-th trial, then the error is defined
as half the squared difference between the predicted total and the actual total:
  E_j(s) = ( Û_θ(s) − u_j(s) )² / 2
• The rate of change of the error with respect to each parameter θi is ∂E_j(s)/∂θi, and to move the
parameters in the direction of decreasing error we update
  θi ← θi − α ( ∂E_j(s)/∂θi ) = θi + α ( u_j(s) − Û_θ(s) ) ( ∂Û_θ(s)/∂θi )

• This is called Widrow-Hoff Rule or Delta Rule.
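A minimal sketch of the Widrow-Hoff (delta) rule for a linear utility approximator is given below; the feature values, the observed return, and the learning rate α are assumed inputs for illustration.

def delta_rule_update(theta, features, observed_return, alpha=0.05):
    # U_theta(s) = sum_i theta_i * f_i(s); move each theta_i toward lower error.
    predicted = sum(t * f for t, f in zip(theta, features))
    error = observed_return - predicted                      # u_j(s) - U_theta(s)
    return [t + alpha * error * f for t, f in zip(theta, features)]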


Advantages:-
o It requires less space.
o Function approximation can also be very helpful for learning a model of the
environment.
o It allows for inductive generalization over input states.
Disadvantages:-
o The convergence is likely to be delayed.
o It could fail to be any function in the chosen hypothesis space that approximates the
true utility function sufficiently well.
o Consider the simplest case, which is direct utility estimation. With function
approximation, this is an instance of supervised learning.

APPLICATIONS OF REINFORCEMENT LEARNING

1. RL in Marketing

Marketing is all about promoting and then selling the products or services of either your brand or someone
else's. In the process of marketing, finding the right audience, one which yields larger returns on the investment
you or your company is making, is a challenge in itself.
And, it is one of the reasons companies are investing dollars in managing digitally various marketing
campaigns. Through real-time bidding supporting well the fundamental capabilities of RL, your and other
companies, smaller or larger, can expect: –
• more display ad impressions in real-time.
• increased ROI, profit margins.
• predicting the choices, reactions, and behavior of customers towards your products/services.
2. RL in Broadcast Journalism

Through different types of Reinforcement Learning, attracting likes and views along with tracking the
reader’s behavior is much simpler. Besides, recommending news that suits the frequently-changing
preferences of readers and other online users can possibly be achieved since journalists can now be equipped
with an RL-based system that keeps an eye on intuitive news content as well as the headlines. Take a look at
other advantages too which Reinforcement Learning is offering to readers all around the world.
• News producers are now able to receive the feedback of their users instantaneously.
• Increased communication, as users are more expressive now.
• No space for disinformation, hatred.

3. RL in Healthcare

Healthcare is an important part of our lives and through DTRs (a sequence-based use-case of RL), doctors
can discover the treatment type, appropriate doses of drugs, and timings for taking such doses. Curious to
know how is this possible!! See, DTRs are equipped with: –
• a sequence of rules which confirm the current health status of a patient.
• Then, they optimally propose treatments that can diagnose diseases like diabetes, HIV, Cancer, and
mental illness too.
If required, these DTRs (i.e. Dynamic Treatment Regimes) can reduce or remove the delayed impact of
treatments through their multi-objective healthcare optimization solutions.

4. RL in Robotics

Robotics without any doubt facilitates training a robot in such a way that a robot can perform tasks – just
like a human being can. But still, there is a bigger challenge the robotics industry is facing today – Robots
aren’t able to use common sense while making various moral, social decisions. Here, a combination of Deep
Learning and Reinforcement Learning i.e. Deep Reinforcement Learning comes to the rescue to enable the
robots with, “Learn How To Learn” model. With this, the robots can now: –
• manipulate their decisions by grasping well various objects visible to them.
• solve complicated tasks which even humans fail to do as robots now know what and how to learn from
different levels of abstractions of the types of datasets available to them.

5. RL in Gaming

Gaming is something nowadays without which you, me, or a huge chunk of people can’t live.
With games optimization through Reinforcement Learning algorithms, we may expect better
performances of our favorite games related to adventure, action, or mystery.
To prove it right, the Alpha Go example can be considered. This is a computer program that defeated the
strongest Go (a challenging classical game) Player in October 2015 and itself became the strongest Go
player. The trick of Alpha Go to defeat the player was Reinforcement Learning which kept on developing
stronger as the game is constantly exposed to unexpected gaming challenges. Like Alpha Go, there are many
other games available. Even you can also optimize your favorite games by applying appropriately prediction
models which learn how to win in even complex situations through RL-enabled strategies.
6. RL in Image Processing

Image Processing is another important method of enhancing the current version of an image to extract some
useful information from it. And there are some steps associated like:
• Capturing the image with machines like scanners.
• Analyzing and manipulating it.
• Using the output image obtained after analysis for representation, description-purposes.
Here, ML models like Deep Neural Networks (whose framework is Reinforcement Learning) can be
leveraged for simplifying this trending image processing method. With Deep Neural Networks, you can
either enhance the quality of a specific image or hide the info. of that image. Later, use it for any of your
computer vision tasks.

7. RL in Manufacturing

Manufacturing is all about producing goods that can satisfy our basic needs and essential wants. Cobot
Manufacturers (or Manufacturers of Collaborative Robots that can perform various manufacturing tasks
with a workforce of more than 100 people) are helping a lot of businesses with their own RL solutions for
packaging and quality testing. Undoubtedly, their use is making the process of manufacturing quality
products faster that can say a big no to negative customer feedback. And the lesser negative feedbacks are,
the better is the product’s performance and also, sales margin too.

POLICY SEARCH
Policy search is a subfield in reinforcement learning which focuses on finding good parameters for a given
policy parametrization. It is well suited for robotics as it can cope with high-dimensional state and action
spaces, one of the main challenges in robot learning.

1. Policy search methods are a family of systematic approaches for continuous (or large) action and state
spaces.
2. With policy search, expert knowledge is easily embedded in initial policies (by demonstration or
imitation).
3. Policy search is preferred over other RL methods in practical applications (e.g. robotics).
Policy gradient
• Family of randomized policies µ(s, a) = Pr(a | s) (a deterministic policy is a special case).

• The performance measure is J(µ) = E_µ[ r0 + γ r1 + γ² r2 + ... ]

• The policy µ_θ(s, a) is parameterized by a parameter vector θ.

The parametric performance measure becomes J(θ) = E_θ[ r0 + γ r1 + γ² r2 + ... ]

• Solution: gradient-ascent algorithms: θ_{k+1} = θ_k + α ∇_θ J(θ_k)

Policy gradient update: θ_{µ'} = θ_µ + α ∇_θ J(θ_µ)

• This guarantees performance improvement: J(θ_{µ'}) ≥ J(θ_µ), i.e. µ' is at least as good as µ.

Policy gradient: Black-box approaches

• Approximate the gradient using supervised learning (regression).

• Collect data D = {δθ_i, δJ_i} (the sampled gradients) by perturbing the parameters, θ + δθ, and applying the
new policy µ(θ + δθ) to get δJ_i = J(θ + δθ) − J(θ).

• The finite-difference (FD) gradient estimate by regression: g_FD(θ) = (ΔΘᵀ ΔΘ)⁻¹ ΔΘᵀ ΔJ

• Gradient update: θ ← θ + α g_FD(θ)


Policy gradient: Likelihood Ratio Gradient
• Rewrite the performance measure:

J(θ) = E_θ[ r0 + γ r1 + γ² r2 + ... ] = ∫ p(ξ | µ_θ) R(ξ) dξ

where ξ = {s0, a0, r0, s1, a1, r1, ...} is a trajectory,

R(ξ) = r0 + γ r1 + γ² r2 + ...

p(ξ | µ_θ) = p(s0) ∏_t p(s_{t+1} | s_t, a_t) µ_θ(a_t | s_t)

• Gradient derivation:

∇_θ J(θ) = ∫ ∇_θ p(ξ | µ_θ) R(ξ) dξ

= ∫ p(ξ | µ_θ) ∇_θ log p(ξ | µ_θ) R(ξ) dξ

(the trick is ∇f = f ∇ log f)

= E[ ∇_θ log p(ξ | µ_θ) R(ξ) ]

• Using Monte-Carlo simulation: sample M trajectories ξ_i from policy µ_θ:

∇_θ J(θ) ≈ (1/M) Σ_{i=1..M} ∇_θ log p(ξ_i | µ_θ) R(ξ_i)

= (1/M) Σ_{i=1..M} Σ_{t=0..T_i} ∇_θ log µ_θ(a_t | s_t) R(ξ_i)

because p(s_{t+1} | s_t, a_t) does not depend on θ, so its gradient with respect to θ is 0.

A vanilla policy gradient algorithm

• initialize θ0

• for k = 0, 1, 2, ... (until convergence)

– generate M trajectories ξi from the policy µθk

– compute ∇θJ(θk) from {ξi, i = 1..M}

– update θ: θk+1 = θk + α∇θJ(θk)

• end for
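
A minimal tabular Python sketch of the vanilla (REINFORCE-style) update above, assuming the M trajectories have already been sampled as lists of (state, action, reward) tuples; the softmax parameterization of µθ is an illustrative choice, not part of the algorithm statement itself.

import numpy as np

def softmax_policy(theta, state):
    """mu_theta(a|s): softmax over the preference row theta[state] (tabular states)."""
    prefs = theta[state] - theta[state].max()
    probs = np.exp(prefs)
    return probs / probs.sum()

def vanilla_policy_gradient_step(theta, trajectories, alpha=0.01, gamma=0.99):
    """grad J(theta) ~ (1/M) sum_i sum_t grad log mu_theta(a_t|s_t) * R(xi_i)."""
    grad = np.zeros_like(theta)
    n_actions = theta.shape[1]
    for traj in trajectories:
        # discounted return R(xi) of the whole trajectory
        R = sum((gamma ** t) * r for t, (_, _, r) in enumerate(traj))
        for state, action, _ in traj:
            probs = softmax_policy(theta, state)
            one_hot = np.zeros(n_actions)
            one_hot[action] = 1.0
            # gradient of log softmax w.r.t. the preferences of the visited state
            grad[state] += (one_hot - probs) * R
    grad /= len(trajectories)
    return theta + alpha * grad      # ascent step: theta_{k+1} = theta_k + alpha * grad J

# Usage sketch with 2 states, 2 actions and two hand-written trajectories.
theta = np.zeros((2, 2))
trajectories = [[(0, 1, 1.0), (1, 0, 0.5)], [(0, 0, -1.0)]]
theta = vanilla_policy_gradient_step(theta, trajectories)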
Natural policy gradient

∇̃θJ(θ) = G⁻¹(θ) ∇θJ(θ)

where G(θ) is the Fisher information matrix (Amari, 1998).

Natural Policy Gradient: Steepest descent

• The optimization (maximizing J(θ)) is carried out over the space of trajectory distributions, where p(ξ|µθ) is viewed as a distribution indexed by the parameter vector θ of dimension |θ|.

• We define a Riemannian structure on this manifold of trajectory distributions.

• The steepest-ascent direction maximizes the improvement J(θ + δθ) after the update, subject to a fixed step length measured in the Riemannian metric. This is formulated as the optimization problem

max J(θ + δθ) ≈ J(θ) + δθᵀ∇J(θ)   subject to   ⟨δθ, δθ⟩p(ξ|µθ) = Σ_{i,j} Gij δθi δθj = ε.

• Using a Lagrange multiplier:

L(δθ, λ) = J(θ) + δθᵀ∇J(θ) + λ( Σ_{i,j} Gij δθi δθj − ε ).

• Taking the derivative with respect to δθ and setting it to zero gives ∇J(θ) + 2λ G δθ = 0.

• Therefore, δθ ∝ G⁻¹∇J(θ).

The metric measures distances on the space of probability distributions; the KL-divergence between two distributions is a natural divergence on changes in a distribution. With this metric,

G(θ) = ⟨ ∂i log p(ξ|µθ), ∂j log p(ξ|µθ) ⟩p(ξ|µθ),

which is exactly the Fisher information matrix.

• Estimate the Fisher information matrix using sampled trajectories:

G(θ) ≈ (1/M) Σ_{i=1..M} [∇θ log p(ξi|µθ)] [∇θ log p(ξi|µθ)]ᵀ

= (1/M) Σ_{i=1..M} [ Σ_{t=0..Ti} ∇θ log µθ(at|st) ] [ Σ_{t=0..Ti} ∇θ log µθ(at|st) ]ᵀ.

Actor-Critic methods

1. The policy structure is known as the actor, (→ tuned by gradient updates.)


2. The estimated value function is known as the critic (→ tuned by TD error).

Policy Gradient Theorem Algorithm


• The return R(ξ) can be replaced by the state–action value Qπt(st, at):

∇θJ(θ) = E[ Σ_{t=0..T} ∇θ log µθ(at|st) Σ_{j=t..T} rj ]

= E[ Σ_{t=0..T} ∇θ log µθ(at|st) Qπt(st, at) ]

Policy gradient with function approximation


Approximate the action value with a linear function of features: Qπt(st, at) ≈ φ(st, at)ᵀ w.
• Compatible function approximation: how should φ(st, at) be chosen such that the approximation
– does not introduce bias into the gradient estimate, and
– reduces the variance?
ARTIFICIAL INTELLIGENCE

UNIT-IV

SYLLABUS:

Natural Language for Communication: Phrase structure


grammars, Syntactic Analysis, Augmented Grammars and
semantic Interpretation, Machine Translation, Speech
Recognition.

Perception: Image Formation, Early Image Processing


Operations, Object Recognition by appearance, Reconstructing
the 3D World, Object Recognition from Structural information,
Using Vision.
JAWAHARLAL NEHRU TECHNOLOGICAL UNIVERSITY ANANTAPUR
B.Tech (CSE) – III-I Sem
(19A05502T) ARTIFICIAL INTELLIGENCE
(Common to CSE & IT)

Computers don’t speak languages the way humans do. They communicate in machine code or machine
language, while we speak English, Dutch, French or some other human language. Most of us don’t
understand the millions of zeros and ones computers communicate in. And in turn, computers don’t
understand human language unless they are programmed to do so. That’s where natural language
processing (NLP) comes in.

What is natural language processing?


Natural language processing is a form of artificial intelligence (AI) that gives computers the ability to read,
understand and interpret human language. It helps computers measure sentiment and determine which
parts of human language are important. For computers, this is an extremely difficult thing to do because of
the large amount of unstructured data, the lack of formal rules and the absence of real-world context or
intent.

Language is a method of communication with the help of which we can speak, read and write. Natural
Language Processing (NLP) is a subfield of Computer Science that deals with Artificial Intelligence (AI), which
enables computers to understand and process human language.

Study of Human Languages


Language is a crucial component for human lives and also the most fundamental aspect of our behavior.
We can experience it in mainly two forms - written and spoken. In the written form, it is a way to pass our
knowledge from one generation to the next. In the spoken form, it is the primary medium for human beings
to coordinate with each other in their day-to-day behavior. Language is studied in various academic
disciplines. Each discipline comes with its own set of problems and a set of solutions to address them.

Consider the following table to understand this −

Linguists
Problems: How can phrases and sentences be formed with words? What constrains the possible meanings of a sentence?
Tools: Intuitions about well-formedness and meaning; mathematical models of structure, for example model-theoretic semantics and formal language theory.

Psycholinguists
Problems: How do human beings identify the structure of sentences? How can the meanings of words be identified? When does understanding take place?
Tools: Experimental techniques, mainly for measuring the performance of human beings; statistical analysis of observations.

Philosophers
Problems: How do words and sentences acquire meaning? How are objects identified by words? What is meaning?
Tools: Natural language argumentation using intuition; mathematical models like logic and model theory.

Computational Linguists
Problems: How can we identify the structure of a sentence? How can knowledge and reasoning be modeled? How can we use language to accomplish specific tasks?
Tools: Algorithms; data structures; formal models of representation and reasoning; AI techniques like search and representation methods.

Ambiguity and Uncertainty in Language


Ambiguity, as the term is used in natural language processing, refers to the capability of being understood in more than one way. Natural language is highly ambiguous.

Components of Language
The language of study is divided into the interrelated components, which are conventional as well as
arbitrary divisions of linguistic investigation. The explanation of these components is as follows −

Phonology
The very first component of language is phonology. It is the study of the speech sounds of a particular
language. The origin of the word can be traced to Greek language, where ‘phone’ means sound or voice.
Phonetics, a subdivision of phonology is the study of the speech sounds of human language from the
perspective of their production, perception or their physical properties. IPA (International Phonetic
Alphabet) is a tool that represents human sounds in a regular way while studying phonology. In IPA, every
written symbol represents one and only one speech sound and vice-versa.



Phonemes
It may be defined as one of the units of sound that differentiate one word from other in a language. In
linguistic, phonemes are written between slashes. For example, phoneme /k/ occurs in the words such as
kit, skit.

Morphology
It is the second component of language. It is the study of the structure and classification of the words in a
particular language. The origin of the word is from Greek language, where the word ‘morphe’ means ‘form’.
Morphology considers the principles of formation of words in a language. In other words, how sounds
combine into meaningful units like prefixes, suffixes and roots. It also considers how words can be grouped
into parts of speech.

Lexeme
In linguistics, the abstract unit of morphological analysis that corresponds to a set of forms taken by a single
word is called lexeme. The way in which a lexeme is used in a sentence is determined by its grammatical
category. Lexeme can be individual word or multiword. For example, the word talk is an example of an
individual word lexeme, which may have many grammatical variants like talks, talked and talking. Multiword
lexeme can be made up of more than one orthographic word. For example, speak up, pull through, etc. are
the examples of multiword lexemes.

Syntax
It is the third component of language. It is the study of the order and arrangement of the words into larger
units. The word can be traced to Greek language, where the word suntassein means ‘to put in order’. It
studies the type of sentences and their structure, of clauses, of phrases.

Semantics
It is the fourth component of language. It is the study of how meaning is conveyed. The meaning can be
related to the outside world or can be related to the grammar of the sentence. The word can be traced to
Greek language, where the word semainein means ‘to signify’, ‘show’, ‘signal’.

Pragmatics:
It is the fifth component of language. It is the study of the functions of the language and its use in context.
The origin of the word can be traced to Greek language where the word ‘pragma’ means ‘deed’, ‘affair’

Spoken Language Syntax


The written English and spoken English grammar have many common features but along with that, they
also differ in a number of aspects. The following features distinguish between the spoken and written
English grammar −

Disfluencies and Repair


This striking feature makes spoken and written English grammar different from each other. It is individually
known as phenomena of disfluencies and collectively as phenomena of repair. Disfluencies include the use
of following −
• Filler words − Sometimes in between the sentence, we use some filler words. They are called fillers or filler pauses. Examples of such words are uh and um.

• Reparandum and repair − The repeated segment of words in between the sentence is called
reparandum. In the same segment, the changed word is called repair. Consider the following
example to understand this −

Does ABC airlines offer any one-way flights uh one-way fares for 5000 rupees?
In the above sentence, one-way flights is the reparandum and one-way fares is the repair.
Word Fragments
Sometimes we speak sentences with smaller fragments of words. For example, w-wha-what is the time? Here the fragments w- and wha- are word fragments.

NATURAL LANGUAGE GRAMMAR


For linguistics, language is a group of arbitrary vocal signs. We may say that language is creative, governed by rules, and innate as well as universal at the same time. At the same time, it is also human. The nature of language is different for different people, and there are many misconceptions about its nature. That is why it is very important to understand the meaning of the ambiguous term ‘grammar’.
In linguistics, the term grammar may be defined as the rules or principles with the help of which language
works. In broad sense, we can divide grammar in two categories −

Descriptive Grammar
The set of rules, where linguistics and grammarians formulate the speaker’s grammar is called descriptive
grammar.

Prescriptive Grammar
It is a very different sense of grammar, which attempts to maintain a standard of correctness in the
language. This category has little to do with the actual working of the language.

Grammatical Categories



A grammatical category may be defined as a class of units or features within the grammar of a language.
These units are the building blocks of language and share a common set of characteristics. Grammatical
categories are also called grammatical features.

In recent years, AI has evolved rapidly, and with that, NLP got more sophisticated, too. Many of us already
use NLP daily without realizing it. You’ve probably used at least one of the following tools:
• Spell checker.
• Autocomplete.
• Spam filters.
• Voice text messaging.

GENERATIVE CAPACITY
Grammatical formalisms can be classified by their generative capacity: the set of languages they can represent.
Chomsky (1957) describes four classes of grammatical formalisms that differ only in the form of the rewrite rules. The
classes can be arranged in a hierarchy, where each class can be used to describe all the languages that can be described
by a less powerful class, as well as some additional languages. Here we list the hierarchy, most powerful class first:

Recursively enumerable grammars use unrestricted rules: both sides of the rewrite rules can have any number of
terminal and nonterminal symbols, as in the rule A B C → D E. These grammars are equivalent to Turing machines in
their expressive power.

Context-sensitive grammars are restricted only in that the right-hand side must contain at least as many symbols as the left-hand side. The name “context-sensitive” comes from the fact that a rule such as A X B → A Y B says that an X can be rewritten as a Y in the context of a preceding A and a following B. Context-sensitive grammars can represent languages such as aⁿbⁿcⁿ (a sequence of n copies of a followed by the same number of bs and then cs).

In context-free grammars (or CFGs), the left-hand side consists of a single nonterminal symbol. Thus, each rule
licenses rewriting the nonterminal as the right-hand side in any context. CFGs are popular for natural-language and
programming-language grammars, although it is now widely accepted that at least some natural languages have
constructions that are not context-free (Pullum, 1991). Context-free grammars can represent aⁿbⁿ, but not aⁿbⁿcⁿ.

Regular grammars are the most restricted class. Every rule has a single nonterminal on the left-hand side and a terminal symbol optionally followed by a nonterminal on the right-hand side. Regular grammars are equivalent in power to finite-state machines. They are poorly suited for programming languages, because they cannot represent constructs such as balanced opening and closing parentheses (a variation of the aⁿbⁿ language). The closest they can come is representing a*b*, a sequence of any number of as followed by any number of bs.

The grammars higher up in the hierarchy have more expressive power, but the algorithms for dealing with them are
less efficient. Up to the 1980s, linguists focused on context-free and context-sensitive languages.

There have been many competing language models based on the idea of phrase structure; we will describe a popular model called the probabilistic context-free grammar, or PCFG.

A grammar is a collection of rules that defines a language as a set of allowable strings of words. “Context-free” means that every rule has a single nonterminal symbol on its left-hand side, and “probabilistic” means that the grammar assigns a probability to every string. Here is a PCFG rule:
VP → Verb [0.70]
   | VP NP [0.30]
Here VP (verb phrase) and NP (noun phrase) are non-terminal symbols. The grammar also refers to actual words, which are called terminal symbols. This rule is saying that with probability 0.70 a verb phrase consists solely of a verb, and with probability 0.30 it is a VP followed by an NP. Appendix B describes non-probabilistic context-free grammars.
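
To show how a PCFG assigns probabilities in practice, the short Python sketch below encodes the VP rule above together with a made-up toy lexicon and samples random phrases from it; the lexicon words and their probabilities are illustrative assumptions, not part of the grammar in the text.

import random

# Each non-terminal maps to a list of (right-hand side, probability) pairs.
pcfg = {
    "VP":   [(["Verb"], 0.70), (["VP", "NP"], 0.30)],
    "NP":   [(["Noun"], 1.00)],
    "Verb": [(["pecks"], 0.60), (["sleeps"], 0.40)],   # toy lexicon (assumed)
    "Noun": [(["grains"], 0.50), (["bird"], 0.50)],    # toy lexicon (assumed)
}

def sample(symbol):
    """Expand a symbol by choosing one of its rules according to the rule probabilities."""
    if symbol not in pcfg:                      # terminal symbol: an actual word
        return [symbol]
    rhss = [rhs for rhs, _ in pcfg[symbol]]
    probs = [p for _, p in pcfg[symbol]]
    chosen = random.choices(rhss, weights=probs, k=1)[0]
    return [word for sym in chosen for word in sample(sym)]

print(" ".join(sample("VP")))   # e.g. "pecks", or "pecks grains" with probability 0.3*0.6*0.5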

We now define a grammar for a tiny fragment of English that is suitable for communication between agents exploring
the wumpus world. We call this language E0. Later sections improve on E0 to make it slightly closer to real English. We
are unlikely ever to devise a complete grammar for English, if only because no two persons would agree entirely on
what constitutes valid English.

Five basic NLP tasks


As we mentioned before, human language is extremely complex and diverse. That’s why natural language
processing includes many techniques to interpret it, ranging from statistical and machine learning methods
to rules-based and algorithmic approaches.

There are five basic NLP tasks that you might recognize from school.
Lexical Analysis − The first phase of NLP is the Lexical Analysis. This phase scans the source code as a stream
of characters and converts it into meaningful lexemes. Lexical analysis divides the whole text into
paragraphs, sentences, and words. It involves identifying and analyzing the structure of words. Lexicon of a
language means the collection of words and phrases in a language.
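
As a small illustration of lexical analysis, the sketch below (assuming the nltk package and its tokenizer data are installed) splits raw text into sentences and word-level tokens.

import nltk

nltk.download("punkt", quiet=True)   # tokenizer models (one-time download)

text = "The bird pecks the grains. The school goes to the boy."
sentences = nltk.sent_tokenize(text)                  # split the text into sentences
tokens = [nltk.word_tokenize(s) for s in sentences]   # split each sentence into word tokens

print(sentences)
print(tokens)   # [['The', 'bird', 'pecks', 'the', 'grains', '.'], ['The', 'school', ...]]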

Syntactic Analysis (Parsing) − Syntactic analysis is used to check grammar and word arrangements, and it shows the relationship among the words. It involves analysis of the words in the sentence for grammar and arranging the words in a manner that shows the relationship among them. A sentence such as “The school goes to boy” is rejected by an English syntactic analyzer.

Example: Agra goes to the Poonam


In the real world, Agra goes to the Poonam, does not make any sense, so this sentence is rejected by the
Syntactic analyzer.

Semantic Analysis − Semantic analysis is concerned with the meaning representation. It mainly focuses on
the literal meaning of words, phrases, and sentences.



It draws the exact meaning or the dictionary meaning from the text. The text is checked for meaningfulness.
It is done by mapping syntactic structures and objects in the task domain. The semantic analyzer disregards
sentence such as “hot ice-cream”.

Discourse Integration − Discourse integration depends upon the sentences that precede it and also invokes the meaning of the sentences that follow it.

• The meaning of any sentence depends upon the meaning of the sentence just before it. In addition,
it also brings about the meaning of immediately succeeding sentence.

Pragmatic Analysis − Pragmatic is the fifth and last phase of NLP. It helps you to discover the intended effect
by applying a set of rules that characterize cooperative dialogues.

For Example: "Open the door" is interpreted as a request instead of an order.


• During this, what was said is re-interpreted on what it actually meant. It involves deriving those
aspects of language which require real world knowledge.

CONCEPT OF GRAMMAR
Grammar is essential for describing the syntactic structure of well-formed programs. In the literary sense, grammars denote syntactical rules for conversation in natural languages. Linguists have attempted to define grammars since the inception of natural languages like English, Hindi, etc.

The theory of formal languages is also applicable in the fields of Computer Science mainly in programming
languages and data structure. For example, in ‘C’ language, the precise grammar rules state how functions
are made from lists and statements.

A mathematical model of grammar was given by Noam Chomsky in 1956, which is effective for writing
computer languages.

Mathematically, a grammar G can be formally written as a 4-tuple (N, T, S, P) where −

• N or VN = set of non-terminal symbols, i.e., variables.

• T or ∑ = set of terminal symbols.

• S = Start symbol where S ∈ N

• P denotes the Production rules for Terminals as well as Non-terminals. It has the form α → β, where α and β are strings over VN ∪ ∑ and at least one symbol of α belongs to VN.
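
A minimal Python sketch of the 4-tuple G = (N, T, S, P), using a tiny illustrative grammar for the language aⁿbⁿ; this is just one way of writing the definition down, not a standard library representation.

# G = (N, T, S, P) for the tiny language { a^n b^n : n >= 1 }
N = {"S"}                                 # non-terminal symbols (variables)
T = {"a", "b"}                            # terminal symbols
S = "S"                                   # start symbol, S is a member of N
P = {"S": [["a", "S", "b"], ["a", "b"]]}  # production rules alpha -> beta

def expand_leftmost(sentential_form, rule_index):
    """Apply the chosen production to the left-most non-terminal."""
    for i, sym in enumerate(sentential_form):
        if sym in N:
            return sentential_form[:i] + P[sym][rule_index] + sentential_form[i + 1:]
    return sentential_form                # nothing left to rewrite

# Derivation S => aSb => aaSbb => aaabbb
form = [S]
form = expand_leftmost(form, 0)   # apply S -> a S b
form = expand_leftmost(form, 0)   # apply S -> a S b
form = expand_leftmost(form, 1)   # apply S -> a b
print("".join(form))              # aaabbb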

Phrase Structure or Constituency Grammar


Phrase structure grammar, introduced by Noam Chomsky, is based on the constituency relation. That is why
it is also called constituency grammar. It is opposite to dependency grammar.

Example
Before giving an example of constituency grammar, we need to know the fundamental points about
constituency grammar and constituency relation.
• All the related frameworks view the sentence structure in terms of constituency relation.
• The constituency relation is derived from the subject-predicate division of Latin as well as Greek
grammar.
• The basic clause structure is understood in terms of noun phrase NP and verb phrase VP.
We can write the sentence “This tree is illustrating the constituency relation” as follows −

Dependency Grammar
It is opposite to the constituency grammar and based on dependency relation. It was introduced by Lucien
Tesniere. Dependency grammar (DG) is opposite to the constituency grammar because it lacks phrasal
nodes.

Example

Before giving an example of Dependency grammar, we need to know the fundamental points about
Dependency grammar and Dependency relation.

• In DG, the linguistic units, i.e., words are connected to each other by directed links.
• The verb becomes the center of the clause structure.
• Every other syntactic units are connected to the verb in terms of directed link. These syntactic units
are called dependencies.
We can write the sentence “This tree is illustrating the dependency relation” as follows;



A parse tree that uses constituency grammar is called a constituency-based parse tree, and a parse tree that uses dependency grammar is called a dependency-based parse tree.

Context Free Grammar


Context-free grammar, also called CFG, is a notation for describing languages and a superset of regular grammar.

Definition of CFG
CFG consists of finite set of grammar rules with the following four components −

Set of Non-terminals
It is denoted by V. The non-terminals are syntactic variables that denote the sets of strings, which further
help defining the language, generated by the grammar.
Set of Terminals
It is also called tokens and defined by Σ. Strings are formed with the basic symbols of terminals.
Set of Productions
It is denoted by P. The set defines how the terminals and non-terminals can be combined. Every production (P) consists of a non-terminal, an arrow, and a sequence of terminals and/or non-terminals. The non-terminal is called the left side of the production and the sequence it rewrites to is called the right side of the production.
Start Symbol
The production begins from the start symbol. It is denoted by symbol S. Non-terminal symbol is always
designated as start symbol.



IMPLEMENTATION ASPECTS OF SYNTACTIC ANALYSIS
Parsing is the process of analyzing a string of words to uncover its phrase structure, according to the rules of a
grammar.
There are a number of algorithms researchers have developed for syntactic analysis, but we consider only
the following simple methods –
• Context-Free Grammar
• Top-Down Parser

Context-Free Grammar
It is a grammar that consists of rules with a single symbol on the left-hand side of the rewrite rules. Let us create a grammar to parse a sentence −
“The bird pecks the grains”
Articles (DET) − a | an | the
Nouns − bird | birds | grain | grains
Noun Phrase (NP) − Article + Noun | Article + Adjective + Noun
= DET N | DET ADJ N
Verbs − pecks | pecking | pecked
Verb Phrase (VP) − NP V | V NP
Adjectives (ADJ) − beautiful | small | chirping
The parse tree breaks down the sentence into structured parts so that the computer can easily understand
and process it. In order for the parsing algorithm to construct this parse tree, a set of rewrite rules, which
describe what tree structures are legal, need to be constructed.
These rules say that a certain symbol may be expanded in the tree by a sequence of other symbols.
According to first order logic rule, if there are two strings Noun Phrase (NP) and Verb Phrase (VP), then
the string combined by NP followed by VP is a sentence. The rewrite rules for the sentence are as follows-

S → NP VP

NP → DET N | DET ADJ N

VP → V NP

Lexicon −

DET → a | the

ADJ → beautiful | perching

N → bird | birds | grain | grains


V → peck | pecks | pecking

The parse tree can be created as shown −
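
Since the parse-tree figure is not reproduced here, the following sketch (assuming the nltk package is installed) transcribes the rewrite rules and lexicon above and prints a parse tree for “the bird pecks the grains”.

import nltk

grammar = nltk.CFG.fromstring("""
S   -> NP VP
NP  -> DET N | DET ADJ N
VP  -> V NP
DET -> 'a' | 'the'
ADJ -> 'beautiful' | 'perching'
N   -> 'bird' | 'birds' | 'grain' | 'grains'
V   -> 'peck' | 'pecks' | 'pecking'
""")

parser = nltk.ChartParser(grammar)
sentence = "the bird pecks the grains".split()
for tree in parser.parse(sentence):
    tree.pretty_print()   # (S (NP (DET the) (N bird)) (VP (V pecks) (NP (DET the) (N grains))))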



Now consider the above rewrite rules. Since V can be replaced by either "peck" or "pecks", sentences such as "The bird peck the grains" are wrongly permitted, i.e., the subject-verb agreement error is accepted as correct.

Merit − It is the simplest style of grammar and is therefore the most widely used.

Demerits −

• They are not highly precise. For example, “The grains peck the bird” is syntactically correct according to the parser, but even though it makes no sense, the parser accepts it as a correct sentence.

• To achieve high precision, multiple sets of grammar rules need to be prepared. It may require completely different sets of rules for parsing singular and plural variations, passive sentences, etc., which can lead to the creation of a huge set of rules that is unmanageable.

Top-Down Parser
Here, the parser starts with the S symbol and attempts to rewrite it into a sequence of terminal symbols
that matches the classes of the words in the input sentence until it consists entirely of terminal symbols.
These are then checked against the input sentence to see if they match. If not, the process is started over again
with a different set of rules. This is repeated until a specific rule is found which describes the structure of
the sentence.

Merit − It is simple to implement.


Demerits −
• It is inefficient, as the search process has to be repeated if an error occurs.
• Slow speed of working.

Natural Language Processing - Semantic Analysis


The purpose of semantic analysis is to draw exact meaning, or you can say dictionary meaning from the
text. The work of semantic analyzer is to check the text for meaningfulness.

We already know that lexical analysis also deals with the meaning of words; how, then, is semantic analysis different from lexical analysis? Lexical analysis is based on smaller tokens, while semantic analysis focuses on larger chunks. That is why semantic analysis can be divided into the following two parts −

Studying meaning of individual word



It is the first part of the semantic analysis in which the study of the meaning of individual words is
performed. This part is called lexical semantics.
Studying the combination of individual words
In the second part, the individual words will be combined to provide meaning in sentences.
The most important task of semantic analysis is to get the proper meaning of the sentence. For example, analyze the sentence “Ram is great.” In this sentence, the speaker is talking either about Lord Ram or about a person whose name is Ram. That is why the job of the semantic analyzer, to get the proper meaning of the sentence, is important.
Elements of Semantic Analysis
Followings are some important elements of semantic analysis −
Hyponymy
It may be defined as the relationship between a generic term and instances of that generic term. Here the
generic term is called hypernym and its instances are called hyponyms. For example, the word color is
hypernym and the color blue, yellow etc. are hyponyms.
Homonymy
It may be defined as the words having same spelling or same form but having different and unrelated
meaning. For example, the word “Bat” is a homonymy word because bat can be an implement to hit a ball
or bat is a nocturnal flying mammal also.
Polysemy
Polysemy is a Greek word, which means “many signs”. It is a word or phrase with different but related sense.
In other words, we can say that polysemy has the same spelling but different and related meaning. For
example, the word “bank” is a polysemy word having the following meanings −
• A financial institution.
• The building in which such an institution is located.
• A synonym for “to rely on”.
Difference between Polysemy and Homonymy
Both polysemy and homonymy words have the same syntax or spelling. The main difference between them
is that in polysemy, the meanings of the words are related but in homonymy, the meanings of the words
are not related. For example, if we talk about the same word “Bank”, we can write the meaning ‘a financial
institution’ or ‘a river bank’. In that case it would be the example of homonym because the meanings are
unrelated to each other.
Synonymy
It is the relation between two lexical items having different forms but expressing the same or a close
meaning. Examples are ‘author/writer’, ‘fate/destiny’.
Antonymy
It is the relation between two lexical items having symmetry between their semantic components relative
to an axis. The scope of antonymy is as follows −
• Application of property or not − Example is ‘life/death’, ‘certitude/incertitude’
• Application of scalable property − Example is ‘rich/poor’, ‘hot/cold’
• Application of a usage − Example is ‘father/son’, ‘moon/sun’.



Meaning Representation
Semantic analysis creates a representation of the meaning of a sentence. But before getting into the
concept and approaches related to meaning representation, we need to understand the building blocks of
semantic system.

Building Blocks of Semantic System


In word representation or representation of the meaning of the words, the following building blocks play
an important role −
• Entities − It represents an individual such as a particular person, location etc. For example, Haryana, India, and Ram are all entities.
• Concepts − It represents the general category of individuals such as a person, city, etc.
• Relations − It represents the relationship between entities and concepts. For example, Ram is a person.
• Predicates − It represents the verb structures. For example, semantic roles and case grammar are
the examples of predicates.
Now, we can understand that meaning representation shows how to put together the building blocks of
semantic systems. In other words, it shows how to put together entities, concepts, relation and predicates
to describe a situation. It also enables the reasoning about the semantic world.
Approaches to Meaning Representations
Semantic analysis uses the following approaches for the representation of meaning −
• First order predicate logic (FOPL)
• Semantic Nets
• Frames
• Conceptual dependency (CD)
• Rule-based architecture
• Case Grammar
• Conceptual Graphs
Need of Meaning Representations
A question that arises here is why do we need meaning representation? Followings are the reasons for
the same −
Linking of linguistic elements to non-linguistic elements
The very first reason is that with the help of meaning representation the linking of linguistic elements to
the non-linguistic elements can be done.
Representing variety at lexical level
With the help of meaning representation, unambiguous, canonical forms can be represented at the lexical
level.
Can be used for reasoning
Meaning representation can be used to reason for verifying what is true in the world as well as to infer
the knowledge from the semantic representation.
Lexical Semantics
The first part of semantic analysis, studying the meaning of individual words is called lexical semantics. It
includes words, sub-words, affixes (sub-units), compound words and phrases also. All the words, sub-



words, etc. are collectively called lexical items. In other words, we can say that lexical semantics is the
relationship between lexical items, meaning of sentences and syntax of sentence.

Following are the steps involved in lexical semantics −


• Classification of lexical items like words, sub-words, affixes, etc. is performed in lexical semantics.
• Decomposition of lexical items like words, sub-words, affixes, etc. is performed in lexical semantics.
• Differences as well as similarities between various lexical semantic structures is also analyzed.

MACHINE TRANSLATION
All translation systems must model the source and target languages, but systems vary in the type of models
they use. Some systems attempt to analyze the source language text all the way into an interlingua
knowledge representation and then generate sentences in the target language from that representation.
This is difficult because it involves three unsolved problems: creating a complete knowledge representation
of everything; parsing into that representation and generating sentences from that representation.

Other systems are based on a transfer model. They keep a database of translation rules (or examples), and
whenever the rule (or example) matches, they translate directly. Transfer can occur at the lexical,
syntactic, or semantic level.

What is a machine translation and how does it work?


Machine Translation (MT), or automated translation, is simply a process in which computer software translates text from one language to another without human involvement. At its most basic level, machine translation performs a straightforward replacement of individual words in one natural language with words in another.

Using corpus methods, more complex translations can be performed, allowing better handling of differences in linguistic typology, recognition of phrases, and translation of idioms, as well as the isolation of anomalies. Current systems cannot yet perform at the level of a human translator, but this may become possible in the future.
In simple language, we can say that machine translation works by using computer software to translate
the text from one source language to another target language.
The term ‘machine translation’ (MT) refers to computerized systems responsible for producing translations
with or without human assistance. It excludes computer-based translation tools that support translators by
providing access to online dictionaries, remote terminology databanks, transmission and reception of texts,
etc.

Even before the AI technology era, computer programs for the automatic translation of text from one language to another were developed. In recent years, AI has been tasked with handling the fluidity and versatility of human languages, with their many scripts, dialects, and variations. Machine translation is challenging given the inherent ambiguity and flexibility of human language.

Different types of machine translation in NLP


There are four types of machine translation:



1. Statistical Machine Translation or SMT
It works by building statistical models from the analysis of large volumes of bilingual text. It aims to determine the correspondence between a word from the source language and a word from the target language. A well-known illustration of this is Google Translate.

SMT is adequate for basic translation, but its greatest drawback is that it does not take context into account, which means translations are often wrong; in other words, do not expect great quality translation. There are several types of statistical machine translation models: hierarchical phrase-based translation, syntax-based translation, phrase-based translation, and word-based translation (a toy word-based sketch is shown after this list).
2. Rule-based Machine Translation or RBMT
RBMT translates on the basis of grammatical rules. It conducts a grammatical analysis of the source language and the target language to generate the translated sentence. However, RBMT requires extensive post-editing, and its heavy reliance on dictionaries means that proficiency is only achieved after a significant period.

3. Hybrid Machine Translation or HMT


HMT, as the term suggests, is a combination of RBMT and SMT. It uses a translation memory, which makes it considerably better in terms of quality. Nevertheless, HMT also has drawbacks, the biggest of which is the need for extensive editing; human translators are still required. There are several approaches to HMT, such as multi-engine, statistical rule generation, multi-pass, and confidence-based approaches.

4. Neural Machine Translation or NMT


NMT is a type of machine translation that relies on neural network models (loosely inspired by the human brain) to build statistical models for translation. The essential advantage of NMT is that it provides a single system that can be trained end to end to decode the source text and produce the target text. Consequently, it does not rely on the pipeline of specialized subsystems that is common in other machine translation approaches, particularly SMT.
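
To make the word-based idea from the SMT item above concrete, here is a deliberately naive Python sketch that replaces each source word with its most probable target word from a hypothetical probability table; the table values are made up, and real SMT systems additionally model word order, phrases and fluency.

# Hypothetical word-translation probabilities P(target word | source word),
# of the kind that SMT estimates from a large bilingual corpus (values made up).
translation_table = {
    "the":   {"le": 0.5, "la": 0.4, "les": 0.1},
    "bird":  {"oiseau": 0.9, "volaille": 0.1},
    "eats":  {"mange": 0.8, "consomme": 0.2},
    "grain": {"grain": 0.7, "graine": 0.3},
}

def word_based_translate(sentence):
    """Pick the most probable target word for each source word (no reordering, no context)."""
    output = []
    for word in sentence.lower().split():
        candidates = translation_table.get(word, {word: 1.0})   # pass unknown words through
        output.append(max(candidates, key=candidates.get))
    return " ".join(output)

print(word_based_translate("The bird eats the grain"))
# -> "le oiseau mange le grain"; correct French would be "l'oiseau", illustrating
#    the lack of context handling that limits purely word-based translation.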

What are the benefits of machine translation?


One of the crucial benefits of machine translation is speed: computer programs can translate huge amounts of text rapidly. A human translator works more accurately, but cannot match the speed of the computer.

If you train the machine specifically for your requirements, machine translation gives an ideal blend of quick and cost-effective translation, as it is less expensive than using a human translator. With a specially trained system, MT can capture the context of full sentences before translating them, which yields high-quality, human-sounding output. Another benefit of machine translation is its capability to learn important terms and reuse them wherever they fit.



Applications of machine translation
Machine translation technology and products have been used in numerous application scenarios, for example business travel and the tourism industry. In terms of the object of translation, there is written-language-oriented text translation and spoken-language translation.

Machine Translation vs Human translation


• Machine translation hits a sweet spot of cost and speed, offering a very quick way for brands to translate their documents at scale without much overhead. Yet that does not mean it is always suitable. Human translation, on the other hand, is best for tasks that require extra care and nuance: skilled translators work on your brand's content to capture the original meaning and convey that feeling or message in another body of work.

• Depending on how much content needs to be translated, machine translation can deliver translated content almost instantly, whereas human translators will take more time. The time spent finding, vetting, and managing a team of translators must also be considered.

• Many translation software providers can offer machine translation at little or no cost, making it a reasonable solution for organizations that may be unable to afford professional translations.

• Machine translation is the instant conversion of text from one language to another using artificial intelligence, whereas human translation involves actual brainpower, in the form of one or more translators translating the text manually.

• Text translation
Automated text translation is broadly used in a variety of sentence-level and text-level translation applications. Sentence-level applications include the translation of query and retrieval inputs and the translation of the results of optical character recognition (OCR) on images. Text-level applications include the translation of all kinds of plain documents, and the translation of documents with structured information.

Structured information mostly includes the presentation format of text content, object-type operations, and other data such as fonts, colours, tables, forms, hyperlinks, etc. At present, the translation objects of machine translation systems are mostly handled at the sentence level.

Most importantly, a sentence can completely express a unit of subject matter, which naturally forms an expression unit, and the meaning of each word in the sentence can be determined to a large degree from the limited context within the sentence.

Also, the methods for obtaining information at sentence-level granularity from the training corpus are more effective, and of higher quality, than those based on other morphological levels, for example words, phrases, and text passages. Finally, sentence-level translation can naturally be extended to support translation at other morphological levels.



• Other applications
Essentially, the task of machine translation is to transform a word sequence in the source language into a semantically equivalent word sequence in the target language. In general, it performs a sequence transformation task, converting one sequence object into another according to some knowledge and logic, through models and algorithms.

Many task settings involve transformations between sequence objects, and language in the machine translation task is just one type of sequence object. Therefore, when the notions of source and target language are extended from natural languages to other sequence object types, machine translation methods and techniques can be applied to solve many similar transformation tasks.

Speech translation : Speech recognition is the task of identifying a sequence of words uttered by a speaker,
given the acoustic signal. It has become one of the mainstream applications of AI—millions of people interact
with speech recognition systems every day to navigate voice mail systems, search the Web from mobile
phones, and other applications. Speech is an attractive option when hands-free operation is necessary, as when
operating machinery.

Speech recognition is difficult because the sounds made by a speaker are ambiguous and, well, noisy. As a
well-known example, the phrase “recognize speech” sounds almost the same as “wreck a nice beach” when
spoken quickly.

A speech understanding system must answer three questions:


1. What speech sounds did the speaker utter?
2. What words did the speaker intend to express with those speech sounds?
3. What meaning did the speaker intend to express with those words?

With the rapid advancement of mobile applications, voice input has become a convenient mode of human-computer interaction, and speech translation has become an important application scenario. The basic pipeline of speech translation is: source-language speech → source-language text → target-language text → target-language speech.

In this pipeline, automatic text translation from source-language text to target-language text is an important intermediate module. In addition, the front end and back end require automatic speech recognition (ASR) and text-to-speech synthesis (TTS), respectively.

The quality of a speech recognition system depends on the quality of all of its components the language model, the
word-pronunciation models, the phone models, and the signal processing algorithms used to extract spectral features
from the acoustic signal.

SPEECH RECOGNITION
Speech recognition refers to a computer interpreting the words spoken by a person and converting them
to a format that is understandable by a machine. Depending on the end-goal, it is then converted to text or
voice or another required format.
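
As a small illustration, the sketch below uses the third-party SpeechRecognition package for Python (assuming it is installed; the audio file name and the use of Google's free web recognizer are illustrative assumptions) to convert a recorded utterance to text.

import speech_recognition as sr

recognizer = sr.Recognizer()

# Load a recorded utterance from an audio file and convert it to text.
with sr.AudioFile("utterance.wav") as source:        # illustrative file name
    audio = recognizer.record(source)                # read the entire file

try:
    text = recognizer.recognize_google(audio)        # send the audio to a web recognizer
    print("Transcription:", text)
except sr.UnknownValueError:
    print("Speech was unintelligible")
except sr.RequestError as err:
    print("Recognition service unavailable:", err)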



For instance, Apple’s Siri and Amazon’s Alexa use AI-powered speech recognition to provide voice or text support, whereas voice-to-text applications like Google Dictate transcribe your dictated words to text. Voice recognition is another form of speech recognition where a source sound is recognized and matched to a person’s voice.

Speech recognition AI applications have seen significant growth in numbers in recent times as businesses are
increasingly adopting digital assistants and automated support to streamline their services. Voice assistants,
smart home devices, search engines, etc are a few examples where speech recognition has seen prominence.
As per Research and Markets, the global market for speech recognition is estimated to grow at a CAGR of 17.2% and reach $26.8 billion by 2025.

Speech recognition is fast overcoming the challenges of poor recording equipment and noise cancellation, and of variations in people’s voices, accents, dialects, semantics, contexts, etc., using artificial intelligence and machine learning. This also includes the challenges of understanding human disposition and varying human language elements like colloquialisms and acronyms. The technology can now provide about 95% accuracy, compared with traditional models of speech recognition, which is on par with regular human communication.

Furthermore, it is now an acceptable format of communication given the large companies that endorse it
and regularly employ speech recognition in their operations. It is estimated that a majority of search engines
will adopt voice technology as an integral aspect of their search mechanism.

Speech recognition and AI play an integral role in NLP models in improving the accuracy and efficiency of
human language recognition.
Use Cases of Speech Recognition
Let’s explore the uses of speech recognition applications in different fields:
1. Voice-based speech recognition software is now used to initiate purchases, send emails, transcribe
meetings, doctor appointments, and court proceedings, etc.
2. Virtual assistants or digital assistants and smart home devices use voice recognition software to answer
questions, provide weather news, play music, check traffic, place an order, and so on.
3. Companies like Venmo and PayPal allow customers to make transactions using voice assistants. Several
banks in North America and Canada also provide online banking using voice-based software.
4. Ecommerce is significantly powered by voice-based assistants and allows users to make purchases quickly
and seamlessly.
5. Speech recognition is poised to impact transportation services and streamline scheduling, routing, and
navigating across cities.
6. Podcasts, meetings, and journalist interviews can be transcribed using voice recognition. It is also used to
provide accurate subtitles to a video.
7. There has been a huge impact on security through voice biometry where the technology analyses the
varying frequencies, tone and pitch of an individual’s voice to create a voice profile. An example of this is
Switzerland’s telecom company Swisscom which has enabled voice authentication technology in its call
centres to prevent security breaches.



8. Customer care services are being handled by AI-based voice assistants and chatbots to automate repeatable tasks.
As speech recognition and AI impact both professional and personal lives at workplaces and homes
respectively, the demand for skilled AI engineers and developers, Data Scientists, and Machine Learning
Engineers, is expected to be at an all-time high.
There will be a requirement for skilled AI professionals to enhance the relationship between humans and
digital devices. As job opportunities are created, they will result in increased perks and benefits for those in
this field.

CHAPTER-2:

WHAT IS PERCEPTION IN AI?


Perception is a process to interpret, acquire, select and then organize the sensory information that is
captured from the real world. Perception provides agents with information about the world they inhabit.
Perception is initiated by sensors. A sensor is anything that can change the computational state of the agent
in response to a change in the state of the world. It could be as simple as a one-bit sensor that detects
whether a switch is on or off, or as complex as the retina of the human eye, which contains more than a
hundred million photosensitive elements.

For example: Human beings have sensory receptors such as touch, taste, smell, sight and hearing. The information received from these receptors is transmitted to the human brain, which organizes the received information.

Perception in Artificial Intelligence is the process of interpreting vision, sounds, smell, and touch. Perception
helps to build machines or robots that react like humans. Perception is a process to interpret, acquire, select,
and then organize the sensory information from the physical world to make actions like humans. The main
difference between AI and robot is that the robot makes actions in the real world.
According to the received information, action is taken by interacting with the environment to manipulate
and navigate the objects.

Perception provides agents with information about the world they inhabit by interpreting the response of
sensors. A sensor measures some aspect of the environment in a form that can be used as input by an agent
program. The sensor could be as simple as a switch, which gives one bit telling whether it is on or off, or as
complex as the eye. A variety of sensory modalities are available to artificial agents.

Perception and action are very important concepts in the field of Robotics. The following figures show the
complete autonomous robot.



There is one important difference between an artificial intelligence program and a robot. The AI program performs in a computer-simulated environment, while the robot performs in the physical world.
For example:
In chess, an AI program can make a move by searching different nodes, but it has no facility to touch or sense the physical world.
However, a chess-playing robot can make a move and grasp the pieces by interacting with the physical world.
Image formation in digital camera
Image formation is a physical process that captures objects in the scene through a lens and creates a 2-D image.
Let's understand the geometry of a pinhole camera shown in the following diagram.

In the above figure, the optical axis is perpendicular to the image plane, and the image plane is placed at the focal distance behind the optical center (the pinhole). Let P be a point in the scene with coordinates (X, Y, Z) and P' be its image on the image plane with coordinates (x, y, z).



If the focal length from the optical center is f, then by using the properties of similar triangles the following equations are derived:

−x/f = X/Z, so x = −fX/Z ..........................equation (i)

−y/f = Y/Z, so y = −fY/Z .........................equation (ii)

These equations define an image formation process called perspective projection.
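
A small Python sketch of equations (i) and (ii), projecting a 3-D scene point onto the image plane of a pinhole camera; the focal length value is arbitrary.

def perspective_project(X, Y, Z, f=0.05):
    """Pinhole-camera perspective projection: x = -f*X/Z, y = -f*Y/Z."""
    if Z == 0:
        raise ValueError("Point lies in the plane of the pinhole (Z = 0)")
    x = -f * X / Z        # equation (i)
    y = -f * Y / Z        # equation (ii)
    return x, y

# A point 0.5 m to the right, 0.25 m up and 2 m in front of a camera with a
# 50 mm focal length projects to the (inverted) image coordinates below.
print(perspective_project(0.5, 0.25, 2.0))   # (-0.0125, -0.00625)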

EARLY IMAGE PROCESSING OPERATIONS


From the inception of AI, incorporating image processing into smart systems has been a long-running project for the people working on it. In its initial phase, it required a lot of manual input, with instructions provided to computers to get some output. These machines, known as expert systems, were trained to recognize images.

There are 2 methods of image processing:


• – Analog image processing, which is used for processing photographs, printouts, and other image hard
copies.
• – Digital image processing, which is used for manipulating digital images with the help of complex
algorithms

Main Purpose of Image Processing


• – Representing processed data in a visual way one can understand, for instance, giving a visual form to
invisible objects.
• – To improve the processed image quality, image sharpening and restoration works well.
• – Image recovery (retrieval) helps in searching for images.
• – Helps to measure objects in the image.
• – With pattern recognition, it becomes easy to classify objects in the image, locate their position and get
an overall understanding of the scene.

Image Processing Phases


There are 8 phases for image processing which goes step-wise:
• Image acquisition:
Captures the image with a sensor and converts it into a manageable entity
• Image enhancement
The input image quality is improved and also extracts details hidden in it
• Image restoration
Any possible corruption like blur, noise, or camera misfocus is removed to get a cleaner vision on
probabilistic and mathematical model basis
• Color image processing
The colored images and varied color spaces are processed with pseudocolor or RGB processing way.
• Image compression and decompression



This allows for changes in image resolution and size, be it for reduction or restoring images depending on
the need.
• Morphological processing
Defines the object structure and shape in the image.
• Image recognition
For a particular object, the specific features are identified in the image and techniques like object
detection are used for the same.
• Representation and description
is all about visualizing the processed data.

IMAGE PROCESSING METHODS, TECHNIQUES, AND TOOLS


The images captured with regular sensors need preprocessing as some could contain too much noise or are
misfocused. There are two detection techniques to be used for processing digital images as well as for
preprocessing.
• Filtering
Used to modify and enhance the input image. With various filters available, certain features in the image
can be emphasized or removed, can also reduce the image noise and so on.
• Edge detection
Used for data extraction and image segmentation, to find meaningful object edges in the images that are
preprocessed.
To make things easier, there are specific libraries and frameworks that can be used to implement image
processing functionalities.

Object Recognition from Structural information,


Conventionally, the recognition of objects in a scene is achieved by digital image processing [11], whose purpose is directly related to artificial vision or computer vision. Artificial vision aims to detect, segment, locate, and recognize particular objects. Thus, the following methodology is proposed: (a) segmentation, (b) intelligent recognition, and (c) extraction of characteristics.

A. Segmentation

The proposed image processing consists of converting the image to grayscale, which yields a lighter image format. Then, edges in the image are detected by the derivative method, after which the image is dilated and eroded to close the detected edges. Finally, the borders are filled, producing a mask that identifies the position of the object within the image. Each step is described in detail below:

1) Greyscale Transformation. It consists of determining the equivalent luminance, which is defined as the light received on a surface, i.e., the ratio of the luminous flux to the illuminated area [12]; the concept of luminance is associated with the human eye’s perception of different light intensities [13]. The luminance is calculated as a weighted average of the colour components of each pixel, as shown in Equation 1, where L corresponds to the luminance, R is the red component, G the green component and B the blue component:

L = wR·R + wG·G + wB·B ....................... (1)

where the weights w reflect the eye’s different sensitivity to each colour component.



Equation 1 is applied to each pixel to calculate its grayscale value; this process is required for the subsequent edge detection.
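
A brief numpy sketch of the greyscale transformation, applying a weighted average to every pixel; the particular weights 0.299/0.587/0.114 are the commonly used ITU-R BT.601 luminance weights and are an assumption here, since the original equation is not reproduced above.

import numpy as np

def to_grayscale(rgb_image, weights=(0.299, 0.587, 0.114)):
    """Weighted average of the R, G, B components of each pixel (Equation 1).

    rgb_image: array of shape (height, width, 3) with values in [0, 255].
    The default weights are the common ITU-R BT.601 values (an assumption).
    """
    rgb = np.asarray(rgb_image, dtype=float)
    luminance = rgb @ np.array(weights)   # per-pixel L = wR*R + wG*G + wB*B
    return luminance.astype(np.uint8)

# Usage sketch on a random 4x4 colour image.
image = np.random.randint(0, 256, size=(4, 4, 3), dtype=np.uint8)
print(to_grayscale(image))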

2) Image Edge Detection. The image obtained in the previous step can be represented as a discrete 2-D function defined over the coordinates m and n of each pixel. The value of the function at a specific point is known as the brightness or pixel intensity. An edge is defined as a change of tone between pixels; when the change exceeds a threshold value, it is considered an edge. Different methods for identifying edges have been proposed; one of them computes the intensity gradient of each pixel using a convolution mask, then calculates its magnitude and finally applies a thresholding step [14].

The most widely used edge detection techniques employ local operators [15], using discrete approximations of the first and second derivatives of the grayscale image. The operator described below is based on the first derivative of the image.
What is the purpose of edge detection/image detection?


Edges are straight lines or curves in the image plane across which there is a "significant" change in image brightness.
The goal of edge detection is to abstract away from the messy, multi-megabyte image toward a more compact,
abstract representation.
• Edge detection is a fundamental operation in image processing.
• The main goal of edge detection is to construct the ideal outline of an image.
• Discontinuities in image brightness arise from:
i) Depth discontinuities
ii) Surface orientation discontinuities
iii) Reflectance discontinuities
iv) Illumination discontinuities

B. Artificial Intelligence for Object Recognition


• The image is reconstructed using a Hopfield neural network; the network eliminates noise in the edge
image, improving accuracy and precision and yielding a properly segmented image.
• Artificial intelligence is known as "the science and engineering of making intelligent machines,
especially intelligent computer programs"; this definition was proposed by Professor John
McCarthy in 1956 [19]. The aim of artificial intelligence is to think, evaluate, or act according
to certain inputs in order to carry out some specific function. To achieve this, different
techniques can be used: i) genetic algorithms, ii) artificial neural networks, and iii) formal logic.
• For this specific problem, artificial neural networks are employed. They use simple information-
processing elements whose local interactions determine the overall system behaviour [20]. Networks consist
of a large number of simple processing elements called nodes or neurons, arranged in
layers [21]. Each neuron is connected to other neurons by communication links, each with an associated
weight. The weights represent the information that the neural network uses to
solve a given problem [22]. There are various types of neural networks, such as self-organizing
and recurrent networks, among others.
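The following is a small illustrative sketch (not the exact network of [19]-[22]) of how a Hopfield network can store a binary edge image as a pattern and then recover it from a noisy copy, which is the role it plays in the segmentation step above:

import numpy as np

def train_hopfield(patterns):
    """Hebbian learning: the weight matrix is the averaged sum of outer
    products of the stored +/-1 patterns, with zero self-connections."""
    n = patterns.shape[1]
    W = np.zeros((n, n))
    for p in patterns:
        W += np.outer(p, p)
    np.fill_diagonal(W, 0)
    return W / patterns.shape[0]

def recall(W, noisy, steps=10):
    """Repeatedly update all neurons; the state settles toward a stored pattern."""
    s = noisy.copy()
    for _ in range(steps):
        s = np.sign(W @ s)
        s[s == 0] = 1
    return s

# Usage: flatten a clean +/-1 edge image into a vector, train on it,
# then call recall() on a corrupted version of the same image.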

3D-Information extraction using vision


Why is extraction of 3-D information necessary? The 3-D information extraction process plays an
important role in performing tasks such as manipulation, navigation, and recognition. It deals with
the following aspects:

1. Segmentation of the scene


Segmentation is used to arrange the array of image pixels into regions, which helps match them to
semantically meaningful entities in the scene.



• The goal of segmentation is to divide an image into regions which are homogeneous.
• The union of the neighboring regions should not be homogeneous.
• Thresholding is the simplest technique of segmentation. It is performed when the object has a
homogeneous intensity and the background a different intensity level; the pixels are then partitioned
depending on their intensity values.
2. To determine the position and orientation of each object
Determining the position and orientation of each object relative to the observer is important for
manipulation and navigation tasks. For example, suppose a person goes to a store to buy something.
While moving around, he must know the locations of objects and obstacles so that he can plan a path
to avoid them.
• The overall orientation of an object should be specified in terms of a three-dimensional
rotation.
3. To determine the shape of each object
When the camera moves around an object, the distance and orientation of that object will change,
but its shape should be preserved. For example, if an object is a cube, that fact does not change;
however, it is difficult to represent global shape in a way that copes with the wide variety of
objects present in the real world.
• If the shape of an object is known, it becomes easier to decide how to grasp that object at a
particular place for manipulation tasks.
• Object recognition plays the most significant role in identifying and classifying objects, for
example when geometric shapes are combined with color and texture.
However, a question arises: how should we recover a 3-D image from the
pinhole camera?

There are a number of cues available in the visual stimulus for 3-D image extraction,
such as motion, binocular stereopsis, texture, shading, and contour. Each of these
cues relies on background assumptions about the physical scene in order to provide
an interpretation.

EXTRACTING 3-D INFORMATION USING VISION


We need to extract 3-D information for performing certain tasks such as manipulation, navigation,
and recognition. There are three aspects of this:
1. Segmentation of the scene into distinct objects.
2. Determining the position and orientation of each object relative to the observer.
3. Determining the shape of each object.
Segmentation of the image is a key step towards organizing the array of image pixels into regions that would
correspond to semantically meaningful entities in the scene. For recognition, we would like to know what
features belong together so that one could compare them with stored models; to grasp an object, one needs to
know what belongs together as an object.
Segmentation is best viewed as part of extraction of 3-D information about the scene.
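A minimal sketch of threshold-based segmentation on a grayscale NumPy image (the choice of threshold is an assumption; the mean intensity is used here only as a simple default):

import numpy as np

def threshold_segment(gray, threshold=None):
    """Partition pixels into object and background by intensity.
    Returns a boolean mask that is True for the brighter (object) pixels."""
    if threshold is None:
        threshold = gray.mean()   # simple default; Otsu's method is a common alternative
    return gray > threshold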

One of the principal uses of vision is to provide information for manipulating objects—picking them up,
grasping, twirling, and so on—as well as navigating in a scene while avoiding obstacles. The capability to use
vision for these purposes is present in the most primitive of animal visual systems. Perhaps the evolutionary
origin of the vision sense can be traced back to the presence of a photosensitive spot on one end of an
organism that enabled it to orient itself toward (or away from) the light. Flies use vision based on optic flow
to control their landing responses. Mobile robots moving around in an environment need to know where the
obstacles are, where free space corridors are available, and so on.

******



ARTIFICIAL INTELLIGENCE

UNIT-V
SYLLABUS:

Robotics: Introduction, Robot Hardware, Robotic Perception, Planning to move,


planning uncertain movements, Moving, Robotic software architectures, application
domains.

Philosophical foundations: Weak AI, Strong AI, Ethics and Risks of AI, Agent
Components, Agent Architectures, Are we going in the right direction, What if AI
does succeed.
AI Unit-5.3: ROBOT:
Robots are physical agents that perform tasks by manipulating the physical
world.
Effectors have a single purpose: to assert physical forces on the environment.
Robots are also equipped with sensors, which allow them to perceive their
environment.

Most of today’s robots fall into one of three primary categories.


1. MANIPULATORS:
Manipulator motion usually involves a chain of controllable joints, enabling such
robots to place their effectors in any position within the workplace. Few car
manufacturers could survive without robotic manipulators, and some
manipulators have even been used to generate original artwork.

2. MOBILE ROBOT:
The second category is the mobile robot. Mobile robots move about their
environment using wheels, legs, or similar mechanisms. They have been put to
use delivering food in hospitals, moving containers at loading docks, and similar
tasks. Other types of mobile robots include unmanned air vehicles,
autonomous underwater vehicles, etc.

3. MOBILE MANIPULATOR:
The third type of robot combines mobility with manipulation, and is often called
a mobile manipulator. Humanoid robots mimic the human torso.

The field of robotics also includes prosthetic devices , intelligent


environments and multibody systems, wherein robotic action is achieved
through swarms of small cooperating robots. Robotics brings together many of
the concepts we have seen earlier in the book, including probabilistic state
estimation, perception, planning, unsupervised learning, and reinforcement
learning.

ROBOT HARDWARE:
The robot hardware mainly depends on (1) sensors and (2) effectors.
1. Sensors:
Sensors are the perceptual interface between robot and environment.
PASSIVE SENSOR: Passive sensors, such as cameras, are true observers of the
environment: they capture signals that are generated by other sources in the
environment.
ACTIVE SENSOR: Active sensors, such as sonar, send energy into the
environment. They rely on the fact that this energy is reflected back to the sensor.

Range finders are sensors that measure the distance to nearby objects. In
the early days of robotics, robots were commonly equipped with sonar sensors.
Sonar sensors emit directional sound waves, which are reflected by objects, with
some of the sound making it back into the sensor.
Stereo vision relies on multiple cameras to image the environment from
slightly different viewpoints, analyzing the resulting parallax in these images to
compute the range of surrounding objects.

Other common range sensors include radar, which is often the sensor of
choice for UAVs. Radar sensors can measure distances of multiple kilometers.
On the other extreme end of range sensing are tactile sensors such as whiskers,
bump panels, and touch-sensitive skin.

A second important class of sensors is location sensors. Most location sensors


use range sensing as a primary component to determine location. Outdoors, the
Global Positioning System (GPS) is the most common solution to the
localization problem. GPS measures the distance to satellites that emit pulsed
signals.
The third important class is proprioceptive sensors, which inform the
robot of its own motion. To measure the exact configuration of a robotic joint,
motors are often equipped with shaft decoders that count the revolution of
motors in small increments.

Other important aspects of robot state are measured by force sensors and
torque sensors. These are indispensable when robots handle fragile objects or
objects whose exact shape and location is unknown.

EFFECTORS:
Effectors are the means by which robots move and change the shape of their
bodies. To understand the design of effectors we use the concept of degree of
freedom.

We count one degree of freedom for each independent direction in which a robot,
or one of its effectors, can move. For example, a rigid mobile robot such as an
AUV has six degrees of freedom, three for its (x, y, z) location in space and three
for its angular orientation, known as yaw, roll, and pitch. These six degrees define
the kinematic state or pose of the robot. The dynamic state of a robot includes
these six plus an additional six dimensions for the rate of change of each
kinematic dimension, that is, their velocities.

For nonrigid bodies, there are additional degrees of freedom within the
robot itself. For example, the elbow of a human arm possesses two degrees of
freedom. It can flex the upper arm towards or away, and can rotate right or left.
The wrist has three degrees of freedom. It can move up and down, side to side,
and can also rotate. Robot joints also have one, two, or three degrees of freedom
each. Six degrees of freedom are required to place an object, such as a hand, at a
particular point in a particular orientation.
The manipulator shown in Figure 4(a) has exactly six degrees of freedom, created by
five revolute joints that generate rotational motion and one prismatic joint that
generates sliding motion.

For mobile robots, the DOFs are not necessarily the same as the number
of actuated elements.
Consider, for example, your average car: it can move forward or backward, and
it can turn, giving it two DOFs. In contrast, a car’s kinematic configuration is
three-dimensional: on an open flat surface, one can easily maneuver a car to any
(x, y) point, in any orientation. (See Figure 25.4(b).) Thus, the car has three
effective degrees of freedom but only two controllable degrees of freedom. We say
a robot is nonholonomic if it has more effective DOFs than controllable DOFs
and holonomic if the two numbers are the same.

Sensors and effectors alone do not make a robot. A complete robot also needs a
source of power to drive its effectors. The electric motor is the most popular
mechanism for both manipulator actuation and locomotion, but pneumatic
actuation using compressed gas and hydraulic actuation using pressurized
fluids also have their application niches.

ROBOTIC PERCEPTION:
Perception is the process by which robots map sensor measurements into
internal representations of the environment. Perception is difficult because
sensors are noisy, and the environment is partially observable, unpredictable, and
often dynamic.

As a rule of thumb, good internal representations for robots have three


properties: they contain enough information for the robot to make good decisions,
they are structured so that they can be updated efficiently, and they are natural
in the sense that internal variables correspond to natural state variables in the
physical world.

For robotics problems, we include the robot’s own past actions as observed
variables in the model. Figure 25.7 shows the notation used in this
chapter: Xt is the state of the environment (including the robot) at time t, Zt is
the observation received at time t, and At is the action taken after the observation
is received.

We would like to compute the new belief state, P(Xt+1 | z1:t+1, a1:t), from the
current belief state P(Xt | z1:t, a1:t−1) and the new observation zt+1.
Thus, we modify the recursive filtering equation (15.5 on page 572) to use
integration rather than summation:
P(Xt+1 | z1:t+1, a1:t) = α P(zt+1 | Xt+1) ∫ P(Xt+1 | xt, at) P(xt | z1:t, a1:t−1) dxt .    (25.1)
This equation states that the posterior over the state variables X at time t + 1 is
calculated recursively from the corresponding estimate one time step earlier. This
calculation involves the previous action at and the current sensor measurement
zt+1. The probability P(Xt+1 | xt, at) is called the transition model or motion
model, and P(zt+1 | X t+1) is the sensor model.
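A minimal discrete sketch of the update in Equation (25.1): with a finite set of states the integral becomes a sum, and the transition and sensor models below are placeholders supplied by the caller rather than anything defined in the text.

import numpy as np

def bayes_filter_step(belief, action, observation, transition_model, sensor_model):
    """One step of recursive state estimation.
    belief[x]                 : P(X_t = x | z_1:t, a_1:t-1)
    transition_model[a][x, x']: motion model P(X_t+1 = x' | X_t = x, A_t = a)
    sensor_model[z][x']       : sensor model P(Z_t+1 = z | X_t+1 = x')"""
    # Prediction: sum over previous states (discrete analogue of the integral)
    predicted = transition_model[action].T @ belief
    # Correction: weight by the sensor model and renormalise (the alpha factor)
    posterior = sensor_model[observation] * predicted
    return posterior / posterior.sum()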

1. Localization and mapping


Localization is the problem of finding out where things are—including the robot
itself.

Knowledge about where things are is at the core of any successful physical
interaction with the environment.
To keep things simple, let us consider a mobile robot that moves slowly in a flat
2D world. Let us also assume the robot is given an exact map of the environment.
The pose of such a mobile robot is defined by its two Cartesian coordinates with
values x and y and its heading with value θ, as illustrated in Figure 25.8(a). If we
arrange those three values in a vector, then any particular state is given by Xt = (xt, yt, θt)⊤.
In the kinematic approximation, each action consists of the "instantaneous"
specification of two velocities—a translational velocity vt and a rotational
velocity ωt. For small time intervals Δt, a crude deterministic model of the
motion of such robots is given by

ˆXt+1 = f(Xt, vt, ωt) = Xt + ( vt Δt cos θt ,  vt Δt sin θt ,  ωt Δt )⊤

The notation ˆX refers to a deterministic state prediction. Of course, physical


robots are somewhat unpredictable. This is commonly modeled by a Gaussian
distribution with mean f(Xt, vt, ωt) and covariance Σx. (See Appendix A for a
mathematical definition.)

P(Xt+1 | Xt, vt, ωt) = N(ˆXt+1,Σx) .


Next, we need a sensor model. We will consider two kinds of
sensor model. The first assumes that the sensors detect stable, recognizable
features of the environment called landmarks. For a landmark at known location (xi, yi),
the exact prediction of the observed range and bearing would be

ˆzt = h(xt) = ( √((xi − xt)² + (yi − yt)²) ,  atan2(yi − yt, xi − xt) − θt )⊤

Again, noise distorts our measurements. To keep things simple, one might assume
Gaussian noise with covariance Σz, giving us the sensor model
P(zt | xt) = N(ˆzt,Σz) .

This problem is important for many robot applications, and it has been studied
extensively under the name simultaneous localization and mapping,
abbreviated as SLAM.
SLAM problems are solved using many different probabilistic
techniques, including the extended Kalman filter

Expectation–maximization is also used for SLAM.
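As an illustrative sketch of one such probabilistic technique, a particle filter (Monte Carlo localization) approximates the belief over poses (x, y, θ) with samples; the simple velocity motion model and the single range-only landmark measurement below are simplifying assumptions, and all parameter names are hypothetical:

import numpy as np

def mcl_step(particles, v, w, z, dt, motion_noise, landmark, sensor_sigma):
    """One Monte Carlo localization step. particles is an N x 3 array of poses,
    v and w are the commanded velocities, z is a measured range to the landmark."""
    x, y, th = particles.T
    # Motion update: apply the velocity model plus Gaussian noise
    x = x + v * dt * np.cos(th) + np.random.normal(0, motion_noise, x.size)
    y = y + v * dt * np.sin(th) + np.random.normal(0, motion_noise, y.size)
    th = th + w * dt + np.random.normal(0, motion_noise, th.size)
    moved = np.stack([x, y, th], axis=1)
    # Measurement update: weight each particle by the range likelihood
    expected = np.hypot(landmark[0] - x, landmark[1] - y)
    weights = np.exp(-0.5 * ((z - expected) / sensor_sigma) ** 2)
    weights = weights / weights.sum()
    # Resampling: draw particles in proportion to their weights
    idx = np.random.choice(len(moved), size=len(moved), p=weights)
    return moved[idx]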

2. Other types of perception


Not all of robot perception is about localization or mapping. Robots also perceive
the temperature, odors, acoustic signals, and so on. Many of these quantities can
be estimated using variants of dynamic Bayes networks.
It is also possible to program a robot as a reactive agent, without explicitly
reasoning about probability distributions over states.

3. Machine learning in robot perception


Machine learning plays an important role in robot perception. This is particularly
the case when the best internal representation is not known. One common
approach is to map high-dimensional sensor streams into lower-dimensional
spaces using unsupervised machine learning methods. Such an approach is called
low-dimensional embedding.

Methods that make robots collect their own training data are called Self
Supervised.

In this instance, the robot uses machine learning to leverage a short-range sensor


that works well for terrain classification into a sensor that can see much farther.
That allows the robot to drive faster, slowing down only when the sensor
model says there is a change in the terrain that needs to be examined more
carefully by the short-range sensors.

PLANNING TO MOVE:

All of a robot’s deliberations ultimately come down to deciding how to move


effectors. The point-to-point motion problem is to deliver the robot or its end
effector to a designated target location. A greater challenge is the compliant
motion problem, in which a robot moves while being in physical contact with
an obstacle.

There are two main approaches: cell decomposition and skeletonization. Each
reduces the continuous path-planning problem to a discrete graph-search
problem.

1 Configuration space
We will start with a simple representation for a simple robot motion problem. It
has two joints that move independently. the robot’s configuration can be
described by a four dimensional coordinate: (xe, ye) for the location of the elbow
relative to the environment and (xg, yg) for the location of the gripper. They
constitute what is known as workspace representation.

The problem with the workspace representation is that not all


workspace coordinates are actually attainable, even in the absence of obstacles.
This is because of the linkage constraints on the space of attainable workspace
coordinates.
Transforming configuration space coordinates into workspace coordinates is
simple: it involves a series of straightforward coordinate transformations. These
transformations are linear for prismatic joints and trigonometric for revolute
joints. This chain of coordinate transformation is known as kinematics.

The inverse problem of calculating the configuration of a robot whose effector


location is specified in workspace coordinates is known as inverse kinematics.
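A minimal sketch for a planar arm with two revolute joints (the link lengths L1 and L2 are illustrative): forward kinematics maps joint angles to workspace coordinates using trigonometric transformations, and inverse kinematics recovers one joint-angle solution from a desired gripper position.

import math

def forward_kinematics(theta1, theta2, L1=1.0, L2=0.8):
    """Map configuration-space angles to elbow and gripper workspace positions."""
    xe, ye = L1 * math.cos(theta1), L1 * math.sin(theta1)
    xg = xe + L2 * math.cos(theta1 + theta2)
    yg = ye + L2 * math.sin(theta1 + theta2)
    return (xe, ye), (xg, yg)

def inverse_kinematics(xg, yg, L1=1.0, L2=0.8):
    """Recover one (of generally two) joint-angle solutions placing the gripper at (xg, yg)."""
    c2 = (xg**2 + yg**2 - L1**2 - L2**2) / (2 * L1 * L2)
    theta2 = math.acos(max(-1.0, min(1.0, c2)))         # elbow angle
    k1 = L1 + L2 * math.cos(theta2)
    k2 = L2 * math.sin(theta2)
    theta1 = math.atan2(yg, xg) - math.atan2(k2, k1)    # shoulder angle
    return theta1, theta2

Note that inverse kinematics generally has several solutions (for example, elbow-up and elbow-down), which is one reason it is harder than the forward problem.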

2 Cell decomposition methods


The first approach to path planning uses cell decomposition—that is, it
decomposes the free space into a finite number of contiguous regions, called cells.

A decomposition has the advantage that it is extremely simple to implement,


but it also suffers from three limitations. First, it is workable only for low-
dimensional configuration spaces. Second, there is the problem of what to do with
cells that are "mixed". And third, any path through a discretized state space will
not be smooth.

Cell decomposition methods can be improved in a number of ways,


to alleviate some of these problems. The first approach allows further subdivision
of the mixed cells—perhaps using cells of half the original size. A second way to
obtain a complete algorithm is to insist on an exact cell decomposition of the
free space.
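An illustrative sketch of the idea on a uniform grid decomposition (0 = free cell, 1 = obstacle cell), where path planning reduces to a discrete graph search; the grid itself is assumed, not taken from the text:

from collections import deque

def grid_path(grid, start, goal):
    """Breadth-first search over the free cells of a decomposed 2D grid.
    Returns the list of cells from start to goal, or None if unreachable."""
    rows, cols = len(grid), len(grid[0])
    frontier = deque([start])
    parent = {start: None}
    while frontier:
        cell = frontier.popleft()
        if cell == goal:                      # reconstruct the path by walking parents
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] == 0 and nxt not in parent):
                parent[nxt] = cell
                frontier.append(nxt)
    return None

# Usage: grid_path([[0, 0, 1], [1, 0, 0], [0, 0, 0]], (0, 0), (2, 2))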

3 Modified cost functions:


This problem can be solved by introducing a potential field. A potential
field is a function defined over state space, whose value grows with the distance
to the closest obstacle. The potential field can be used as an additional cost term
in the shortest-path calculation. This induces an interesting trade off. On the one
hand, the robot seeks to minimize path length to the goal. On the other hand, it
tries to stay away from obstacles by virtue of minimizing the potential function.
Clearly, the resulting path is longer, but it is also safer.
There exist many other ways to modify the cost function. However, it is
often easy to smooth the resulting trajectory after planning, using conjugate
gradient methods. Such post-planning smoothing is essential in many real world
applications.
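A minimal sketch of such a modified cost function (the weight and the 1/distance form of the potential are illustrative choices, not taken from the text): the cost of entering a cell is the step length plus a potential term that grows as the distance to the nearest obstacle shrinks.

import math

def cell_cost(cell, obstacles, step_cost=1.0, weight=5.0):
    """Path-length term plus a potential-field term that penalises
    cells close to the nearest obstacle."""
    d = min(math.dist(cell, obs) for obs in obstacles)
    potential = weight / (d + 1e-6)     # large near obstacles, small far away
    return step_cost + potential

Replacing the unit step cost in the grid search above with this function trades path length for clearance, producing the longer but safer paths described here.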

4 Skeletonization methods
The second major family of path-planning algorithms is based on the idea of
skeletonization.
These algorithms reduce the robot’s free space to a one-dimensional
representation, for which the planning problem is easier. This lower- dimensional
representation is called a skeleton of the configuration space.

Voronoi graph of the free space—the set of all points that are equidistant
to two or more obstacles. To do path planning with a Voronoi graph, the robot
first changes its present configuration to a point on the Voronoi graph. It is easy
to show that this can always be achieved by a straight-line motion in configuration
space. Second, the robot follows the Voronoi graph until it reaches the point
nearest to the target configuration. Finally, the robot leaves the Voronoi graph and
moves to the target. Again, this final step involves straight-line motion in
configuration space.

An alternative to the Voronoi graphs is the probabilistic roadmap, a


skeletonization approach that offers more possible routes, and thus deals better
with wide-open spaces. With these improvements, probabilistic roadmap
planning tends to scale better to high-dimensional configuration spaces than most
alternative path-planning techniques.

ROBOTIC SOFTWARE ARCHITECTURE:


A methodology for structuring algorithms is called a software architecture. An
architecture includes languages and tools for writing programs, as well as an
overall philosophy for how programs can be brought together. Architectures that
combine reactive and deliberate techniques are called hybrid architectures.

1 Subsumption architecture
The subsumption architecture is a framework for assembling reactive
controllers out of finite state machines. Nodes in these machines may contain tests
for certain sensor variables, in which case the execution trace of a finite state
machine is conditioned on the outcome of such a test. The resulting machines are
referred to as augmented finite state machines, or AFSMs, where the
augmentation refers to the use of clocks.

An example of a simple AFSM is the four-state machine shown in the figure below, which
generates cyclic leg motion for a hexapod walker.
In our example, we might begin with AFSMs for individual legs, followed
by an AFSM for coordinating multiple legs. On top of this, we might implement
higher-level behaviors such as collision avoidance, which might involve backing
up and turning.
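As an illustrative sketch only (the actual AFSM in the figure is not reproduced here), a four-state machine that cycles one leg through lift, swing, drop, and push phases, with one transition gated by a sensor test, might look like this:

# A tiny augmented finite state machine for one hexapod leg.
# Each state maps to (next state, motor command); the names are illustrative.
LEG_CYCLE = {
    "lift":  ("swing", "raise leg"),
    "swing": ("drop",  "move leg forward"),
    "drop":  ("push",  "lower leg"),
    "push":  ("lift",  "move leg backward"),   # pushing back propels the body forward
}

def step(state, ground_contact):
    """Advance the machine by one tick; the lift transition is conditioned
    on a sensor variable, as in subsumption-style controllers."""
    if state == "push" and not ground_contact:
        return state, "wait for ground contact"
    next_state, command = LEG_CYCLE[state]
    return next_state, command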

Unfortunately, the subsumption architecture has its own problems. First,


the AFSMs are driven by raw sensor input, an arrangement that works if the
sensor data is reliable and contains all necessary information for decision making,
but fails if sensor data has to be integrated in nontrivial ways over time. A
subsumption-style robot usually does just one task, and it has no notion of how
to modify its controls to accommodate different goals. Finally, subsumption style
controllers tend to be difficult to understand.

However, it has had an influence on other architectures, and on individual


components of some architectures.

2 Three-layer architecture
Hybrid architectures combine reaction with deliberation. The most popular
hybrid architecture is the three-layer architecture, which consists of a reactive
layer, an executive layer, and a deliberative layer.

The reactive layer provides low-level control to the robot. It is characterized


by a tight sensor–action loop. Its decision cycle is often on the order of
milliseconds.
The executive layer (or sequencing layer) serves as the glue between the reactive
layer and the deliberative layer. It accepts directives by the deliberative layer, and
sequences them for the reactive layer.

The deliberative layer generates global solutions to complex tasks using


planning.
Because of the computational complexity involved in generating such solutions,
its decision cycle is often in the order of minutes. The deliberative layer (or
planning layer) uses models for decision making.
3 Pipeline architecture
Another architecture for robots is known as the pipeline architecture. Just like
the subsumption architecture, the pipeline architecture executes multiple process
in parallel.

Data enters this pipeline at the sensor interface layer. The perception
layer then updates the robot’s internal models of the environment based on this
data. Next, these models are handed to the planning and control layer. The resulting
commands are then communicated back to the vehicle through the vehicle interface layer.

The key to the pipeline architecture is that this all happens in parallel. While
the perception layer processes the most recent sensor data, the control layer bases
its choices on slightly older data. In this way, the pipeline architecture is similar
to the human brain. We don’t switch off our motion controllers when we digest
new sensor data. Instead, we perceive, plan, and act all at the same time. Processes
in the pipeline architecture run asynchronously, and all computation is data-
driven. The resulting system is robust, and it is fast.

APPLICATION DOMAINS:

Industry and Agriculture. Traditionally, robots have been fielded in areas that
require difficult human labour, yet are structured enough to be amenable to
robotic automation. The best example is the assembly line, where manipulators
routinely perform tasks such as assembly, part placement, material handling,
welding, and painting. In many of these tasks, robots have become more cost-
effective than human workers.
Transportation. Robotic transportation has many facets: from autonomous
helicopters that deliver payloads to hard-to-reach locations, to automatic
wheelchairs that transport people who are unable to control wheelchairs by
themselves, to autonomous straddle carriers that outperform skilled human
drivers when transporting containers from ships to trucks on loading docks.

Robotic cars. Most of us use cars every day. Many of us make cell phone calls while
driving. Some of us even text. The sad result: more than a million people die
every year in traffic accidents. Robotic cars like BOSS and STANLEY offer
hope: Not only will they make driving much safer, but they will also free us from
the need to pay attention to the road during our daily commute.

Health care. Robots are increasingly used to assist surgeons with instrument
placement when operating on organs as intricate as brains, eyes, and hearts.
Robots have become indispensable tools in a range of surgical procedures, such
as hip replacements, thanks to their high precision. In pilot studies, robotic
devices have been found to reduce the danger of lesions when performing
colonoscopy.

Hazardous environments. Robots have assisted people in cleaning up nuclear


waste, most notably in Chernobyl and Three Mile Island. Robots were present
after the collapse of the World Trade Center, where they entered structures
deemed too dangerous for human search and rescue crews.

Exploration. Robots have gone where no one has gone before, including the
surface of Mars. Robotic arms assist astronauts in deploying and retrieving
satellites and in building the International Space Station. Robots also help explore
under the sea. They are routinely used to acquire maps of sunken ships.

Personal Services. Service is an up-and-coming application domain of robotics.


Service robots assist individuals in performing daily tasks. Commercially
available domestic service robots include autonomous vacuum cleaners, lawn
mowers, and golf caddies. An example of a robotic vacuum cleaner is the ROOMBA.

Entertainment. Robots have begun to conquer the entertainment and toy


industry. we see robotic soccer, a competitive game very much like human
soccer, but played with autonomous mobile robots. Robot soccer provides great
opportunities for research in AI, since it raises a range of problems relevant to
many other, more serious robot applications.

Human augmentation. A final application domain of robotic technology is that


of human augmentation. Researchers have developed legged walking machines
that can carry people around, very much like a wheelchair. Several research
efforts presently focus on the development of devices that make it easier for
people to walk or move their arms by providing additional forces through extra
skeletal attachments.

Machine learning in robot perception


Machine learning plays an important role in robot perception. This is particularly the case when the best
internal representation is not known. One common approach is to map high-dimensional sensor streams
into lower-dimensional spaces using unsupervised machine learning methods. Such an approach is called
low-dimensional embedding. Machine learning makes it possible to learn sensor and motion models from
data, while simultaneously discovering suitable internal representations.
Another machine learning technique enables robots to continuously adapt to broad changes in sensor
measurements. Picture yourself walking from a sun-lit space into a dark neon-lit room. Clearly things are
darker inside. But the change of light source also affects all the colors: neon light has a stronger component
of green light than sunlight. Yet somehow we seem not to notice the change. If we walk together with
people into a neon-lit room, we don't think that suddenly their faces turned green. Our perception quickly
adapts to the new lighting conditions, and our brain ignores the differences.
Methods that make robots collect their own training data (with labels!) are called self-supervised. In
this instance, the robot uses machine learning to leverage a short-range sensor that works well for terrain
classification into a sensor that can see much farther. That allows the robot to drive faster, slowing down
only when the sensor model says there is a change in the terrain that needs to be examined more carefully
by the short-range sensors.

WEAK AI: CAN MACHINES ACT INTELLIGENTLY?

Whether AI is impossible depends on how it is defined. We defined AI as the quest for the best agent
program on a given architecture. With this formulation, AI is by definition possible: for any digital
architecture with k bits of program storage there are exactly 2^k agent programs, and all we have to do
to find the best one is enumerate and test them all. This might not be feasible for large k, but
philosophers deal with the theoretical, not the practical.

Our definition of AI works well for the engineering problem of finding a good agent, given an

architecture. Therefore, we’re tempted to end this section right now, answering the title question in the
affirmative. But philosophers are interested in the problem of comparing two architectures—human and
machine. Furthermore, they have traditionally posed the question not in terms of maximizing expected
utility but rather as, “Can machines think?”

Alan Turing, in his famous paper “Computing Machinery and Intelligence” (1950), suggested
that instead of asking whether machines can think, we should ask whether machines can pass a
behavioral intelligence test, which has come to be called the Turing Test. The test is for a program to
have a conversation (via online typed messages) with an interrogator for five minutes. The interrogator
then has to guess if the conversation is with a program or a person; the program passes the test if it
fools the interrogator 30% of the time.
The argument from disability
The “argument from disability” makes the claim that “a machine can never do X.” As examples of X,
Turing lists the following:
Be kind, resourceful, beautiful, friendly, have initiative, have a sense of humor, tell right from wrong,
make mistakes, fall in love, enjoy strawberries and cream, make someone fall in love with it, learn from
experience, use words properly, be the subject of its own thought, have as much diversity of behavior as
man, do something really new

It is clear that computers can do many things as well as or better than humans, including things
that people believe require great human insight and understanding. This does not mean, of course, that
computers use insight and understanding in performing these tasks (those are not part of behavior, and
we address such questions elsewhere), but the point is that one’s first guess about the mental processes
required to produce a given behavior is often wrong. It is also true, of course, that there are many tasks
at which computers do not yet excel (to put it mildly), including Turing’s task of carrying on an open-
ended conversation.

The mathematical objection


It is well known, through the work of Turing (1936) and Gödel (1931), that certain mathematical
questions are in principle unanswerable by particular formal systems. Gödel’s incompleteness
theorem is the most famous example of this. Briefly, for any formal axiomatic system F powerful
enough to do arithmetic, it is possible to construct a so-called Gödel sentence G(F) with the following
properties:
• G(F ) is a sentence of F , but cannot be proved within F .
• If F is consistent, then G(F ) is true.
Even if we grant that computers have limitations on what they can prove, there is no evidence
that humans are immune from those limitations. It is all too easy to show rigorously that a formal system
cannot do X, and then claim that humans can do X using their own informal method, without giving any
evidence for this claim. Indeed, it is impossible to prove that humans are not subject to Gödel’s
incompleteness theorem, because any rigorous proof would require a formalization of the claimed
unformalizable human talent, and hence refute itself. So we are left with an appeal to intuition that
humans can somehow perform superhuman feats of mathematical insight. This appeal is expressed with
arguments such as “we must assume our own consistency, if thought is to be possible at all”. But if
anything, humans are known to be inconsistent. This is certainly true for everyday reasoning, but it is

also true for careful mathematical thought. A famous example is the four-color map problem.
The argument from informality
One of the most influential and persistent criticisms of AI as an enterprise was raised by Turing as the
“argument from informality of behavior.” Essentially, this is the claim that human behavior is far too
complex to be captured by any simple set of rules and that because computers can do no more than follow
a set of rules, they cannot generate behavior as intelligent as that of humans. The inability to capture
everything in a set of logical rules is called the qualification problem in AI.
1. Good generalization from examples cannot be achieved without background knowledge. They
claim no one has any idea how to incorporate background knowledge into the neural network
learning process. In fact, there are techniques for using prior knowledge in learning
algorithms. Those techniques, however, rely on the availability of knowledge in explicit form,
something that Dreyfus and Dreyfus strenuously deny. In our view, this is a good reason for a
serious redesign of current models of neural processing so that they can take advantage of
previously learned knowledge in the way that other learning algorithms do.
2. Neural network learning is a form of supervised learning, requiring the prior identification of
relevant inputs and correct outputs. Therefore, they claim, it cannot operate autonomously
without the help of a human trainer. In fact, learning without a teacher can be accomplished by
unsupervised learning and reinforcement learning .
3. Learning algorithms do not perform well with many features, and if we pick a subset of features,
“there is no known way of adding new features should the current set prove inadequate to account
for the learned facts.” In fact, new methods such as support vector machines handle large feature
sets very well. With the introduction of large Web-based data sets, many applications in areas
such as language processing (Sha and Pereira, 2003) and computer vision (Viola and Jones, 2002a)
routinely handle millions of features.
4. The brain is able to direct its sensors to seek relevant information and to process it to extract aspects
relevant to the current situation. But, Dreyfus and Dreyfus claim, “Currently, no details of this
mechanism are understood or even hypothesized in a way that could guide AI research.” In fact, the
field of active vision, underpinned by the theory of information value, is concerned with
exactly the problem of directing sensors, and already some robots have incorporated the
theoretical results obtained

STRONG AI: CAN MACHINES REALLY THINK?


Many philosophers have claimed that a machine that passes the Turing Test would still not be actually
thinking, but would be only a simulation of thinking. Again, the objection was foreseen by Turing. He
cites a speech by Professor Geoffrey Jefferson (1949):
Not until a machine could write a sonnet or compose a concerto because of thoughts and emotions
felt, and not by the chance fall of symbols, could we agree that machine equals brain—that is, not
only write it but know that it had written it.
Turing calls this the argument from consciousness—the machine has to be aware of its own mental states
and actions. While consciousness is an important subject, Jefferson’s key point actually relates to
phenomenology, or the study of direct experience: the machine has to actually feel emotions. Others focus
on intentionality—that is, the question of whether the machine’s purported beliefs, desires, and other
representations are actually “about” something in the real world.

Turing argues that Jefferson would be willing to extend the polite convention to machines if
only he had experience with ones that act intelligently. He cites the following dialog, which has
become such a part of AI’s oral tradition that we simply have to include it:
HUMAN: In the first line of your sonnet which reads “shall I compare thee to a summer’s day,”
would not a “spring day” do as well or better?
MACHINE: It wouldn’t scan.
HUMAN: How about “a winter’s day.” That would scan all right.
MACHINE: Yes, but nobody wants to be compared to a winter’s day.
HUMAN: Would you say Mr. Pickwick reminded you of Christmas?
MACHINE: In a way.
HUMAN: Yet Christmas is a winter’s day, and I do not think Mr. Pickwick would mind the
comparison.
MACHINE: I don’t think you’re serious. By a winter’s day one means a typical winter’s day,
rather than a special one like Christmas.

Mental states and the brain in a vat


Physicalist philosophers have attempted to explicate what it means to say that a person—and, by extension,
a computer—is in a particular mental state. They have focused in particular on intentional states. These
are states, such as believing, knowing, desiring, fearing, and so on, that refer to some aspect of the external
world. For example, the knowledge that one is eating a hamburger is a belief about the hamburger and what
is happening to it.
If physicalism is correct, it must be the case that the proper description of a person’s mental state is
determined by that person’s brain state. Thus, if I am currently focused on eating a hamburger in a mindful
way, my instantaneous brain state is an instance of the class of mental states “knowing that one is eating a
hamburger.” Of course, the specific configurations of all the atoms of my brain are not essential: there are
many configurations of my brain, or of other people’s brains, that would belong to the same class of mental
states. The key point is that the same brain state could not correspond to a fundamentally distinct mental
state, such as the knowledge that one is eating a banana.
The “wide content” view interprets it from the point of view of an omniscient outside observer with
access to the whole situation, who can distinguish differences in the world. Under this view, the content of
mental states involves both the brain state and the environment history. Narrow content, on the other
hand, considers only the brain state. The narrow content of the brain states of a real hamburger-eater and
a brain-in-a-vat “hamburger”-“eater” is the same in both cases.

Functionalism and the brain replacement experiment


The theory of functionalism says that a mental state is any intermediate causal condition between input
and output. Under functionalist theory, any two systems with isomorphic causal processes would have
the same mental states. Therefore, a computer program could have the same mental states as a person. Of
course, we have not yet said what “isomorphic” really means, but the assumption is that there is some level
of abstraction below which the specific implementation does not matter.
And this explanation must also apply to the real brain, which has the same functional properties.
There are three possible conclusions:

1. The causal mechanisms of consciousness that generate these kinds of outputs in normal brains are still
operating in the electronic version, which is therefore conscious.
2. The conscious mental events in the normal brain have no causal connection to behavior, and are
missing from the electronic brain, which is therefore not conscious.
3. The experiment is impossible, and therefore speculation about it is meaningless.
Biological naturalism and the Chinese Room
A strong challenge to functionalism has been mounted by John Searle’s (1980) biological naturalism,
according to which mental states are high-level emergent features that are caused by low-level physical
processes in the neurons, and it is the (unspecified) properties of the neurons that matter. Thus, mental
states cannot be duplicated just on the basis of some program having the same functional structure with
the same input–output behavior; we would require that the program be running on an architecture with the
same causal power as neurons. To support his view, Searle describes a hypothetical system that is clearly
running a program and passes the Turing Test, but that equally clearly (according to Searle) does not
understand anything of its inputs and outputs. His conclusion is that running the appropriate program (i.e.,
having the right outputs) is not a sufficient condition for being a mind.
So far, so good. But from the outside, we see a system that is taking input in the form of Chinese sentences
and generating answers in Chinese that are as “intelligent” as those in the conversation imagined by
Turing. Searle then argues: the person in the room does not understand Chinese (given). The rule book
and the stacks of paper, being just pieces of paper, do not understand Chinese. Therefore, there is no
understanding of Chinese. Hence, according to Searle, running the right program does not necessarily
generate understanding.
The real claim made by Searle rests upon the following four axioms :
1. Computer programs are formal (syntactic).
2. Human minds have mental contents (semantics).
3. Syntax by itself is neither constitutive of nor sufficient for semantics.
4. Brains cause minds.
From the first three axioms Searle concludes that programs are not sufficient for minds. In other words, an
agent running a program might be a mind, but it is not necessarily a mind just by virtue of running the
program. From the fourth axiom he concludes “Any other system capable of causing minds would have to
have causal powers (at least) equivalent to those of brains.” From there he infers that any artificial brain
would have to duplicate the causal powers of brains, not just run a particular program, and that human
brains do not produce mental phenomena solely by virtue of running a program.

Consciousness, qualia, and the explanatory gap


Running through all the debates about strong AI—the elephant in the debating room, so to speak—is
the issue of consciousness. Consciousness is often broken down into aspects such as understanding and
self-awareness. The aspect we will focus on is that of subjective experience: why it is that it feels like
something to have certain brain states (e.g., while eating a hamburger), whereas it presumably does not feel
like anything to have other physical states (e.g., while being a rock). The technical term for the intrinsic
nature of experiences is qualia (from the Latin word meaning, roughly, “such things”).
Qualia present a challenge for functionalist accounts of the mind because different qualia could be involved
in what are otherwise isomorphic causal processes. Consider, for example, the inverted spectrum

thought experiment, in which the subjective experience of person X when seeing red objects is the same
experience that the rest of us experience when seeing green objects, and vice versa.
This explanatory gap has led some philosophers to conclude that humans are simply incapable of
forming a proper understanding of their own consciousness. Others, notably Daniel Dennett (1991), avoid
the gap by denying the existence of qualia, attributing them to a philosophical confusion.
THE ETHICS AND RISKS OF DEVELOPING ARTIFICIAL INTELLIGENCE
So far, we have concentrated on whether we can develop AI, but we must also consider whether we should.
If the effects of AI technology are more likely to be negative than positive, then it would be the moral
responsibility of workers in the field to redirect their research. Many new technologies have had
unintended negative side effects: nuclear fission brought Chernobyl and the threat of global destruction;
the internal combustion engine brought air pollution, global warming, and the paving-over of paradise. In
a sense, automobiles are robots that have conquered the world by making themselves indispensable.
AI, however, seems to pose some fresh problems beyond that of, say, building bridges that don’t fall
down:
• People might lose their jobs to automation.
• People might have too much (or too little) leisure time.
• People might lose their sense of being unique.
• AI systems might be used toward undesirable ends.
• The use of AI systems might result in a loss of accountability.
• The success of AI might mean the end of the human race.

People might lose their jobs to automation. The modern industrial economy has become dependent on
computers in general, and select AI programs in particular. For example, much of the economy, especially
in the United States, depends on the availability of consumer credit. Credit card applications, charge
approvals, and fraud detection are now done by AI programs. One could say that thousands of workers
have been displaced by these AI programs, but in fact if you took away the AI programs these jobs would
not exist, because human labor would add an unacceptable cost to the transactions.

People might lose their sense of being unique. In Computer Power and Human Reason, Weizenbaum
(1976), the author of the ELIZA program, points out some of the potential threats that AI poses to society.
One of Weizenbaum’s principal arguments is that AI research makes possible the idea that humans are
automata—an idea that results in a loss of autonomyor even of humanity.

AI systems might be used toward undesirable ends. Advanced technologies have often been used by the
powerful to suppress their rivals. As the number theorist G. H. Hardy wrote (Hardy, 1940), “A science is
said to be useful if its development tends to accentuate the existing inequalities in the distribution of wealth,
or more directly promotes the destruction of human life.” This holds for all sciences, AI being no exception.
Autonomous AI systems are now commonplace on the battlefield; the U.S. military deployed over 5,000
autonomous aircraft and 12,000 autonomous ground vehicles in Iraq (Singer, 2009).

The use of AI systems might result in a loss of accountability. In the litigious atmosphere that prevails in
the United States, legal liability becomes an important issue. When a physician relies on the judgment of a
medical expert system for a diagnosis, who is at fault if the diagnosis is wrong? Fortunately, due in part to the
growing influence of decision-theoretic methods in medicine, it is now accepted that negligence cannot

be shown if the physician performs medical procedures that have high expected utility, even if the actual
result is catastrophic for the patient.
The success of AI might mean the end of the human race. Almost any technology has the potential to
cause harm in the wrong hands, but with AI and robotics, we have the new problem that the wrong hands
might belong to the technology itself. Countless science fiction stories have warned about robots or robot–
human cyborgs running amok.
If ultra intelligent machines are a possibility, we humans would do well to make sure that we design
their predecessors in such a way that they design themselves to treat us well. Science fiction writer Isaac
Asimov (1942) was the first to address this issue, with his three laws of robotics:
1. A robot may not injure a human being or, through inaction, allow a human being to come to harm.
2. A robot must obey orders given to it by human beings, except where such orders would conflict with
the First Law.
3. A robot must protect its own existence as long as such protection does not conflict with the First or Second
Law.
AGENT COMPONENTS
Interaction with the environment through sensors and actuators: For much of the history of AI, this
has been a glaring weak point. With a few honorable exceptions, AI systems were built in such a way
that humans had to supply the inputs and interpret the outputs,

Figure A model-based, utility-based agent


while robotic systems focused on low-level tasks in which high-level reasoning and planning were largely
absent. This was due in part to the great expense and engineering effort required to get real robots to work
at all. The situation has changed rapidly in recent years with the availability of ready-made programmable
robots. These, in turn, have benefited from small, cheap, high-resolution CCD cameras and compact, reliable
motor drives. MEMS (micro-electromechanical systems) technology has supplied miniaturized
accelerometers, gyroscopes, and actuators for an artificial flying insect (Floreano et al., 2009). It may also
be possible to combine millions of MEMS devices to produce powerful macroscopic actuators.
Keeping track of the state of the world: This is one of the core capabilities required for an intelligent
agent. It requires both perception and updating of internal representations. Earlier chapters showed how to keep
track of atomic state representations, how to do it for factored (propositional) state representations, and how to
extend this to first-order logic; Chapter 15 described filtering algorithms for probabilistic reasoning
in uncertain environments. Current filtering and perception algorithms can be combined to do a reasonable
job of reporting low-level predicates such as “the cup is on the table.” Detecting higher-level actions, such
as “Dr. Russell is having a cup of tea with Dr. Norvig while discussing plans for next week,” is more
difficult. Currently it can be done only with the help of annotated examples.
Projecting, evaluating, and selecting future courses of action: The basic knowledge- representation
requirements here are the same as for keeping track of the world; the primary difficulty is coping with
courses of action—such as having a conversation or a cup of tea—that consist eventually of thousands or
millions of primitive steps for a real agent. It is only by imposing hierarchical structure on behavior that
we humans cope at all. Earlier chapters showed how to use hierarchical representations to handle problems of this
scale; furthermore, work in hierarchical reinforcement learning has succeeded in combining some of these ideas with
the techniques for decision making under uncertainty described earlier. As yet, algorithms for the partially
observable case (POMDPs) are using the same atomic state representation we used for the search algorithms.

It has proven very difficult to decompose preferences over complex states in the same way that Bayes nets
decompose beliefs over complex states. One reason may be that preferences over states are really compiled
from preferences over state histories, which are described by reward functions.

Learning: Chapters 18 to 21 described how learning in an agent can be formulated as inductive learning
(supervised, unsupervised, or reinforcement-based) of the functions that constitute the various components
of the agent. Very powerful logical and statistical techniques have been developed that can cope with quite
large problems, reaching or exceeding human capabilities in many tasks—as long as we are dealing with a
predefined vocabulary of features and concepts.

AGENT ARCHITECTURES
It is natural to ask, “Which of the agent architectures should an agent use?” The answer is, “All of them!”
We have seen that reflex responses are needed for situations in which time is of the essence, whereas
knowledge-based deliberation allows the agent to plan ahead. A complete agent must be able to do both,
using a hybrid architecture. One important property of hybrid architectures is that the boundaries between
different decision components are not fixed. For example, compilation continually converts declarative in-
formation at the deliberative level into more efficient representations, eventually reaching the reflex level.
For example, a taxi-driving agent that sees an accident ahead must decide in a split second either to brake
or to take evasive action. It should also spend that split second thinking about the most important questions,
such as whether the lanes to the left and right are clear and whether there is a large truck close behind, rather
than worrying about wear and tear on the tires or where to pick up the next passenger. These issues are
usually studied under the heading of real-time AI

Fig: Compilation serves to convert deliberative decision making into more efficient, reflexive mechanisms.
Clearly, there is a pressing need for general methods of controlling deliberation, rather than specific
recipes for what to think about in each situation. The first useful idea is to employ anytime algorithms.
The second technique for controlling deliberation is decision-theoretic meta reasoning (Russell and
Wefald, 1989, 1991; Horvitz, 1989; Horvitz and Breese, 1996). This method applies the theory of
information value to the selection of individual computations. The value of a computation depends on both
its cost (in terms of delaying action) and its benefits (in terms of improved decision quality). Meta reasoning
techniques can be used to design better search algorithms and to guarantee that the algorithms have the
anytime property. Meta reasoning is expensive, of course, and compilation methods can be applied so that
the overhead is small compared to the costs of the computations being controlled. Meta level reinforcement
learning may provide another way to acquire effective policies for controlling deliberation.

Meta reasoning is one specific example of a reflective architecture—that is, an architecture that enables
deliberation about the computational entities and actions occurring within the architecture itself. A theoretical
foundation for reflective architectures can be built by defining a joint state space composed from the
environment state and the computational state of the agent itself.

ARE WE GOING IN THE RIGHT DIRECTION?

The preceding section listed many advances and many opportunities for further progress. But where is this
all leading? Dreyfus (1992) gives the analogy of trying to get to the moon by climbing a tree; one can
report steady progress, all the way to the top of the tree. In this section, we consider whether AI’s current
path is more like a tree climb or a rocket trip.

Perfect rationality. A perfectly rational agent acts at every instant in such a way as to maximize its
expected utility, given the information it has acquired from the environment. We have seen that the
calculations necessary to achieve perfect rationality in most environments are too time consuming, so
perfect rationality is not a realistic goal.
Calculative rationality. This is the notion of rationality that we have used implicitly in designing logical
and decision-theoretic agents, and most of theoretical AI research has focused on this property. A
calculatively rational agent eventually returns what would have been the rational choice at the beginning of
its deliberation. This is an interesting property for a system to exhibit, but in most environments, the right
answer at the wrong time is of no value. In practice, AI system designers are forced to compromise on
decision quality to obtain reason- able overall performance; unfortunately, the theoretical basis of
calculative rationality does not provide a well-founded way to make such compromises.

Bounded rationality. Herbert Simon (1957) rejected the notion of perfect (or even approximately perfect)
rationality and replaced it with bounded rationality, a descriptive theory of decision making by real agents.
Bounded optimality (BO). A bounded optimal agent behaves as well as possible, given its computational
resources. That is, the expected utility of the agent program for a bounded optimal agent is at least as high
as the expected utility of any other agent program running on the same machine.

WHAT IF AI DOES SUCCEED?

In David Lodge’s Small World (1984), a novel about the academic world of literary criticism, the
protagonist causes consternation by asking a panel of eminent but contradictory literary theorists the
following question: “What if you were right?” None of the theorists seems to have considered this question
before, perhaps because debating unfalsifiable theories is an end in itself. Similar confusion can be evoked
by asking AI researchers, “What if you succeed?”
We can expect that medium-level successes in AI would affect all kinds of people in their daily lives.
So far, computerized communication networks, such as cell phones and the Internet, have had this kind of
pervasive effect on society, but AI has not. AI has been at workbehind the scenes—for example, in
automatically approving or denying credit card transactions for every purchase made on the Web—but has
not been visible to the average consumer.We can imagine that truly useful personal assistants for the office
or the home would have a large positive impact on people’s lives, although they might cause some economic
dislocation in the short term. Automated assistants for driving could prevent accidents, saving tens of
thousands of lives per year. A technological capability at this level might also be applied to the development
of autonomous weapons, which many view as undesirable. Some of the biggest societal problems we face
today—such as the harnessing of genomic information for treating disease, the efficient management of
energy resources, and the verification of treaties concerning nuclear weapons—are being addressed with the
help of AI technologies.
Finally, it seems likely that a large-scale success in AI—the creation of human-level intelligence and
beyond—would change the lives of a majority of humankind. The very nature of our work and play would
be altered, as would our view of intelligence, consciousness, and the future destiny of the human race. AI
systems at this level of capability could threaten human autonomy, freedom, and even survival. For these
reasons, we cannot divorce AI research from its ethical consequences.
In conclusion, we see that AI has made great progress in its short history, but the final sentence of Alan
Turing’s (1950) essay on Computing Machinery and Intelligence is still valid today:
We can see only a short distance ahead, but we can see that much remains to be done.

PART –A
1. Explain briefly Robot Perception
2. Distinguish between Weak AI and Strong AI.
3. Explain about Agent Architectures.
4. What if AI does succeed?

PART –B

1. Explain briefly about Robotics


2. Write and Explain a Robot Hardware.
3. What is Robotic Perception? Explain briefly.
4. Explain about Robotic Software architecture

