
UNIT – 1

Different Approach of AI

“Artificial” means “man-made”, and “intelligence” means “thinking power”; so AI means “man-made thinking power”.
AI is a branch of computer science by which we can create intelligent machines that can behave like humans, think like humans, and make decisions.
It is the science and engineering of making intelligent machines, especially intelligent computer programs.
There are four types of artificial intelligence approaches based on how machines behave: reactive machines, limited memory, theory of mind, and self-awareness.

1. Reactive machines:
These machines are the most basic form of AI application.
AI teams do not use training sets to feed the machines or store past data for future reference. In a game-playing setting, for example, the machine decides or predicts the next move based only on the move just made by the opponent.

2. Limited memory:
Over time, these machines are fed with data and trained; a self-driving car, for instance, is trained on the speed and direction of other cars, lane markings, traffic lights, curves of roads, and other important factors.

3. Theory of mind:
Theory of mind is the concept where machines will understand and react to human emotions and thoughts. This is where researchers are still struggling to make the concept work; we are not there yet.

4. Self-awareness:
It is a step ahead of understanding human emotions. It is the stage where AI teams build machines with self-awareness factors programmed into them.

❖ Advantages of artificial intelligence:

1. High accuracy with fewer errors.
2. High speed.
3. High reliability.
4. Useful in risky areas.
5. Useful as a digital assistant.
6. Useful as a public utility.

❖ Disadvantages of artificial intelligence:

1. High cost.
2. Cannot think out of the box.
3. No feelings or emotions.
4. Increased dependency on machines.
5. No original creativity.
Search Algorithms

❖ Problem-solving agents:

In Artificial Intelligence, search techniques are universal problem-solving methods. Rational agents or problem-solving
agents in AI mostly use these search strategies or algorithms to solve a specific problem and provide the best result.

❖ Search Algorithm Terminologies:


• Search: Searching is a step-by-step procedure to solve a search problem in a given search space. A search
problem can have three main factors:
o Search Space: The search space represents the set of possible solutions which a system may have.
o Start State: The state from which the agent begins the search.
o Goal test: A function which observes the current state and returns whether the goal state has been
achieved or not.
• Search tree: A tree representation of the search problem is called a search tree. The root of the search tree is the
root node, which corresponds to the initial state.
• Actions: A description of all the actions available to the agent.
• Transition model: A description of what each action does; it can be represented as a transition model.
• Path Cost: A function which assigns a numeric cost to each path.
• Solution: An action sequence which leads from the start node to the goal node.
• Optimal Solution: A solution that has the lowest cost among all solutions.

❖ Properties of Search Algorithms:

Following are the four essential properties used to compare the efficiency of search algorithms:

Completeness: A search algorithm is said to be complete if it is guaranteed to return a solution whenever at least
one solution exists for any input.

Optimality: If the solution found by an algorithm is guaranteed to be the best solution (lowest path cost) among
all solutions, then it is said to be an optimal solution.

Time Complexity: Time complexity is a measure of the time an algorithm takes to complete its task.

Space Complexity: It is the maximum storage space required at any point during the search, expressed in terms
of the complexity of the problem.

❖ Types of search algorithms:

Based on the search problem, we can classify search algorithms into uninformed (blind) search
and informed (heuristic) search algorithms.

Breadth First Search

It is the most common search strategy for traversing a tree or graph.
This algorithm searches breadthwise in a tree or graph, which is why it is called breadth-first search.
The BFS algorithm starts searching from the root node of the tree and expands all successor nodes at the current level
before moving to nodes of the next level.
The breadth-first search algorithm is an example of a general-graph search algorithm.
Breadth-first search is implemented using a FIFO queue data structure.
❖ Advantages:
• BFS will provide a solution if any solution exists.
• If there is more than one solution for a given problem, then BFS will provide the minimal solution, i.e. the
one which requires the least number of steps.
❖ Disadvantages:
• It requires a lot of memory, since each level of the tree must be saved in memory in order to expand the next
level.
• BFS needs a lot of time if the solution is far away from the root node.

➢ Example:

Consider running BFS from the root node S to the goal node K. The BFS algorithm traverses the tree in layers:

S---> A--->B---->C--->D---->G--->H--->E---->F---->I---->K
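➢ Python sketch (BFS):
The following runnable sketch illustrates the layer-by-layer traversal described above. The original figure is not
reproduced here, so the adjacency list below is an assumed reconstruction chosen so that BFS visits the nodes in the
stated order S, A, B, C, D, G, H, E, F, I, K; treat it as a minimal sketch, not part of the original text.

from collections import deque

def bfs(graph, start, goal):
    """Breadth-first search using a FIFO queue; returns the path from start to goal."""
    frontier = deque([[start]])              # queue of partial paths (FIFO)
    visited = {start}
    while frontier:
        path = frontier.popleft()            # oldest (shallowest) path first
        node = path[-1]
        if node == goal:
            return path
        for successor in graph.get(node, []):
            if successor not in visited:
                visited.add(successor)
                frontier.append(path + [successor])
    return None                              # no solution exists

# Assumed adjacency list, reconstructed so that BFS visits S, A, B, C, D, G, H, E, F, I, K.
graph = {'S': ['A', 'B'], 'A': ['C', 'D'], 'B': ['G', 'H'], 'C': ['E', 'F'],
         'D': [], 'G': ['I'], 'H': [], 'E': [], 'F': [], 'I': ['K'], 'K': []}
print(bfs(graph, 'S', 'K'))                  # ['S', 'B', 'G', 'I', 'K']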

Depth First Search

Depth-first search is a recursive algorithm for traversing a tree or graph data structure.
It is called depth-first search because it starts from the root node and follows each path to its greatest depth
before moving to the next path.
DFS uses a stack data structure for its implementation.
The process of the DFS algorithm is similar to the BFS algorithm.
❖ Advantages:
• DFS requires much less memory, as it only needs to store a stack of the nodes on the path from the root node to
the current node.
• It takes less time to reach the goal node than the BFS algorithm (if it traverses the right path).
❖ Disadvantages:
• There is the possibility that many states keep re-occurring, and there is no guarantee of finding a solution.
• The DFS algorithm goes deep down in the search, and sometimes it may go into an infinite loop.

➢ Example:
The flow of the depth-first search order is:
Root node ---> left node ----> right node.
It will start searching from root node S and traverse A, then B, then D and E. After traversing E, it will
backtrack the tree, as E has no other successor and the goal node has still not been found. After backtracking, it will
traverse node C and then G, where it terminates because it has found the goal node.
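➢ Python sketch (DFS):
A minimal recursive DFS sketch. The adjacency list is an assumed reconstruction of the missing figure, chosen so the
search visits S, A, B, D, E, backtracks, and then finds the goal G via C, as described above.

def dfs(graph, node, goal, path=None, visited=None):
    """Recursive depth-first search: follow each path to its greatest depth before backtracking."""
    if path is None:
        path, visited = [], set()
    path.append(node)
    visited.add(node)
    if node == goal:
        return list(path)
    for successor in graph.get(node, []):
        if successor not in visited:
            result = dfs(graph, successor, goal, path, visited)
            if result:
                return result
    path.pop()                     # dead end: backtrack
    return None

# Assumed tree: S -> A, C; A -> B; B -> D, E; C -> G (goal).
graph = {'S': ['A', 'C'], 'A': ['B'], 'B': ['D', 'E'], 'D': [], 'E': [], 'C': ['G'], 'G': []}
print(dfs(graph, 'S', 'G'))        # visits S, A, B, D, E, backtracks, then returns ['S', 'C', 'G']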

Iterative Deepening

The iterative deepening algorithm is a combination of the DFS and BFS algorithms.


This algorithm performs depth-first search up to a certain "depth limit", and it keeps increasing the depth limit after
each iteration until the goal node is found.
Iterative deepening is a useful uninformed search strategy when the search space is large and the depth of the goal
node is unknown.

❖ Advantages:
• It combines the benefits of BFS and DFS search algorithm in terms of fast search and memory efficiency.

❖ Disadvantages:
• The main drawback of IDDFS is that it repeats all the work of the previous phase.

➢ Example:

1st iteration -----> A
2nd iteration ----> A, B, C
3rd iteration ------> A, B, D, E, C, F, G
4th iteration ------> A, B, D, H, I, E, C, F, K, G
In the fourth iteration, the algorithm will find the goal node.
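➢ Python sketch (iterative deepening):
A runnable IDDFS sketch. The tree is reconstructed from the iteration listing above; the choice of K as the goal node
is an assumption (the text only says the goal is found in the fourth iteration, i.e. at depth 3).

def depth_limited_search(graph, node, goal, limit):
    """DFS that never goes more than `limit` edges below the current node."""
    if node == goal:
        return [node]
    if limit == 0:
        return None
    for successor in graph.get(node, []):
        result = depth_limited_search(graph, successor, goal, limit - 1)
        if result:
            return [node] + result
    return None

def iterative_deepening(graph, start, goal, max_depth=20):
    """Run repeated depth-limited searches with an increasing depth limit."""
    for limit in range(max_depth + 1):
        result = depth_limited_search(graph, start, goal, limit)
        if result:
            return result, limit
    return None, None

# Tree reconstructed from the iterations: A -> B, C; B -> D, E; C -> F, G; D -> H, I; F -> K.
graph = {'A': ['B', 'C'], 'B': ['D', 'E'], 'C': ['F', 'G'],
         'D': ['H', 'I'], 'E': [], 'F': ['K'], 'G': [], 'H': [], 'I': [], 'K': []}
print(iterative_deepening(graph, 'A', 'K'))   # (['A', 'C', 'F', 'K'], 3): goal found at depth limit 3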

Bi-directional Search

The bidirectional search algorithm runs two simultaneous searches to find the goal node: one from the initial state,
called the forward search, and the other from the goal node, called the backward search. Bidirectional search replaces
a single search graph with two small subgraphs, one starting the search from the initial vertex and the other starting
from the goal vertex. The search stops when these two graphs intersect each other.
Bidirectional search can use search techniques such as BFS, DFS, DLS, etc.

❖ Advantages:
• Bidirectional search is fast.
• Bidirectional search requires less memory

❖ Disadvantages:
• Implementation of the bidirectional search tree is difficult.
• In bidirectional search, one should know the goal state in advance.

➢ Example:

The algorithm terminates at node 9 where two searches meet.
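➢ Python sketch (bidirectional BFS):
A minimal bidirectional-search sketch using BFS from both ends. The original figure is not reproduced, so the graph
below is an assumed stand-in: a simple undirected chain of nodes 1–16 in which the two frontiers happen to meet at
node 9.

from collections import deque

def bidirectional_search(graph, start, goal):
    """Expand a forward frontier from start and a backward frontier from goal; stop when they intersect."""
    if start == goal:
        return start
    fwd_visited, bwd_visited = {start}, {goal}
    fwd_frontier, bwd_frontier = deque([start]), deque([goal])
    while fwd_frontier and bwd_frontier:
        for frontier, own, other in ((fwd_frontier, fwd_visited, bwd_visited),
                                     (bwd_frontier, bwd_visited, fwd_visited)):
            for _ in range(len(frontier)):          # expand one full layer of this search
                node = frontier.popleft()
                for nbr in graph[node]:
                    if nbr in other:                # the two searches intersect here
                        return nbr
                    if nbr not in own:
                        own.add(nbr)
                        frontier.append(nbr)
    return None

# Assumed undirected chain 1-2-...-16 standing in for the missing figure.
graph = {n: [m for m in (n - 1, n + 1) if 1 <= m <= 16] for n in range(1, 17)}
print(bidirectional_search(graph, 1, 16))           # 9: the node where the two searches meet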

Hill Climbing Algorithm in Artificial Intelligence
The hill climbing algorithm is a technique used for optimizing mathematical problems. One of the widely
discussed examples of the hill climbing algorithm is the Travelling Salesman Problem, in which we need to minimize the
distance travelled by the salesman.
It is also called greedy local search, as it only looks to its good immediate neighbour state and not beyond that.
A node of the hill climbing algorithm has two components: state and value.
Hill climbing is mostly used when a good heuristic is available.
In this algorithm, we don't need to maintain and handle the search tree or graph, as it only keeps a single current
state.

❖ Features of Hill Climbing:


Some main features of the hill climbing algorithm:
o Generate and Test variant: Hill climbing is a variant of the Generate and Test method. The Generate
and Test method produces feedback which helps to decide which direction to move in the search space.
o Greedy approach: The hill-climbing search moves in the direction which optimizes the cost.
o No backtracking: It does not backtrack in the search space, as it does not remember previous states.

❖ State-space Diagram for Hill Climbing:


The state-space landscape is a graphical representation of the hill-climbing algorithm, showing a graph
between the various states of the algorithm and the objective function/cost.
On the Y-axis we take the function, which can be an objective function or a cost function, and the state space on
the X-axis. If the function on the Y-axis is cost, then the goal of the search is to find the global minimum or a local
minimum. If the function on the Y-axis is an objective function, then the goal of the search is to find the global
maximum or a local maximum.

9
❖ Different regions in the state space landscape:
Local Maximum: A local maximum is a state which is better than its neighbouring states, but there also exists another
state which is higher than it.
Global Maximum: The global maximum is the best possible state in the state-space landscape. It has the highest value
of the objective function.
Current state: The state in the landscape diagram where the agent is currently present.
Flat local maximum: A flat region of the landscape where all the neighbouring states of the current state have the
same value.
Shoulder: A plateau region which has an uphill edge.

❖ Types of Hill Climbing Algorithm:


1. Simple Hill Climbing:
Simple hill climbing is the simplest way to implement a hill climbing algorithm. It evaluates one
neighbour node state at a time and selects the first one which improves the current cost, setting it as the
current state. It only checks one successor state at a time, and if that successor is better than the current state, it
moves; otherwise it stays in the same state. This algorithm has the following features:
o Less time consuming
o Less optimal solution, and the solution is not guaranteed
➢ Algorithm for Simple Hill Climbing:
o Step 1: Evaluate the initial state; if it is the goal state then return success and stop.
o Step 2: Loop until a solution is found or there is no new operator left to apply.
o Step 3: Select and apply an operator to the current state.
o Step 4: Check the new state:
1. If it is the goal state, then return success and quit.
2. Else, if it is better than the current state, then assign the new state as the current state.
3. Else, if it is not better than the current state, then return to step 2.
o Step 5: Exit.
2. Steepest-Ascent hill climbing:
The steepest-ascent algorithm is a variation of the simple hill climbing algorithm. This algorithm examines all the
neighbouring nodes of the current state and selects the one neighbour node which is closest to the goal state. This
algorithm consumes more time, as it searches for multiple neighbours.
➢ Algorithm for Steepest-Ascent hill climbing:
o Step 1: Evaluate the initial state; if it is the goal state then return success and stop, else make the initial state
the current state.
o Step 2: Loop until a solution is found or the current state does not change.
1. Let SUCC be a state such that any successor of the current state will be better than it.
2. For each operator that applies to the current state:
I. Apply the new operator and generate a new state.
II. Evaluate the new state.
III. If it is the goal state, then return it and quit; else compare it to SUCC.
IV. If it is better than SUCC, then set the new state as SUCC.
V. If SUCC is better than the current state, then set the current state to SUCC.
o Step 3: Exit.
3. Stochastic hill climbing:
Stochastic hill climbing does not examine all of its neighbours before moving. Rather, this search algorithm selects
one neighbour node at random and decides whether to choose it as the current state or examine another state. A code
sketch of the simple and steepest-ascent variants is given below.
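➢ Python sketch (simple and steepest-ascent hill climbing):
A minimal sketch of the two variants above over a one-dimensional state space; the objective function, the integer
states, and the neighbour definition (state ± 1) are illustrative assumptions, not taken from the text.

def neighbours(state):
    return [state - 1, state + 1]          # assumed move set: one step left or right

def simple_hill_climbing(objective, state):
    """Move to the first neighbour that improves the objective; stop when none does."""
    while True:
        for nxt in neighbours(state):
            if objective(nxt) > objective(state):
                state = nxt
                break
        else:
            return state                   # no improving neighbour: a (possibly local) maximum

def steepest_ascent(objective, state):
    """Examine all neighbours and move to the best one, as long as it improves on the current state."""
    while True:
        best = max(neighbours(state), key=objective)
        if objective(best) <= objective(state):
            return state
        state = best

# Assumed objective: a local maximum at x = 2 (value 5) and the global maximum at x = 8 (value 20).
def objective(x):
    return -(x - 2) ** 2 + 5 if x < 5 else -(x - 8) ** 2 + 20

print(simple_hill_climbing(objective, 0))   # 2  (stuck on the local maximum)
print(steepest_ascent(objective, 6))        # 8  (reaches the global maximum from a better start)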

❖ Problems in Hill Climbing Algorithm:
1. Local Maximum: A local maximum is a peak state in the landscape which is better than each of its neighbouring
states, but there is another state present which is higher than the local maximum.
Solution: A backtracking technique can be a solution to the local maximum in the state-space landscape. Create a list
of promising paths so that the algorithm can backtrack in the search space and explore other paths as well.

2. Plateau: A plateau is a flat area of the search space in which all the neighbour states of the current state
contain the same value; because of this, the algorithm cannot find any best direction to move. A hill-climbing search
might get lost in the plateau area.
Solution: The solution to the plateau is to take big steps (or very little steps) while searching.
Randomly select a state which is far away from the current state, so it is possible that the algorithm will find a non-
plateau region.

3. Ridges: A ridge is a special form of the local maximum. It is an area which is higher than its surrounding areas,
but which itself has a slope and cannot be reached in a single move.
Solution: With the use of bidirectional search, or by moving in different directions, we can improve this problem.

❖ Simulated Annealing:
A hill-climbing algorithm which never makes a move towards a lower value is guaranteed to be incomplete, because it
can get stuck on a local maximum. And if the algorithm applies a pure random walk, moving to a random successor, then
it may be complete but not efficient. Simulated annealing is an algorithm which yields both efficiency and completeness.
In mechanical terms, annealing is the process of hardening a metal or glass by heating it to a high temperature and then
cooling it gradually, which allows the material to reach a low-energy crystalline state. The same idea is used in simulated
annealing, in which the algorithm picks a random move instead of the best move. If the random move
improves the state, then it follows that path. Otherwise, the algorithm accepts the move with a probability
of less than 1, or it moves downhill and chooses another path.
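➢ Python sketch (simulated annealing):
A minimal simulated-annealing sketch using the same assumed objective as the hill-climbing sketch. The temperature,
cooling rate, and number of steps are illustrative assumptions; a worse move is accepted with probability exp(Δ/T) < 1,
which shrinks as the temperature cools.

import math
import random

def simulated_annealing(objective, state, temperature=10.0, cooling=0.95, steps=500):
    """Pick a random move; always accept improvements, accept worse moves with probability < 1."""
    best = state
    for _ in range(steps):
        nxt = state + random.choice([-1, 1])             # random move instead of the best move
        delta = objective(nxt) - objective(state)
        if delta > 0 or random.random() < math.exp(delta / temperature):
            state = nxt                                  # occasional downhill moves escape local maxima
        if objective(state) > objective(best):
            best = state
        temperature = max(temperature * cooling, 1e-9)   # gradual cooling
    return best

def objective(x):                                        # assumed: local maximum at 2, global at 8
    return -(x - 2) ** 2 + 5 if x < 5 else -(x - 8) ** 2 + 20

random.seed(0)
print(simulated_annealing(objective, 0))                 # typically escapes x = 2 and ends at or near x = 8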

Informed Search Algorithms
An informed search algorithm uses knowledge such as how far we are from the goal, the path cost, how to
reach the goal node, etc. This knowledge helps agents to explore less of the search space and find the
goal node more efficiently.
The informed search algorithm is more useful for large search spaces. Informed search uses the idea of a
heuristic, so it is also called heuristic search.
Heuristic function: A heuristic is a function used in informed search which finds the most promising path.
It takes the current state of the agent as its input and produces an estimate of how close the agent is to the goal.
The heuristic method might not always give the best solution, but it is guaranteed to find a good solution in
reasonable time. The heuristic function estimates how close a state is to the goal. It is represented by h(n), and it
estimates the cost of an optimal path between the pair of states. The value of the heuristic function is always
positive.

Admissibility of the heuristic function is given as:


1. h(n) <= h*(n)

Here h(n) is the heuristic cost and h*(n) is the actual (optimal) cost of reaching the goal from n. Hence, for an
admissible heuristic, the estimated cost should be less than or equal to the actual cost.

Heuristic search techniques in AI

❖ Heuristic techniques can be divided into two categories:


1. Direct Heuristic Search techniques in AI:
These include Blind Search, Uninformed Search, and blind control strategies. These search techniques are not
always feasible, as they require much memory and time. They search the complete space for a
solution and use an arbitrary ordering of operations.
Examples of direct heuristic search techniques include Breadth-First Search (BFS) and Depth-First
Search (DFS).

2. Weak Heuristic Search techniques in AI:


These include Informed Search, Heuristic Search, and heuristic control strategies. These techniques are helpful
when they are applied properly to the right types of tasks. They usually require domain-specific
information.
Examples of weak heuristic search techniques include Best-First Search (BFS) and A*.

❖ Some real-life examples of heuristics that people use as a way to solve a problem:
o Common sense: A heuristic that is used to solve a problem based on the observation of an individual.
o Rule of thumb: In heuristics, we also use the term rule of thumb. This heuristic allows an individual to make
an approximation without doing an exhaustive search.
o Working backward: It lets an individual solve a problem by assuming that the problem is already solved
and working backward in their mind to see how that solution could have been reached.
o Availability heuristic: It allows a person to judge a situation based on the examples of similar situations
that come to mind.
o Familiarity heuristic: It allows a person to approach a problem based on the fact that they are familiar
with the situation, so they should act as they acted in the same situation before.
o Educated guess: It allows a person to reach a conclusion without doing an exhaustive search. Using it, a
person considers what they have observed in the past and applies that history to a situation where there is
no definite answer yet.

❖ Types of heuristics:
There are various types of heuristics, including the availability heuristic, affect heuristic and representative
heuristic. Each heuristic type plays a role in decision-making.
1. Availability heuristic:
Availability heuristic is said to be the judgment that people make regarding the likelihood of an event based
on information that quickly comes into mind. On making decisions, people typically rely on the past
knowledge or experience of an event. It allows a person to judge a situation based on the examples of
similar situations that come to mind.
2. Representative heuristic:
It occurs when we evaluate an event's probability on the basis of its similarity with another event.
Example: We can understand the representative heuristic through product packaging, as
consumers tend to associate a product's quality with its external packaging. If a company
packages its product in a way that reminds you of a high-quality, well-known product, then consumers will perceive
that product as having the same quality as the branded product.
So, instead of evaluating the product based on its actual quality, customers judge the product's quality based on
the similarity in packaging.
3. Affect heuristic:
It is based on the negative and positive feelings that are linked with a certain stimulus. It includes quick
feelings that are based on past beliefs. The theory is that one's emotional response to a stimulus can affect the
decisions taken by an individual.
When people take little time to evaluate a situation carefully, they tend to base their decisions on
their emotional response.
Example: The affect heuristic can be understood by the example of advertisements. Advertisements can
influence the emotions of consumers, so it affects the purchasing decision of a consumer. The most
common examples of advertisements are the ads of fast food. When fast-food companies run the
advertisement, they hope to obtain a positive emotional response that pushes you to positively view their
products.
If someone carefully analyses the benefits and risks of consuming fast food, they might decide that fast
food is unhealthy. But people rarely take time to evaluate everything they see and generally make decisions
based on their automatic emotional response. So, fast food companies present advertisements that rely on
the affect heuristic to generate a positive emotional response which results in sales.

❖ Limitations of heuristics:
Along with the benefits, heuristics also have some limitations.
o Although heuristics speed up our decision-making process and help us to solve problems, they can
also introduce errors; just because something has worked accurately in the past does not mean that
it will work again.
o It will be hard to find alternative solutions or ideas if we always rely on existing solutions or
heuristics.

A* Algorithm

A* is a searching algorithm that is used to find the shortest path between an initial and a final point.
It is a handy algorithm that is often used for map traversal to find the shortest path to be taken. A* was initially
designed to solve a graph traversal problem, to help build a robot that could find its own course. It still remains a
widely popular algorithm for graph traversal.
It searches for shorter paths first, which makes it an optimal and complete algorithm. An optimal algorithm will find
the least-cost outcome for a problem, while a complete algorithm finds a solution whenever one exists.
Another aspect that makes A* so powerful is the use of weighted graphs in its implementation. A weighted graph
uses numbers to represent the cost of taking each path or course of action. This means that the algorithm can take
the path with the least cost and find the best route in terms of distance and time.

A major drawback of the algorithm is its space and time complexity. It takes a large amount of space to store all
possible paths and a lot of time to find them.

❖ Why A* Search Algorithm?
The A* search algorithm is a simple and efficient search algorithm that can be used to find the optimal path
between two nodes in a graph. It is used for shortest-path finding. It is an extension of Dijkstra's
shortest path algorithm (Dijkstra's Algorithm). The elements of the open list are typically stored in a priority
queue, often implemented as a binary heap. The A* search algorithm also uses a heuristic
function that provides additional information regarding how far away from the goal node we are. This heuristic
is combined with the path cost found so far when ordering the queue, in order to make searching more efficient.

Explanation:
In the event that we have a grid with many obstacles and we want to get somewhere as rapidly as possible, the
A* search algorithm is our savior. From a given starting cell, we can get to the target cell as quickly as
possible. At any point in time, the node A* picks is determined by the sum of two variables' values.
At each step, it picks the node with the smallest value of 'f' (the sum of 'g' and 'h') and processes that
node/cell. 'g' and 'h' are defined as simply as possible below:

• ‘g’ is the distance it takes to get to a certain square on the grid from the starting point, following
the path we generated to get there.
• ‘h’ is the heuristic, which is the estimation of the distance it takes to get to the finish line from that
square on the grid.
Heuristics are basically educated guesses. It is crucial to understand that we do not know the distance to the
finish point until we find the route since there are so many things that might get in the way (e.g., walls, water,
etc.). In the coming sections, we will dive deeper into how to calculate the heuristics.

❖ Algorithm:
Initial condition: we create two lists, an Open list and a Closed list.
Now, the following steps need to be implemented:

• Initialize the open list.


• Put the starting node on the open list (leaving its f at zero). Initialize the closed list.
• Repeat the following steps while the open list is not empty:

1. Find the node with the least f on the open list and call it "q".
2. Remove q from the open list.
3. Generate q's eight descendants (its neighbouring cells on the grid) and set q as their parent.
4. For every descendant (successor):
i) If the successor is the goal, stop the search.
ii) Else, calculate g and h for the successor:
successor.g = q.g + the distance between the successor and q.
successor.h = the estimated distance between the successor and the goal. We will cover three heuristics for
this: the Diagonal, the Euclidean, and the Manhattan heuristics.
successor.f = successor.g + successor.h
iii) Skip this successor if a node with the same position as the successor, but a lower f value, is already in
the OPEN list.
iv) Skip the successor if there is a node in the CLOSED list with the same position as the successor but a lower
f value; otherwise, add the successor to the open list. End (for loop).

• Push q onto the closed list. End (while loop).
We will now discuss how to calculate the heuristics for the nodes.

❖ How Does the A* Algorithm Work?

Consider the weighted graph depicted above, which contains nodes and the distances between them. Let's
say you start from A and have to go to D.
Since the start is at the source A, which has some initial heuristic value h(A) = 6, the result is:
f(A) = g(A) + h(A)
f(A) = 0 + 6 = 6
Next, take the paths to the other neighbouring vertices:
f(A-B) = 1 + 4 = 5
f(A-C) = 5 + 2 = 7
Now take the path to the destination from these nodes and calculate the weights:
f(A-B-D) = (1 + 7) + 0 = 8
f(A-C-D) = (5 + 10) + 0 = 15
It is clear that node B gives you the best path, so that is the node you need to take to reach the destination.
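➢ Python sketch (checking the f-values):
The arithmetic above can be verified in a few lines; the edge costs and heuristic values are exactly the ones used in
the worked example (h(A)=6, h(B)=4, h(C)=2, h(D)=0).

cost = {('A', 'B'): 1, ('A', 'C'): 5, ('B', 'D'): 7, ('C', 'D'): 10}
h = {'A': 6, 'B': 4, 'C': 2, 'D': 0}

def f(path):
    """f(path) = g(path) + h(last node): path cost so far plus the heuristic at the frontier node."""
    g = sum(cost[(a, b)] for a, b in zip(path, path[1:]))
    return g + h[path[-1]]

for path in (['A'], ['A', 'B'], ['A', 'C'], ['A', 'B', 'D'], ['A', 'C', 'D']):
    print('-'.join(path), '=>', f(path))    # A => 6, A-B => 5, A-C => 7, A-B-D => 8, A-C-D => 15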

❖ Pseudocode of A* Algorithm:
The text below represents the pseudocode of the algorithm. It can be used to implement the algorithm in
any programming language and is the basic logic behind the algorithm.
• Make an open list containing the starting node
• Make an empty closed list
• While the destination node has not been reached:
• Consider the node with the lowest f score in the open list
• If this node is the destination node: we are finished
• Else:
Put the current node in the closed list and check its neighbours
• For each neighbour of the current node:
• If the neighbour has a lower g value than the current node and is in the closed list:
Replace the neighbour with the new, lower g value; the current node becomes the neighbour’s parent
• Else if (the current g value is lower and the neighbour is in the open list):
Replace the neighbour with the lower g value and change the neighbour’s parent to the current node
• Else if the neighbour is not in either list:
Add it to the open list and set its g
A runnable version of this pseudocode is sketched below.
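➢ Python sketch (A* on a graph):
A runnable version of the pseudocode above, keeping the open list in a priority queue (heapq) ordered by f = g + h.
The graph and heuristic values reuse the worked example from the previous section; a full grid version with eight
neighbours and the Diagonal/Euclidean/Manhattan heuristics would follow the same structure.

import heapq

def a_star(graph, h, start, goal):
    """Expand the open-list node with the smallest f = g + h until the goal is popped."""
    open_list = [(h[start], 0, start, [start])]      # entries: (f, g, node, path)
    best_g = {start: 0}                              # cheapest g found so far for each node
    while open_list:
        f, g, node, path = heapq.heappop(open_list)
        if node == goal:
            return path, g
        for successor, step_cost in graph.get(node, []):
            new_g = g + step_cost
            if new_g < best_g.get(successor, float('inf')):
                best_g[successor] = new_g
                heapq.heappush(open_list,
                               (new_g + h[successor], new_g, successor, path + [successor]))
    return None, float('inf')

# Edge costs and heuristics from the worked example in the previous section (A-B=1, A-C=5, B-D=7, C-D=10).
graph = {'A': [('B', 1), ('C', 5)], 'B': [('D', 7)], 'C': [('D', 10)], 'D': []}
h = {'A': 6, 'B': 4, 'C': 2, 'D': 0}
print(a_star(graph, h, 'A', 'D'))                    # (['A', 'B', 'D'], 8)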

AO* algorithm – Artificial intelligence
The depth-first search and breadth-first search given earlier for OR trees or graphs can easily be adapted to AND-
OR graphs. The main difference lies in the way the termination conditions are determined, since all goals following an
AND node must be realized, whereas a single goal node following an OR node will do. For this purpose we use
the AO* algorithm.

❖ Like the A* algorithm, here we will use two arrays and one heuristic function:
1. OPEN:
It contains the nodes that have been traversed but have not yet been marked solvable or unsolvable.
2. CLOSE:
It contains the nodes that have already been processed.

❖ Algorithm:
Step 1: Place the starting node into OPEN.
Step 2: Compute the most promising solution tree, say T0.
Step 3: Select a node n that is both on OPEN and a member of T0. Remove it from OPEN and place it in
CLOSE.
Step 4: If n is a terminal goal node, then label n as solved and label all the ancestors of n as solved. If the
starting node is marked as solved, then return success and exit.
Step 5: If n is not a solvable node, then mark n as unsolvable. If the starting node is marked as unsolvable, then
return failure and exit.
Step 6: Expand n. Find all its successors, compute their h(n) values, and push them into OPEN.
Step 7: Return to Step 2.
Step 8: Exit.

❖ Advantages:
It is an optimal algorithm.
It traverses according to the ordering of nodes, and it can be used for both OR and AND graphs.

❖ Disadvantages:
Sometimes, for unsolvable nodes, it cannot find the optimal path. Its complexity is higher than that of other algorithms.

Constraint Satisfaction Problems

We have encountered a wide variety of methods, including adversarial search and local search, to address various
problems. Every problem-solving method has a single purpose in mind: to find a solution that enables the achievement
of the objective.
However, there were no restrictions on the agents' ability to solve problems and arrive at answers in
adversarial search and local search, respectively.
This section examines constraint satisfaction, another form of problem-solving method. As its name implies,
constraint satisfaction means that a problem must be solved while adhering to a set of constraints or rules.
Whenever a problem's variables comply with the stated rules, the problem is said to have been solved using the
constraint satisfaction method. Such a method results in a deeper study of the structure and complexity of the
problem.

❖ A constraint satisfaction problem consists of three components:


o X: a set of variables.
o D: a set of domains, one for each variable. Every variable has its own domain.
o C: a set of constraints that the set of variables must satisfy.
In constraint satisfaction, the domains are the spaces in which the variables take their values, subject to the
restrictions that are particular to the task. These three components make up a constraint satisfaction problem in its
entirety. Each constraint is a pair <scope, rel>: the scope is a tuple of the variables that participate in the
constraint, and rel is a relation that lists the values the variables are allowed to take in order to satisfy the
constraints of the problem.

❖ For a constraint satisfaction problem (CSP), the following must be defined:
o a state space, and
o the notion of a solution.
A state in the state space is defined by assigning values to some or all of the variables, such as
X1 = v1, X2 = v2, etc.

❖ Domain Categories within CSP:
The variables use one of the two types of domains listed below:
o Discrete Domain: An infinite domain which can have a single state with numerous variables.
For instance, every variable may receive an endless number of possible starting states.
o Finite Domain: A finite domain with continuous states that can describe just one domain for one particular
variable. It is also called a continuous domain.

❖ Types of Constraints in CSP:


Basically, there are three different categories of constraints with respect to the variables:
o Unary constraints: the simplest type of constraint, because they restrict the value of only one variable.
o Binary constraints: these constraints relate two variables. For example, a variable x2 may be required to take a
value between x1 and x3.
o Global constraints: this kind of constraint involves an arbitrary number of variables.

❖ The main kinds of constraints are handled using certain kinds of resolution
methodologies:
o Linear constraints: frequently used in linear programming, where every variable carrying an integer value
occurs only in linear form.
o Non-linear constraints: used in non-linear programming, where each variable (an integer value) occurs in a
non-linear form.
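➢ Python sketch (a small CSP):
As a concrete illustration of the X, D, C formulation above, here is a tiny hypothetical CSP solved by simple
backtracking: three variables, a shared domain of three values, and binary "not equal" constraints (a miniature
map-colouring instance). The instance itself is an assumption added for illustration; the text does not specify one.

variables = ['WA', 'NT', 'SA']                                   # X: assumed variables
domains = {v: ['red', 'green', 'blue'] for v in variables}       # D: one domain per variable
constraints = [('WA', 'NT'), ('WA', 'SA'), ('NT', 'SA')]         # C: each pair must differ

def consistent(var, value, assignment):
    """A value is consistent if it violates no constraint with the variables assigned so far."""
    for a, b in constraints:
        other = b if a == var else a if b == var else None
        if other is not None and assignment.get(other) == value:
            return False
    return True

def backtrack(assignment):
    if len(assignment) == len(variables):
        return assignment
    var = next(v for v in variables if v not in assignment)
    for value in domains[var]:
        if consistent(var, value, assignment):
            result = backtrack({**assignment, var: value})
            if result:
                return result
    return None                                                  # no consistent value: backtrack

print(backtrack({}))   # e.g. {'WA': 'red', 'NT': 'green', 'SA': 'blue'}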

UNIT – 2
Mini-Max Algorithm in Artificial Intelligence

o Mini-max is a specialized search algorithm that returns the optimal sequence of moves for a player in a zero-sum game.
o It is a recursive or backtracking algorithm which is used in decision-making and game theory. It provides an
optimal move for the player, assuming that the opponent is also playing optimally.
o The Mini-Max algorithm uses recursion to search through the game tree.
o The Min-Max algorithm is mostly used for game playing in AI, such as chess, checkers, tic-tac-toe, Go, and
various other two-player games. The algorithm computes the minimax decision for the current state.
o In this algorithm, two players play the game: one is called MAX and the other is called MIN.
o Both players fight it out so that the opponent gets the minimum benefit while they get the maximum
benefit.
o Both players are opponents of each other: MAX will select the maximized value and
MIN will select the minimized value.
o The minimax algorithm performs a depth-first search for the exploration of the complete game
tree.
o The minimax algorithm proceeds all the way down to the terminal nodes of the tree and then backtracks up the tree
as the recursion unwinds.

❖ Properties:
1. Complete: it will definitely find a solution (if one exists).
2. Optimal: it finds the optimal strategy.
3. Time complexity = O(b^m), where b is the branching factor of the game tree and m is the maximum depth.
4. Space complexity = O(bm).

❖ Limitation:
1. Slow for complex games such as chess.

❖ Working of Min-Max Algorithm:
Following are the main steps involved in solving the two-player game tree:
Step 1: In the first step, the algorithm generates the entire game tree and applies the utility function to get the
utility values for the terminal states.

Step 2: Now we first find the utility values for the Maximizer. Its initial value is -∞, so we compare each
value at the terminal states with the initial value of the Maximizer and determine the higher node values. It finds the
maximum among them all.
o For node D: max(-1, -∞) => max(-1, 4) = 4
o For node E: max(2, -∞) => max(2, 6) = 6
o For node F: max(-3, -∞) => max(-3, -5) = -3
o For node G: max(0, -∞) = max(0, 7) = 7

Step 3: In the next step, it is the minimizer's turn, so it will compare all node values with +∞ and find the
third-layer node values.
o For node B = min(4, 6) = 4
o For node C = min(-3, 7) = -3

Step 4: Now it is the Maximizer's turn, and it will again choose the maximum of all node values and find the
maximum value for the root node. In this game tree, there are only 4 layers, so we reach the
root node immediately, but in real games there will be more than 4 layers.
o For node A: max(4, -3) = 4

That was the complete workflow of the minimax two-player game; a small code sketch of the same tree follows.
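➢ Python sketch (minimax):
A minimal recursive minimax over the four-level game tree used in the steps above; the terminal utilities
(-1, 4, 2, 6, -3, -5, 0, 7) are the ones from the example.

def minimax(node, maximizing):
    """Depth-first minimax: leaves are numbers, internal nodes are lists of children."""
    if isinstance(node, (int, float)):                 # terminal state: return its utility
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

# A (MAX) -> B, C (MIN) -> D, E, F, G (MAX) -> terminal utilities.
D, E, F, G = [-1, 4], [2, 6], [-3, -5], [0, 7]
B, C = [D, E], [F, G]
A = [B, C]
print(minimax(A, True))     # 4, matching the value computed for the root node above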

Alpha-Beta Pruning

Alpha-beta pruning is a modified version of the minimax algorithm. It is an optimization technique for the
minimax algorithm.
There is a technique by which we can compute the correct minimax decision without checking each node of the game
tree, and this technique is called pruning. It involves two threshold parameters, alpha and beta,
for future expansion, so it is called alpha-beta pruning. It is also called the Alpha-Beta Algorithm.
The two parameters can be defined as:
1. Alpha: The best (highest-value) choice we have found so far at any point along the path of the Maximizer.
The initial value of alpha is -∞.
2. Beta: The best (lowest-value) choice we have found so far at any point along the path of the Minimizer.
The initial value of beta is +∞.
The main condition required for alpha-beta pruning is:
1. α >= β

❖ Key points about alpha-beta pruning:

o The Max player will only update the value of alpha.


o The Min player will only update the value of beta.
o While backtracking the tree, the node values will be passed to upper nodes instead of values of alpha and
beta.
o We will only pass the alpha, beta values to the child nodes.

❖ Working of Alpha-Beta Pruning:
Let's take an example of a two-player search tree to understand the working of alpha-beta pruning.
Step 1: At the first step, the Max player will start the first move from node A, where α = -∞ and β = +∞. These values
of alpha and beta are passed down to node B, where again α = -∞ and β = +∞, and node B passes the same values to its
child D.

Step 2: At node D, the value of α will be calculated, as it is Max's turn. The value of α is compared first with 2
and then with 3, and max(2, 3) = 3 will be the value of α at node D; the node value will also be 3.
Step 3: Now the algorithm backtracks to node B, where the value of β will change, as this is Min's turn. Now β = +∞
will be compared with the available subsequent node values, i.e. min(∞, 3) = 3; hence at node B now α = -∞ and β = 3.

In the next step, the algorithm traverses the next successor of node B, which is node E, and the values α = -∞ and
β = 3 will also be passed.

Step 4: At node E, Max will take its turn, and the value of alpha will change. The current value of alpha will be
compared with 5, so max(-∞, 5) = 5; hence at node E, α = 5 and β = 3. Since α >= β, the right successor of E will be
pruned, and the algorithm will not traverse it; the value at node E will be 5.

Step 5: In the next step, the algorithm again backtracks the tree, from node B to node A. At node A, the value of alpha
will be changed to the maximum available value, 3, as max(-∞, 3) = 3, and β = +∞. These two values are now passed to
the right successor of A, which is node C.

At node C, α = 3 and β = +∞, and the same values will be passed on to node F.

Step 6: At node F, the value of α will again be compared with the left child, which is 0, giving max(3, 0) = 3, and
then compared with the right child, which is 1, giving max(3, 1) = 3. α remains 3, but the node value of F becomes 1.

Step 7: Node F returns the node value 1 to node C. At C, α = 3 and β = +∞; here the value of beta will be changed, as
it is compared with 1, so min(∞, 1) = 1. Now at C, α = 3 and β = 1, and again the condition α >= β is satisfied, so the
next child of C, which is G, will be pruned, and the algorithm will not compute the entire sub-tree of G.

Step 8: C now returns the value 1 to A. The best value for A is max(3, 1) = 3. The final game
tree shows the nodes which were computed and the nodes which were never computed. Hence the optimal
value for the maximizer is 3 for this example; a code sketch of the same tree follows.
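➢ Python sketch (alpha-beta pruning):
A minimal alpha-beta sketch on the tree from the walkthrough. The leaf values 2, 3 (node D), 5 (node E's left child),
and 0, 1 (node F) come from the steps above; the leaves that get pruned (E's right child and node G's children) are
assumed placeholder values, which is exactly the point: pruning means they cannot change the result.

import math

def alphabeta(node, maximizing, alpha=-math.inf, beta=math.inf):
    """Minimax with alpha-beta pruning; stop expanding children once alpha >= beta."""
    if isinstance(node, (int, float)):
        return node
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, value)
            if alpha >= beta:
                break                      # cut-off: Min above will never allow this branch
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, True, alpha, beta))
        beta = min(beta, value)
        if alpha >= beta:
            break                          # cut-off: Max above will never allow this branch
    return value

D, E, F, G = [2, 3], [5, 9], [0, 1], [7, 5]   # 9, 7, 5 are assumed (pruned) leaf values
A = [[D, E], [F, G]]                          # A (MAX) -> B, C (MIN) -> D, E, F, G (MAX)
print(alphabeta(A, True))                     # 3, the optimal value for the maximizer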

Water Jug problem using BFS

Given two water jugs with capacities X and Y litres, where initially both jugs are empty. There is an
infinite amount of water available. The jugs do not have markings to measure smaller quantities.
One can perform the following operations on the jugs:
• Fill either jug completely with water.
• Pour water from one jug to the other until one of the jugs is either empty or full: (X, Y) -> (X – d, Y + d)
• Empty either jug.
The task is to determine whether it is possible to measure Z litres of water using both jugs, and if so, to print one
of the possible ways.

Input: X = 4, Y = 3, Z = 2
Output: {(0, 0), (0, 3), (3, 0), (3, 3), (4, 2), (0, 2)}
Explanation:

• Fill the 4-litre jug completely with water.


• Pour water from the 4-litre jug into the 3-litre jug (leaving 1L of water in the 4L jug and the 3L jug completely full).
• Empty the 3L jug.
• Pour water from the 4L jug into the 3L jug (the 4L jug becomes completely empty and there is 1L of water in the 3L jug).
• Fill the 4L jug with water completely again.
• Transfer water from the 4L jug to the 3L jug until the 3L jug is full, resulting in 2L of water in the 4L jug.

Input: X = 3, Y = 5, Z = 4
Output: 6
Explanation:

• Fill the 5-litre jug to its maximum capacity.


• Transfer 3 litres from the 5-litre jug to the 3-litre jug.
• Empty the 3-litre jug.
• Transfer the remaining 2 litres from the 5-litre jug to the 3-litre jug.
• Fill the 5-litre jug to its maximum capacity again.
• Pour water from the 5L jug into the 3L jug until it is full, leaving 4L in the 5L jug.

❖ Breadth-First Search (BFS) Approach:
The idea is to run a Breadth-First Search (BFS). The BFS approach keeps track of the state of the total water
in both jugs at a given time. The key idea is to visit all possible states and keep track of the visited
states using a visited array or a hashmap. If the amount of water in either jug, or the sum of the water in both
jugs, equals Z, return True and print the resulting states.
Recursion Tree:

➢ Algorithm:
• Initialise a queue to implement BFS.
• Since both jugs are initially empty, insert the state {0, 0} into the queue.
• Perform the following steps until the queue becomes empty:
o Pop out the first element of the queue.
o If the value of the popped element is equal to Z, return True.
o Let X_left and Y_left be the amounts of water left in the two jugs respectively.
o Now perform the fill operation:
▪ If the value of X_left < X, insert ({X, Y_left}) into the hashmap, since this state hasn’t
been visited and some water can still be poured into the first jug.
▪ If the value of Y_left < Y, insert ({X_left, Y}) into the hashmap, since this state hasn’t
been visited and some water can still be poured into the second jug.
o Perform the empty operation:
▪ If the state ({0, Y_left}) isn’t visited, insert it into the hashmap, since we can empty either
of the jugs.
▪ Similarly, if the state ({X_left, 0}) isn’t visited, insert it into the hashmap, since we can
empty either of the jugs.
o Perform the transfer-of-water operation:
▪ An amount min(X – X_left, Y_left) can be poured from the second jug into the first jug. Therefore, if the
state {X_left + min(X – X_left, Y_left), Y_left – min(X – X_left, Y_left)} isn’t visited, put it into the
hashmap.
▪ An amount min(X_left, Y – Y_left) can be poured from the first jug into the second jug. Therefore, if the
state {X_left – min(X_left, Y – Y_left), Y_left + min(X_left, Y – Y_left)} isn’t visited, put it into the
hashmap.
• Return False, since it is not possible to measure Z litres. A runnable sketch of this procedure follows.
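➢ Python sketch (water jug BFS):
A runnable sketch of the BFS procedure above: each state is a pair (amount in jug 1, amount in jug 2), and from every
state we generate the fill, empty, and pour successors. The call below uses the second example, X = 3, Y = 5, Z = 4.

from collections import deque

def water_jug_bfs(X, Y, Z):
    """Return a sequence of states reaching Z litres (in either jug, or in total), or None if impossible."""
    start = (0, 0)
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        a, b = path[-1]
        if a == Z or b == Z or a + b == Z:
            return path
        pour12 = min(a, Y - b)                       # amount pourable from jug 1 into jug 2
        pour21 = min(b, X - a)                       # amount pourable from jug 2 into jug 1
        successors = [(X, b), (a, Y),                # fill either jug completely
                      (0, b), (a, 0),                # empty either jug
                      (a - pour12, b + pour12),      # pour jug 1 -> jug 2
                      (a + pour21, b - pour21)]      # pour jug 2 -> jug 1
        for state in successors:
            if state not in visited:
                visited.add(state)
                queue.append(path + [state])
    return None

print(water_jug_bfs(3, 5, 4))
# [(0, 0), (0, 5), (3, 2), (0, 2), (2, 0), (2, 5), (3, 4)] – six moves, matching the example above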

❖ Mathematical Approach:
We need to find whether Z can be measured using the given jugs of X and Y litres. This can be written as a single
equation as follows:
A*X + B*Y = Z
where A and B are integers. This is a linear Diophantine equation, which is solvable if and only if GCD(X, Y)
divides Z. So, the conditions for this problem to be solvable are (checked in the sketch below):
• Z % GCD(X, Y) = 0
• X + Y >= Z
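➢ Python sketch (feasibility check):
The two conditions can be checked directly:

from math import gcd

def can_measure(X, Y, Z):
    """Z litres can be measured iff Z does not exceed the total capacity and gcd(X, Y) divides Z."""
    return Z <= X + Y and Z % gcd(X, Y) == 0

print(can_measure(4, 3, 2))   # True  (first example)
print(can_measure(3, 5, 4))   # True  (second example)
print(can_measure(2, 6, 5))   # False (5 is not a multiple of gcd(2, 6) = 2)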

Chess Problem in Artificial Intelligence

It is a normal chess game. In a chess problem, the start state is the initial configuration of the chessboard. The
final state is any board configuration which is a winning position for a player. There may be multiple final positions,
and each board configuration can be thought of as representing a state of the game. Whenever any player moves any
piece, it leads to a different state of the game.

❖ Procedure:

Using a predicate called move in predicate calculus, whose parameters are the starting and ending squares, we have
described the legal moves on the board.
For example, move(1, 8) takes the knight from the upper left-hand corner to the middle of the bottom row. While
playing chess, a knight can move two squares either horizontally or vertically, followed by one square in an
orthogonal direction, as long as it does not move off the board.
All possible moves in the figure are as follows:
move(1, 8)    move(6, 1)
move(1, 6)    move(6, 7)
move(2, 9)    move(7, 2)
move(2, 7)    move(7, 6)
move(3, 4)    move(8, 3)
move(3, 8)    move(8, 1)
move(4, 1)    move(9, 2)
move(4, 3)    move(9, 4)
The above predicates of the chess problem form the knowledge base for this problem. A unification algorithm is
used to access the knowledge base.
Suppose we need to find the positions to which the knight can move from a particular location, square 2.
The goal move(2, x) unifies with two different predicates in the knowledge base, with the substitutions {7/x} and
{9/x}. Given the goal move(2, 3), the response is failure, because no move(2, 3) fact exists in the knowledge base.

Tiles problem in AI

❖ State space search:


State space search is a process used in AI in which successive configurations or states of an instance are considered,
with the intention of finding a goal state with a desired property.
Problems are modelled as a state space (the set of states a problem can be in).
Representation:
S : (S, A, Action(s), Result(s, a), Cost(s, a)).

This problem is also known as the “Black and White Sliding Tiles Problem”.

There are seven positions: three white tiles, three black tiles, and a blank space in the middle. All the black tiles
are placed on the left and all the white tiles are placed on the right. Each of the white or black tiles can move to
the blank space, and they can hop over one or two tiles to reach the blank space.

1. Analyse the state space of the problem.

2. Propose a heuristic for this problem to move all the white tiles to the left of all the black tiles; the
position of the blank space is not important.

Initial State:

TILES: BBB WWW

Goal State:

TILES: WWW BBB

❖ Assumptions:

• Heuristic for a black tile = tile-distance to the first white tile to its right.
• Heuristic for a white tile = tile-distance to the first black tile to its left * -1.
• We will run the procedure as long as any tile has a non-zero heuristic value.
• If a black tile does not have any white tile to its right, then that tile will have a zero heuristic.
• If a white tile does not have any black tile to its left, then that tile will have a zero heuristic.
• We will have a selector value, initially set to white.
• For each iteration, we will alternate the selector between black and white, and so on.
• Depending on the selector, we will choose the tile of that colour with the highest heuristic value, say tile X.
• X will be moved to the blank space.
• If the tile moves to an adjacent blank space the cost is 1; if it hops over one tile the cost is 2; if it hops over
two tiles the cost is 3.

❖ Algorithm:

1. Selector := white
2. For each black tile b
3. do h(b) := tile-distance to the first white tile to its right
4. For each white tile w
5. do h(w) := tile-distance to the first black tile to its left * -1
6. While any tile has a non-zero h value
7. Do
8. If selector = white then,
9. Select the white tile with the highest h value, say X
10. If h(X) = 0 and it does not have the blank space to its left then,
11. Select another white tile with the next-highest h value, say X
12. Selector := black
13. Else
14. Select the black tile with the highest h value, say X
15. If h(X) = 0 and it does not have the blank space to its left then,
16. Select another black tile with the next-highest h value, say X
17. Selector := white
18. If X and the blank space are within a 2-tile distance then,
19. Move X to the blank space and record the cost
20. For each black tile b
21. do h(b) := tile-distance to the first white tile to its right
22. For each white tile w
23. do h(w) := tile-distance to the first black tile to its left * -1
24. End while

UNIT – 3
Propositional logic in Artificial intelligence
Propositional logic (PL) is the simplest form of logic where all the statements are made by propositions. A
proposition is a declarative statement which is either true or false. It is a technique of knowledge representation in
logical and mathematical form.
Example:
a) It is Sunday.
b) The Sun rises from the West. (False proposition)
c) 3 + 3 = 7 (False proposition)
d) 5 is a prime number.

Following are some basic facts about propositional logic:


o Propositional logic is also called Boolean logic as it works on 0 and 1.
o In propositional logic, we use symbolic variables to represent the logic, and we can use any symbol for
representing a proposition, such as A, B, C, P, Q, R, etc.
o Propositions can be either true or false, but not both.
o Propositional logic consists of objects, relations or functions, and logical connectives.
o These connectives are also called logical operators.
o The propositions and connectives are the basic elements of propositional logic.
o A connective can be said to be a logical operator which connects two sentences.
o A proposition formula which is always true is called a tautology; it is also called a valid sentence.
o A proposition formula which is always false is called a contradiction.
o A proposition formula which has both true and false values is called a contingency.
o Statements which are questions, commands, or opinions are not propositions, e.g. "Where is Rohini?",
"How are you?", "What is your name?" are not propositions.

❖ Syntax of propositional logic:


The syntax of propositional logic defines the allowable sentences for the knowledge representation. There are
two types of Propositions:
1. Atomic Propositions
2. Compound propositions
o Atomic Proposition: Atomic propositions are the simple propositions. It consists of a single proposition
symbol. These are the sentences which must be either true or false.
Example:
a) "2 + 2 is 4" is an atomic proposition, as it is a true fact.
b) "The Sun is cold" is also a proposition, as it is a false fact.
o Compound proposition: Compound propositions are constructed by combining simpler or atomic propositions,
using parentheses and logical connectives.
Example:
a) "It is raining today, and the street is wet."
b) "Ankit is a doctor, and his clinic is in Mumbai."

37
❖ Logical Connectives:
Logical connectives are used to connect two simpler propositions or representing a sentence logically. We can
create compound propositions with the help of logical connectives. There are mainly five connectives, which
are given as follows:
1. Negation: A sentence such as ¬ P is called negation of P. A literal can be either Positive literal or
negative literal.
2. Conjunction: A sentence which has ∧ connective such as, P ∧ Q is called a conjunction.
Example: Rohan is intelligent and hardworking. It can be written as,
P= Rohan is intelligent,
Q= Rohan is hardworking. → P∧ Q.
3. Disjunction: A sentence which has ∨ connective, such as P ∨ Q. is called disjunction, where P and Q
are the propositions.
Example: "Ritika is a doctor or an engineer."
Here P = Ritika is a doctor and Q = Ritika is an engineer, so we can write it as P ∨ Q.
4. Implication: A sentence such as P → Q, is called an implication. Implications are also known as if-
then rules. It can be represented as
If it is raining, then the street is wet.
Let P= It is raining, and Q= Street is wet, so it is represented as P → Q
5. Biconditional: A sentence such as P ⇔ Q is a biconditional sentence. Example: "I am breathing
if and only if I am alive."
P = I am breathing, Q = I am alive; it can be represented as P ⇔ Q.

38
❖ Following is the summarized table for Propositional Logic Connectives:

❖ Truth Table:

In propositional logic, we need to know the truth values of propositions in all possible scenarios.
We can combine all possible combinations of truth values with logical connectives, and the representation of
these combinations in a tabular format is called a truth table. Following are the truth tables for all
logical connectives (generated programmatically in the sketch below):
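➢ Python sketch (generating the truth tables):
Since the table itself is not reproduced here, the following sketch enumerates it programmatically; implication is
computed as "false only when P is true and Q is false".

from itertools import product

header = ["P", "Q", "¬P", "P∧Q", "P∨Q", "P→Q", "P⇔Q"]
print("  ".join(f"{h:<5}" for h in header))
for p, q in product([True, False], repeat=2):
    row = [p, q, not p, p and q, p or q, (not p) or q, p == q]
    print("  ".join(f"{str(v)[0]:<5}" for v in row))     # prints T / F for each column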

39
❖ Truth table with three propositions:
We can build a proposition composed of three propositions P, Q, and R. Its truth table is made up of 8 (that is, 2³)
rows, as we have taken three proposition symbols.

❖ Precedence of connectives:

Just like arithmetic operators, there is a precedence order for propositional connectors or logical operators. This
order should be followed while evaluating a propositional problem. Following is the list of the precedence order
for operators:

Precedence            Operators

First precedence      Parentheses

Second precedence     Negation

Third precedence      Conjunction (AND)

Fourth precedence     Disjunction (OR)

Fifth precedence      Implication

Sixth precedence      Biconditional

40
❖ Logical equivalence:
Logical equivalence is one of the features of propositional logic. Two propositions are said to be logically
equivalent if and only if their columns in the truth table are identical to each other.
Let's take two propositions A and B; for logical equivalence we can write A ⇔ B. In the truth table below,
we can see that the columns for ¬A ∨ B and A → B are identical; hence ¬A ∨ B is equivalent to A → B.
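➢ Python sketch (verifying the equivalence):
The claim can be checked mechanically by enumerating all rows of the truth table:

from itertools import product

# A → B is false only when A is true and B is false, i.e. not (A and not B).
equivalent = all(((not a) or b) == (not (a and not b))
                 for a, b in product([True, False], repeat=2))
print(equivalent)   # True: the columns for ¬A ∨ B and A → B are identical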

❖ Properties of Operators:

o Commutativity:
o P∧ Q= Q ∧ P, or
o P ∨ Q = Q ∨ P.
o Associativity:
o (P ∧ Q) ∧ R= P ∧ (Q ∧ R),
o (P ∨ Q) ∨ R= P ∨ (Q ∨ R)
o Identity element:
o P ∧ True = P,
o P ∨ True= True.
o Distributive:
o P∧ (Q ∨ R) = (P ∧ Q) ∨ (P ∧ R).
o P ∨ (Q ∧ R) = (P ∨ Q) ∧ (P ∨ R).
o De Morgan's Law:
o ¬ (P ∧ Q) = (¬P) ∨ (¬Q)
o ¬ (P ∨ Q) = (¬ P) ∧ (¬Q).
o Double-negation elimination:
o ¬ (¬P) = P.

❖ Limitations of Propositional logic:

o We cannot represent relations like ALL, some, or none with propositional logic. Example:
1. All the girls are intelligent.
2. Some apples are sweet.
o Propositional logic has limited expressive power.
o In propositional logic, we cannot describe statements in terms of their properties or logical relationships.

41
First-Order Logic in Artificial intelligence

In the topic of propositional logic, we have seen how to represent statements using
propositional logic. Unfortunately, in propositional logic we can only represent facts
which are either true or false. PL is not sufficient to represent complex sentences or natural
language statements. Propositional logic has very limited expressive power. Consider the
following sentences, which we cannot represent using PL:
o "Some humans are intelligent", or
o "Sachin likes cricket."

To represent the above statements, PL is not sufficient, so we require a more powerful logic, such as first-
order logic.

❖ First-Order logic:

o First-order logic is another way of knowledge representation in artificial intelligence. It is an extension of


propositional logic.
o FOL is sufficiently expressive to represent natural language statements in a concise way.
o First-order logic is also known as predicate logic or first-order predicate logic. First-order logic is a
powerful language that expresses information about objects in a more natural way and can also express the
relationships between those objects.
o First-order logic (like natural language) does not only assume that the world contains facts, as
propositional logic does, but also assumes the following things in the world:
o Objects: A, B, people, numbers, colors, wars, theories, squares, pits, wumpus, ......
o Relations: These can be unary relations such as: red, round, is adjacent; or n-ary relations such
as: the sister of, brother of, has color, comes between.
o Functions: father of, best friend, third inning of, end of, ......
o As a natural language, first-order logic also has two main parts:
o Syntax
o Semantics

42
❖ Syntax of First-Order logic:
The syntax of FOL determines which collections of symbols form logical expressions in first-order logic. The basic
syntactic elements of first-order logic are symbols. We write statements in short-hand notation in FOL.
Basic Elements of First-order logic:
Following are the basic elements of FOL syntax:
Constants      1, 2, A, John, Mumbai, cat, ....

Variables      x, y, z, a, b, ....

Predicates     Brother, Father, >, ....

Functions      sqrt, LeftLegOf, ....

Connectives    ∧, ∨, ¬, ⇒, ⇔

Equality       ==

Quantifiers    ∀, ∃

❖ Atomic sentences:
o Atomic sentences are the most basic sentences of first-order logic. These sentences are formed from a
predicate symbol followed by a parenthesis with a sequence of terms.
o We can represent atomic sentences as Predicate (term1, term2, ......, term n).
Example: Ravi and Ajay are brothers: => Brothers(Ravi, Ajay).
Chinky is a cat: => cat (Chinky).

❖ Complex Sentences:

o Complex sentences are made by combining atomic sentences using connectives.


First-order logic statements can be divided into two parts:
o Subject: Subject is the main part of the statement.
o Predicate: A predicate can be defined as a relation, which binds two atoms together in a statement.
Consider the statement: "x is an integer.", it consists of two parts, the first part x is the subject of the
statement and second part "is an integer," is known as a predicate.

❖ Quantifiers in First-order logic:
o A quantifier is a language element which generates quantification, and quantification specifies the quantity
of specimens in the universe of discourse.
o These are the symbols that permit to determine or identify the range and scope of the variable in the logical
expression. There are two types of quantifier:
1. Universal Quantifier, (for all, everyone, everything)
2. Existential quantifier, (for some, at least one).

❖ Universal Quantifier:
Universal quantifier is a symbol of logical representation, which specifies that the statement within its range is
true for everything or every instance of a particular thing.
The Universal quantifier is represented by a symbol ∀, which resembles an inverted A.
If x is a variable, then ∀x is read as:
o For all x
o For each x
o For every x.

❖ Example:
All men drink coffee.
Let x be a variable that refers to a man, so all x can be represented in the UOD as below:

∀x man(x) → drink (x, coffee).


It will be read as: For all x, if x is a man, then x drinks coffee.

❖ Existential Quantifier:
Existential quantifiers are the type of quantifiers, which express that the statement within its scope is true for at
least one instance of something.
It is denoted by the logical operator ∃, which resembles an inverted E. When it is used with a predicate variable,
it is called an existential quantifier.
If x is a variable, then existential quantifier will be ∃x or ∃(x). And it will be read as:
o There exists a 'x.'
o For some 'x.'
o For at least one 'x.'

Example:

Some boys are intelligent.

∃x: boys(x) ∧ intelligent(x)


It will be read as: There are some x where x is a boy who is intelligent.

❖ Points to remember:

o The main connective for universal quantifier ∀ is implication →.


o The main connective for existential quantifier ∃ is and ∧.

❖ Properties of Quantifiers:

o In universal quantifier, ∀x∀y is similar to ∀y∀x.


o In Existential quantifier, ∃x∃y is similar to ∃y∃x.
o ∃x∀y is not similar to ∀y∃x.
Some Examples of FOL using quantifier:
1. All birds fly.
In this question the predicate is "fly(bird)."
And since there are all birds who fly so it will be represented as follows.
∀x bird(x) →fly(x).
2. Every man respects his parent.
In this question, the predicate is "respect(x, y)," where x=man, and y= parent.
Since there is every man so will use ∀, and it will be represented as follows:
∀x man(x) → respects (x, parent).
3. Some boys play cricket.
In this question, the predicate is "play(x, y)," where x= boys, and y= game. Since there are some boys, we
will use ∃, and (following the point above that the main connective for ∃ is ∧) it will be represented as:
∃x boys(x) ∧ play(x, cricket).
4. Not all students like both Mathematics and Science.
In this question, the predicate is "like(x, y)," where x= student, and y= subject.
Since there are not all students, so we will use ∀ with negation, so following representation for this:
¬∀ (x) [ student(x) → like(x, Mathematics) ∧ like(x, Science)].
5. Only one student failed in Mathematics.
In this question, the predicate is "failed(x, y)," where x= student, and y= subject.
Since there is only one student who failed in Mathematics, we will use the following representation:
∃(x) [ student(x) ∧ failed (x, Mathematics) ∧ ∀(y) [¬(x==y) ∧ student(y) → ¬failed (y,
Mathematics)] ].
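As a side illustration (not part of the standard notation above), such quantified statements can be checked over a small finite universe in Python, where all() plays the role of ∀ and any() plays the role of ∃. The people and their properties below are invented for the example.

people = [
    {"name": "Ravi", "is_boy": True,  "plays_cricket": True},
    {"name": "Anil", "is_boy": True,  "plays_cricket": False},
    {"name": "Sita", "is_boy": False, "plays_cricket": True},
]

# "Some boys play cricket":  ∃x boys(x) ∧ play(x, cricket)
some_boys_play = any(p["is_boy"] and p["plays_cricket"] for p in people)

# "All boys play cricket":   ∀x boys(x) → play(x, cricket), i.e. ¬boys(x) ∨ play(x, cricket)
all_boys_play = all((not p["is_boy"]) or p["plays_cricket"] for p in people)

print(some_boys_play)  # True
print(all_boys_play)   # False, because Anil is a boy who does not play cricket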

❖ Free and Bound Variables:


The quantifiers interact with variables which appear in a suitable way. There are two types of variables in First-
order logic which are given below:
Free Variable: A variable is said to be a free variable in a formula if it occurs outside the scope of the
quantifier.
Example: ∀x ∃(y)[P (x, y, z)], where z is a free variable.
Bound Variable: A variable is said to be a bound variable in a formula if it occurs within the scope of the
quantifier.
Example: ∀x ∃y [A(x) ∧ B(y)], here x and y are both bound variables.

Resolution in FOL

❖ The resolution inference rule:


The resolution rule for first-order logic is simply a lifted version of the propositional rule. Resolution can
resolve two clauses if they contain complementary literals, which are assumed to be standardized apart so that
they share no variables:

From (l1 V ... V lk) and (m1 V ... V mn), where li and mj are complementary literals (i.e. UNIFY(li, ¬mj) = θ),
we can infer SUBST(θ, l1 V ... V li-1 V li+1 V ... V lk V m1 V ... V mj-1 V mj+1 V ... V mn).

This rule is also called the binary resolution rule because it resolves exactly two literals.
Example:
We can resolve the two clauses given below:
[Animal(g(x)) V Loves(f(x), x)] and [¬Loves(a, b) V ¬Kills(a, b)]
where the two complementary literals are: Loves(f(x), x) and ¬Loves(a, b).
These literals can be unified with the unifier θ = [a/f(x), b/x], and resolution generates the resolvent clause:
[Animal(g(x)) V ¬Kills(f(x), x)].
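As an illustration of how such a unifier can be computed, here is a minimal unification sketch loosely following the standard textbook algorithm. The term encoding (tuples for compound terms, single lowercase letters for variables) is an assumption made only for this example, and the occurs-check is omitted for brevity.

def is_variable(t):
    # convention used only in this sketch: single lowercase letters are variables
    return isinstance(t, str) and len(t) == 1 and t.islower()

def unify(x, y, theta):
    """Return a substitution (dict) that makes x and y identical, or None."""
    if theta is None:
        return None
    if x == y:
        return theta
    if is_variable(x):
        return unify_var(x, y, theta)
    if is_variable(y):
        return unify_var(y, x, theta)
    if isinstance(x, tuple) and isinstance(y, tuple) and len(x) == len(y):
        return unify(x[1:], y[1:], unify(x[0], y[0], theta))
    return None

def unify_var(var, x, theta):
    if var in theta:
        return unify(theta[var], x, theta)
    if is_variable(x) and x in theta:
        return unify(var, theta[x], theta)
    new_theta = dict(theta)    # occurs-check omitted for brevity
    new_theta[var] = x
    return new_theta

# The two complementary literals from the example: Loves(a, b) and Loves(f(x), x)
lit1 = ("Loves", "a", "b")
lit2 = ("Loves", ("f", "x"), "x")
print(unify(lit1, lit2, {}))   # {'a': ('f', 'x'), 'b': 'x'}, i.e. θ = [a/f(x), b/x]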

❖ Steps for Resolution:


1. Conversion of facts into first-order logic.
2. Convert FOL statements into CNF
3. Negate the statement which needs to prove (proof by contradiction)
4. Draw resolution graph (unification).
To better understand all the above steps, we will take an example in which we will apply resolution.

Example:
a) John likes all kinds of food.
b) Apples and vegetables are food.
c) Anything anyone eats and is not killed by is food.
d) Anil eats peanuts and is still alive.
e) Harry eats everything that Anil eats.
Prove by resolution that:
f) John likes peanuts.

Step-1: Conversion of Facts into FOL
In the first step we convert all the given statements into first-order logic (statements 6 and 7 are added
common-sense knowledge linking alive and killed, and statement 8 is the conclusion to be proved):
1. ∀x food(x) → likes(John, x)
2. food(Apple) Λ food(vegetables)
3. ∀x ∀y [eats(x, y) Λ ¬ killed(x)] → food(y)
4. eats(Anil, Peanuts) Λ alive(Anil)
5. ∀x eats(Anil, x) → eats(Harry, x)
6. ∀x ¬killed(x) → alive(x)
7. ∀x alive(x) → ¬killed(x)
8. likes(John, Peanuts)

Step-2: Conversion of FOL into CNF


In First order logic resolution, it is required to convert the FOL into CNF as CNF form makes easier for resolution
proofs.
o Eliminate all implication (→) and rewrite
1. ∀x ¬ food(x) V likes(John, x)
2. food(Apple) Λ food(vegetables)
3. ∀x ∀y ¬ [eats(x, y) Λ ¬ killed(x)] V food(y)
4. eats (Anil, Peanuts) Λ alive(Anil)
5. ∀x ¬ eats(Anil, x) V eats(Harry, x)
6. ∀x¬ [¬ killed(x) ] V alive(x)
7. ∀x ¬ alive(x) V ¬ killed(x)
8. likes(John, Peanuts).
o Move negation (¬) inwards and rewrite
1. ∀x ¬ food(x) V likes(John, x)
2. food(Apple) Λ food(vegetables)
3. ∀x ∀y ¬ eats(x, y) V killed(x) V food(y)
4. eats (Anil, Peanuts) Λ alive(Anil)
5. ∀x ¬ eats(Anil, x) V eats(Harry, x)
6. ∀x killed(x) V alive(x)
7. ∀x ¬ alive(x) V ¬ killed(x)
8. likes(John, Peanuts).
o Rename variables or standardize variables
1. ∀x ¬ food(x) V likes(John, x)
2. food(Apple) Λ food(vegetables)
3. ∀y ∀z ¬ eats(y, z) V killed(y) V food(z)

4. eats (Anil, Peanuts) Λ alive(Anil)
5. ∀w¬ eats(Anil, w) V eats(Harry, w)
6. ∀g killed(g) V alive(g)
7. ∀k ¬ alive(k) V ¬ killed(k)
8. likes(John, Peanuts).
o Eliminate existential quantifiers (Skolemization).
In this step, we would eliminate any existential quantifier ∃; this process is known as Skolemization. In
this example problem there is no existential quantifier, so all the statements remain the same in this
step.
o Drop universal quantifiers.
In this step we drop all universal quantifiers, since all the remaining variables are implicitly universally
quantified, so we do not need to write them.
1. ¬ food(x) V likes(John, x)
2. food(Apple)
3. food(vegetables)
4. ¬ eats(y, z) V killed(y) V food(z)
5. eats (Anil, Peanuts)
6. alive(Anil)
7. ¬ eats(Anil, w) V eats(Harry, w)
8. killed(g) V alive(g)
9. ¬ alive(k) V ¬ killed(k)
10. likes(John, Peanuts).

o Distribute conjunction ∧ over disjunction V.


This step will not make any change in this problem.
Step-3: Negate the statement to be proved
In this step, we apply negation to the statement that has to be proved, which is written as ¬likes(John, Peanuts)
Step-4: Draw Resolution graph:
Now in this step, we will solve the problem by resolution tree using substitution. For the above problem, it will be
given as follows:

Hence the negation of the conclusion has been proved as a complete contradiction with the given set of statements.

Explanation of Resolution graph:

o In the first step of resolution graph, ¬likes(John, Peanuts) , and likes(John, x) get resolved(canceled) by
substitution of {Peanuts/x}, and we are left with ¬ food(Peanuts)

o In the second step of the resolution graph, ¬ food(Peanuts) , and food(z) get resolved (canceled) by
substitution of { Peanuts/z}, and we are left with ¬ eats(y, Peanuts) V killed(y) .

o In the third step of the resolution graph, ¬ eats(y, Peanuts) and eats (Anil, Peanuts) get resolved by
substitution {Anil/y}, and we are left with Killed(Anil) .

o In the fourth step of the resolution graph, Killed(Anil) and ¬ killed(k) get resolve by substitution {Anil/k},
and we are left with ¬ alive(Anil) .

o In the last step of the resolution graph ¬ alive(Anil) and alive(Anil) get resolved.
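The same chain of cancellations can be checked mechanically. The sketch below is a simple propositional-style resolution checker; the clauses are the CNF clauses of the example, written as ground literals with the substitutions from the resolution graph already applied, which is an assumption made only to keep the illustration short.

def negate(lit):
    return lit[1:] if lit.startswith("~") else "~" + lit

def resolvents(c1, c2):
    # yield every clause obtained by resolving c1 with c2 on one complementary literal
    for lit in c1:
        if negate(lit) in c2:
            yield frozenset((c1 - {lit}) | (c2 - {negate(lit)}))

def refutable(clauses):
    # return True if the empty clause can be derived, i.e. a contradiction exists
    clauses = set(clauses)
    while True:
        new = set()
        for a in list(clauses):
            for b in list(clauses):
                if a is b:
                    continue
                for r in resolvents(a, b):
                    if not r:            # empty clause derived
                        return True
                    new.add(r)
        if new.issubset(clauses):        # nothing new can be derived
            return False
        clauses |= new

kb = [
    frozenset({"~food(Peanuts)", "likes(John,Peanuts)"}),
    frozenset({"food(Apple)"}),
    frozenset({"food(vegetables)"}),
    frozenset({"~eats(Anil,Peanuts)", "killed(Anil)", "food(Peanuts)"}),
    frozenset({"eats(Anil,Peanuts)"}),
    frozenset({"alive(Anil)"}),
    frozenset({"killed(Anil)", "alive(Anil)"}),
    frozenset({"~alive(Anil)", "~killed(Anil)"}),
    frozenset({"~likes(John,Peanuts)"}),   # negated conclusion
]
print(refutable(kb))   # True, so "John likes peanuts" is entailed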

What is the Role of Planning in Artificial
Intelligence?

Artificial intelligence is an important technology of the future. Whether it is intelligent robots, self-driving cars, or
smart cities, they will all use different aspects of artificial intelligence, and planning is essential to any such AI
project.
Planning is an important part of artificial intelligence which deals with the tasks and domains of a particular
problem. Planning is considered the logical side of acting.
Everything we humans do has a definite goal in mind, and all our actions are oriented towards achieving that
goal. Similarly, planning is done for artificial intelligence systems.
For example, planning is required to reach a particular destination. It is necessary to find the best route, but the
tasks to be done at a particular time, and why they are done, are also very important.
That is why planning is considered the logical side of acting. In other words, planning is about deciding the tasks to
be performed by the artificial intelligence system and how the system functions under domain-independent
conditions.

❖ What is a Plan?
We require domain description, task specification, and goal description for any planning system. A plan is
considered a sequence of actions, and each action has its preconditions that must be satisfied before it can act
and some effects that can be positive or negative.
So, we have Forward State Space Planning (FSSP) and Backward State Space Planning (BSSP) at the
basic level.
1. Forward State Space Planning (FSSP):
FSSP behaves in the same way as forward state-space search. Given an initial state S in any domain, we
perform the applicable actions and obtain a new state S' (which also contains some new terms), called a
progression. This continues until we reach the target state. The actions chosen must be applicable in the
current state.
o Disadvantage: Large branching factor
o Advantage: The algorithm is sound
2. Backward State Space Planning (BSSP):
BSSP behaves similarly to backward state-space search. In this, we move from the target state g to sub-goals,
tracing back the action that achieves the goal. This process is called regression (going back to the previous
goal or sub-goal). These sub-goals should also be checked for consistency. The actions chosen must be
relevant to the goal.
o Disadvantage: not a sound algorithm (sometimes inconsistency can be found)
o Advantage: Small branching factor (much smaller than FSSP)
So, for an efficient planning system, we need to combine the features of FSSP and BSSP, which gives rise to
target stack planning (goal stack planning), which is discussed below.

❖ What is planning in AI?
Planning in artificial intelligence is about decision-making actions performed by robots or computer programs
to achieve a specific goal.
Execution of the plan is about choosing a sequence of tasks with a high probability of accomplishing a specific
task.

➢ Block-world planning problem:


o A particular instance of the block-world problem is known as the Sussman anomaly.
o The non-interleaved planners of the early 1970s were unable to solve this problem, which is why it is
considered anomalous.
o When two sub-goals, G1 and G2, are given, a non-interleaved planner either produces a plan for G1
concatenated with a plan for G2, or vice versa.
o In the block-world problem, three blocks labeled 'A', 'B', and 'C' are allowed to rest on a flat surface. The
given condition is that only one block can be moved at a time to achieve the goal.

The start position and target position are shown in the following diagram:

❖ Components of the planning system:

The plan includes the following important steps:


o Choose the best rule to apply next, based on the best available heuristic guess.
o Apply the chosen rule to compute the new problem state.
o Detect when a solution has been found.
o Detect dead ends so they can be discarded and the system's effort directed in more useful directions.
o Detect when a near-perfect solution has been found.

❖ Target stack plan:
o It is one of the most important planning algorithms used by STRIPS.
o Stacks are used in algorithms to capture the action and complete the target. A knowledge base is used to
hold the current situation and actions.
o A target stack is similar to a node in a search tree, where branches are created with a choice of action.

The important steps of the algorithm are mentioned below:


1. Start by pushing the original goal onto the stack. Repeat the following until the stack is empty. If the stack
top is a compound goal, push its unsatisfied sub-goals onto the stack.
2. If the stack top is a single unsatisfied goal, replace it with an action that achieves it and push the action's
preconditions onto the stack so they can be satisfied.
3. If the stack top is an action, pop it from the stack, execute it, and update the knowledge base with the action's
effects.

❖ Non-linear Planning:
Non-linear planning uses a goal set instead of a single goal stack, and its search space includes all possible
sub-goal orderings. It handles goal interactions by the interleaving method.

➢ Advantages of non-Linear Planning:


Non-linear Planning may be an optimal solution concerning planning length (depending on the search
strategy used).

➢ Disadvantages of Nonlinear Planning:


It takes a larger search space since all possible goal orderings are considered.
Complex algorithm to understand.

❖ Algorithm:
1. Choose a goal 'g' from the goal set
2. If 'g' does not hold in the current state, then
1. Choose an operator 'o' whose add-list matches goal 'g'
2. Push 'o' onto the OpStack
3. Add the preconditions of 'o' to the goal set
3. While all preconditions of the operator on top of the OpStack are met in the current state
1. Pop operator 'o' from the top of the OpStack
2. state = apply(o, state)
3. plan = [plan; o]
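The apply(o, state) step above can be made concrete with a minimal STRIPS-style sketch in Python (a set of facts for the state, and an operator with preconditions, an add-list and a delete-list). The operator and state below are invented purely for illustration; this is not a full planner.

from dataclasses import dataclass

@dataclass(frozen=True)
class Operator:
    name: str
    preconditions: frozenset
    add_list: frozenset
    delete_list: frozenset

def applicable(op, state):
    # an operator may be executed only if all its preconditions hold in the state
    return op.preconditions <= state

def apply_op(op, state):
    # STRIPS effect: remove the delete-list facts, then add the add-list facts
    return (state - op.delete_list) | op.add_list

# A made-up blocks-world style operator, purely for illustration
stack_a_on_b = Operator(
    name="Stack(A,B)",
    preconditions=frozenset({"Clear(A)", "Clear(B)", "OnTable(A)"}),
    add_list=frozenset({"On(A,B)"}),
    delete_list=frozenset({"Clear(B)", "OnTable(A)"}),
)

state = frozenset({"Clear(A)", "Clear(B)", "OnTable(A)", "OnTable(B)"})
if applicable(stack_a_on_b, state):
    state = apply_op(stack_a_on_b, state)
print(state)   # {'Clear(A)', 'OnTable(B)', 'On(A,B)'}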

Partial Order Planning

It works on problem decomposition. It divides the problem into parts and achieves these sub-goals independently. It
solves the sub-problems with sub-plans and then combines these sub-plans, reordering them based on the requirements.
It provides flexibility in the ordering of the sub-plans. In POP, the ordering of the actions is partial: it does not specify
which of two actions placed in the plan will come first.
Let's look at this with the help of an example. The problem of wearing shoes can be solved through total order planning
or partial order planning.

Init: Barefoot
Goal: RightShoeOn ^ LeftShoeOn
Actions: 1. RightShoe
Precondition: RightSockOn
Effect: RightShoeOn
2. LeftShoe
Precondition: LeftSockOn
Effect: LeftShoeOn
3. LeftSock
Precondition: Barefoot
Effect: LeftSockOn
4. RightSock
Precondition: Barefoot
Effect: RightSockOn

The TOP (total order plan) consists of six possible sequences, any one of which can be executed to reach the finish
state. The POP, however, is less complex: it combines two action sequences. The first branch covers the left sock and
the left shoe; to wear the left shoe, wearing the left sock is a precondition. Similarly, the second branch covers the right
sock and the right shoe. Once these actions are taken, we achieve our goal and reach the finish state.

❖ Defining a Partial Order Plan:
• A set of actions, that make up the steps of the plan. For instance, {RightShoe, RightSock, LeftShoe,
LeftSock, Start, Finish}.
• A set of ordering constraints, A before B. For instance, {RightSock < RightShoe, LeftSock < LeftShoe}.

• A set of causal links, A achieves P for B.

• A set of open preconditions. A precondition is open if it is not achieved by some action in the plan.
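The sock/shoe plan can be written down directly as actions plus ordering constraints, and any topological order that respects the constraints is one of the valid total-order plans. The sketch below is only an illustration and assumes Python 3.9+ for the standard-library graphlib module.

from graphlib import TopologicalSorter

# ordering constraints of the partial-order plan: each action maps to the
# set of actions that must come before it
constraints = {
    "Start":      set(),
    "RightSock":  {"Start"},
    "LeftSock":   {"Start"},
    "RightShoe":  {"RightSock"},
    "LeftShoe":   {"LeftSock"},
    "Finish":     {"RightShoe", "LeftShoe"},
}

# any topological order is one of the six possible total-order plans
print(list(TopologicalSorter(constraints).static_order()))
# e.g. ['Start', 'RightSock', 'LeftSock', 'RightShoe', 'LeftShoe', 'Finish']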

Probabilistic reasoning in Artificial intelligence

❖ Uncertainty:
So far, we have learned knowledge representation using first-order logic and propositional logic with
certainty, which means we were sure about the predicates. With this knowledge representation we might write
A→B, which means if A is true then B is true; but consider a situation where we are not sure whether A is
true or not. Then we cannot express this statement; this situation is called uncertainty.
So, to represent uncertain knowledge, where we are not sure about the predicates, we need uncertain reasoning
or probabilistic reasoning.

❖ Causes of uncertainty:
Following are some leading causes of uncertainty to occur in the real world.
1. Information occurred from unreliable sources.
2. Experimental Errors
3. Equipment fault
4. Temperature variation
5. Climate change.

❖ Probabilistic reasoning:
Probabilistic reasoning is a way of knowledge representation where we apply the concept of probability to
indicate the uncertainty in knowledge. In probabilistic reasoning, we combine probability theory with logic to
handle the uncertainty.
We use probability in probabilistic reasoning because it provides a way to handle the uncertainty that is the
result of someone's laziness and ignorance.
In the real world, there are lots of scenarios, where the certainty of something is not confirmed, such as "It will
rain today," "behavior of someone for some situations," "A match between two teams or two players." These are
probable sentences for which we can assume that it will happen but not sure about it, so here we use
probabilistic reasoning.

❖ Need of probabilistic reasoning in AI:


o When there are unpredictable outcomes.
o When specifications or possibilities of predicates becomes too large to handle.
o When an unknown error occurs during an experiment.

In probabilistic reasoning, there are two ways to solve problems with uncertain knowledge:
o Bayes' rule
o Bayesian Statistics

As probabilistic reasoning uses probability and related terms, so before understanding probabilistic reasoning, let's
understand some common terms:
Probability: Probability can be defined as the chance that an uncertain event will occur. It is the numerical measure of
the likelihood that an event will occur. The value of probability always remains between 0 and 1, representing the
range from impossibility to certainty.
1. 0 ≤ P(A) ≤ 1, where P(A) is the probability of an event A.
2. P(A) = 0 indicates that event A will not occur.
3. P(A) = 1 indicates total certainty in an event A.
We can find the probability of an uncertain event by using the below formula:
P(A) = Number of desired outcomes / Total number of outcomes

o P(¬A) = probability of event A not happening.


o P(¬A) + P(A) = 1.
Event: Each possible outcome of a variable is called an event.
Sample space: The collection of all possible events is called sample space.
Random variables: Random variables are used to represent the events and objects in the real world.
Prior probability: The prior probability of an event is probability computed before observing new information.
Posterior Probability: The probability that is calculated after all evidence or information has been taken into account. It
is a combination of the prior probability and the new information.

❖ Conditional probability:
Conditional probability is the probability of an event occurring when another event has already happened.
Let's suppose we want to calculate the probability of event A when event B has already occurred, "the probability
of A under the conditions of B". It can be written as:

P(A|B) = P(A⋀B) / P(B)

Where P(A⋀B) = joint probability of A and B


P(B) = marginal probability of B.
If the probability of A is given and we need to find the probability of B under the condition of A, then it is given as:

P(B|A) = P(A⋀B) / P(A)

It can be explained by using a Venn diagram: since B has occurred, the sample space is reduced to set B, and we
can calculate event A given that event B has occurred by dividing the probability of P(A⋀B) by P(B).

Example:
In a class, 70% of the students like English and 40% of the students like both English and mathematics. What
percentage of the students who like English also like mathematics?
Solution:
Let A be the event that a student likes Mathematics,
and B be the event that a student likes English.

P(A|B) = P(A⋀B) / P(B) = 0.4 / 0.7 ≈ 0.57

Hence, 57% of the students who like English also like Mathematics.

Bayesian Belief Network in artificial intelligence
Bayesian belief network is a key technology for dealing with probabilistic events and for solving problems that
involve uncertainty. We can define a Bayesian network as:
"A Bayesian network is a probabilistic graphical model which represents a set of variables and their conditional
dependencies using a directed acyclic graph."
It defines the probabilistic independencies and dependencies among the variables in the network.
It is also called a Bayes network, belief network, decision network, or Bayesian model.
Bayesian networks are probabilistic, because these networks are built from a probability distribution, and also use
probability theory for prediction and anomaly detection.
Real world applications are probabilistic in nature, and to represent the relationship between multiple events, we
need a Bayesian network. It can also be used in various tasks including prediction, anomaly detection,
diagnostics, automated insight, reasoning, time series prediction, and decision making under uncertainty.
Bayesian Network can be used for building models from data and experts opinions, and it consists of two parts:
o Directed Acyclic Graph
o Table of conditional probabilities.
The generalized form of Bayesian network that represents and solve decision problems under uncertain knowledge
is known as an Influence diagram.
A Bayesian network graph is made up of nodes and arcs (directed links), where:

o Each node corresponds to a random variable, and a variable can be continuous or discrete.
o Arcs or directed arrows represent the causal relationships or conditional probabilities between random variables.
These directed links or arrows connect pairs of nodes in the graph.
A link means that one node directly influences the other node; if there is no directed link, the nodes are
independent of each other.
o For example, in a graph whose nodes represent the random variables A, B, C, and D:
o If node B is connected with node A by a directed arrow from A to B, then node A is
called the parent of node B.
o Node C is independent of node A.
The Bayesian network has mainly two components:
o Causal Component
o Actual numbers
Each node in the Bayesian network has a conditional probability distribution P(Xi | Parent(Xi)), which determines the
effect of the parents on that node.

❖ Bayesian network is based on Joint probability distribution and conditional probability.
So, let's first understand the joint probability distribution:
➢ Joint probability distribution:
If we have variables x1, x2, x3,....., xn, then the probabilities of a different combination of x1, x2, x3..
xn, are known as Joint probability distribution.
P[x1, x2, x3,....., xn], it can be written as the following way in terms of the joint probability distribution.
= P[x1| x2, x3,....., xn]P[x2, x3,....., xn]
= P[x1| x2, x3,....., xn]P[x2|x3,....., xn]....P[xn-1|xn]P[xn].
In general for each variable Xi, we can write the equation as:
P(Xi|Xi-1,........., X1) = P(Xi |Parents(Xi ))

❖ Explanation of Bayesian network:


Let's understand the Bayesian network through an example by creating a directed acyclic graph:
Example: Harry installed a new burglar alarm at his home to detect burglary. The alarm responds reliably to a
burglary, but it also responds to minor earthquakes. Harry has two neighbors, David and Sophia, who have
taken the responsibility to inform Harry at work when they hear the alarm. David always calls Harry when
he hears the alarm, but sometimes he gets confused with the phone ringing and calls then too. On the
other hand, Sophia likes to listen to loud music, so sometimes she misses the alarm. Here we would like
to compute the probabilities in this Burglary Alarm network.

❖ List of all events occurring in this network:


o Burglary (B)
o Earthquake(E)
o Alarm(A)
o David Calls(D)
o Sophia calls(S)

We can write the events of problem statement in the form of probability: P[D, S, A, B, E], can rewrite the above
probability statement using joint probability distribution:
P[D, S, A, B, E]= P[D | S, A, B, E]. P[S, A, B, E]
=P[D | S, A, B, E]. P[S | A, B, E]. P[A, B, E]
= P [D| A]. P [ S| A, B, E]. P[ A, B, E]
= P[D | A]. P[ S | A]. P[A| B, E]. P[B, E]
= P[D | A ]. P[S | A]. P[A| B, E]. P[B |E]. P[E]

Let's take the observed probability for the Burglary and earthquake component:
P(B= True) = 0.002, which is the probability of burglary.
P(B= False)= 0.998, which is the probability of no burglary.
P(E= True)= 0.001, which is the probability of a minor earthquake
P(E= False)= 0.999, Which is the probability that an earthquake not occurred.

❖ We can provide the conditional probabilities as per the below tables:


Conditional probability table for Alarm A:
The Conditional probability of Alarm A depends on Burglar and earthquake:

B E P(A= True) P(A= False)

True True 0.94 0.06

True False 0.95 0.05

False True 0.31 0.69

False False 0.001 0.999


Conditional probability table for David Calls:
The Conditional probability of David that he will call depends on the probability of Alarm.

A P(D= True) P(D= False)

True 0.91 0.09

False 0.05 0.95


Conditional probability table for Sophia Calls:
The conditional probability that Sophia calls depends on its parent node "Alarm."

A P(S= True) P(S= False)

True 0.75 0.25

False 0.02 0.98


From the formula of joint distribution, we can write the problem statement in the form of probability
distribution:
P(S, D, A, ¬B, ¬E) = P (S|A) *P (D|A)*P (A|¬B ^ ¬E) *P (¬B) *P (¬E).
= 0.75* 0.91* 0.001* 0.998*0.999
= 0.00068045.
Hence, a Bayesian network can answer any query about the domain by using Joint distribution.
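As an illustration, the sketch below stores the tables above as plain Python dictionaries and multiplies out the same joint probability P(S, D, A, ¬B, ¬E); only the values given in the example are used.

# Prior and conditional probability tables from the example above
P_B = {True: 0.002, False: 0.998}
P_E = {True: 0.001, False: 0.999}
P_A = {  # P(A=True | B, E); P(A=False | B, E) = 1 - value
    (True, True): 0.94, (True, False): 0.95,
    (False, True): 0.31, (False, False): 0.001,
}
P_D = {True: 0.91, False: 0.05}   # P(D=True | A)
P_S = {True: 0.75, False: 0.02}   # P(S=True | A)

def joint(d, s, a, b, e):
    # P(D=d, S=s, A=a, B=b, E=e) using the chain-rule factorization above
    p_a = P_A[(b, e)] if a else 1 - P_A[(b, e)]
    p_d = P_D[a] if d else 1 - P_D[a]
    p_s = P_S[a] if s else 1 - P_S[a]
    return p_d * p_s * p_a * P_B[b] * P_E[e]

# P(S=True, D=True, A=True, B=False, E=False) from the worked example
print(joint(d=True, s=True, a=True, b=False, e=False))   # about 0.00068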
The semantics of Bayesian Network:
There are two ways to understand the semantics of the Bayesian network, which is given below:
1. To understand the network as the representation of the Joint probability distribution.
It is helpful to understand how to construct the network.
2. To understand the network as an encoding of a collection of conditional independence statements.
It is helpful in designing inference procedure.

UNIT – 4
Different types of learning

Learning is the process of converting experience into expertise or knowledge.


Learning can be broadly classified into three categories, as mentioned below, based on the nature of the learning
data and the interaction between the learner and the environment:
• Supervised Learning
• Unsupervised Learning
• Semi-supervised Learning
Counting reinforcement learning as well, there are four categories of machine learning algorithms:
• Supervised learning algorithms
• Unsupervised learning algorithms
• Semi-supervised learning algorithms
• Reinforcement learning algorithms
However, the most commonly used ones are supervised and unsupervised learning.

❖ Supervised Learning:
Supervised learning is commonly used in real world applications, such as face and speech recognition, products
or movie recommendations, and sales forecasting. Supervised learning can be further classified into two types
- Regression and Classification.
Regression trains on and predicts a continuous-valued response, for example predicting real estate prices.
Classification attempts to find the appropriate class label, such as analyzing positive/negative sentiment, male
and female persons, benign and malignant tumors, secure and unsecure loans etc.
In supervised learning, learning data comes with description, labels, targets or desired outputs and the objective
is to find a general rule that maps inputs to outputs. This kind of learning data is called labeled data. The
learned rule is then used to label new data with unknown outputs.
Supervised learning involves building a machine learning model that is based on labeled samples. For example,
if we build a system to estimate the price of a plot of land or a house based on various features, such as size,
location, and so on, we first need to create a database and label it. We need to teach the algorithm what features
correspond to what prices. Based on this data, the algorithm will learn how to calculate the price of real estate
using the values of the input features.
Supervised learning deals with learning a function from available training data. Here, a learning algorithm
analyzes the training data and produces a derived function that can be used for mapping new examples. There
are many supervised learning algorithms such as Logistic Regression, Neural networks, Support Vector
Machines (SVMs), and Naive Bayes classifiers.
Common examples of supervised learning include classifying e-mails into spam and not-spam categories,
labeling webpages based on their content, and voice recognition.
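As a small illustration of the labeled-data workflow described above, the sketch below (assuming scikit-learn is installed; the feature values and labels are made up) trains a classifier on labeled examples and then labels new inputs.

from sklearn.linear_model import LogisticRegression

# labeled training data: [size in 1000 sq. ft, number of rooms] -> price category label
X_train = [[0.5, 1], [0.75, 2], [1.8, 3], [2.4, 4]]
y_train = ["cheap", "cheap", "expensive", "expensive"]

model = LogisticRegression()
model.fit(X_train, y_train)            # learn the general rule mapping inputs to labels

print(model.predict([[0.7, 2], [2.0, 3]]))   # e.g. ['cheap' 'expensive']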

❖ Unsupervised Learning:
Unsupervised learning is used to detect anomalies, outliers, such as fraud or defective equipment, or to group
customers with similar behaviors for a sales campaign. It is the opposite of supervised learning. There is no
labeled data here.
When learning data contains only some indications without any description or labels, it is up to the coder or to
the algorithm to find the structure of the underlying data, to discover hidden patterns, or to determine how to
describe the data. This kind of learning data is called unlabeled data.
Suppose that we have a number of data points, and we want to classify them into several groups. We may not
exactly know what the criteria of classification would be. So, an unsupervised learning algorithm tries to
classify the given dataset into a certain number of groups in an optimum way.
Unsupervised learning algorithms are extremely powerful tools for analyzing data and for identifying patterns
and trends. They are most commonly used for clustering similar input into logical groups. Unsupervised
learning algorithms include k-means, hierarchical clustering, and so on.
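A minimal clustering sketch of the idea above, assuming scikit-learn is available; the data points are invented, and k-means simply groups them into two clusters without using any labels.

from sklearn.cluster import KMeans

# unlabeled data points: no target values are given
X = [[1, 2], [1, 4], [1, 0],
     [10, 2], [10, 4], [10, 0]]

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)            # cluster assignment discovered for each point
print(kmeans.cluster_centers_)   # centre of each discovered group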

❖ Semi-supervised Learning:
If some learning samples are labeled but some others are not, then it is semi-supervised learning. It
makes use of a small amount of labeled data together with a large amount of unlabeled data for training.
Semi-supervised learning is applied in cases where it is expensive to acquire a fully labeled dataset while more
practical to label a small subset. For example, it often requires skilled experts to label certain remote sensing
images, and lots of field experiments to locate oil at a particular location, while acquiring unlabeled data is
relatively easy.

❖ Reinforcement Learning:
Here the learning data gives feedback so that the system adjusts to dynamic conditions in order to achieve a certain
objective. The system evaluates its performance based on the feedback responses and reacts accordingly. The
best-known instances include self-driving cars and the Go-playing program AlphaGo.

Decision Tree Classification Algorithm

Decision Tree is a Supervised learning technique that can be used for both classification and Regression problems,
but mostly it is preferred for solving Classification problems. It is a tree-structured classifier, where internal nodes
represent the features of a dataset, branches represent the decision rules and each leaf node represents the
outcome.
In a Decision tree, there are two nodes, which are the Decision Node and Leaf Node. Decision nodes are used to
make any decision and have multiple branches, whereas Leaf nodes are the output of those decisions and do not
contain any further branches.
The decisions or the test are performed on the basis of features of the given dataset.
It is a graphical representation for getting all the possible solutions to a problem/decision based on given conditions.
It is called a decision tree because, similar to a tree, it starts with the root node, which expands on further branches
and constructs a tree-like structure.
In order to build a tree, we use the CART algorithm, which stands for Classification and Regression Tree
algorithm.
A decision tree simply asks a question, and based on the answer (Yes/No), it further splits into subtrees.

❖ Below diagram explains the general structure of a decision tree:

❖ Why use Decision Trees?
There are various algorithms in Machine learning, so choosing the best algorithm for the given dataset and
problem is the main point to remember while creating a machine learning model. Below are the two reasons for
using the Decision tree:
o Decision Trees usually mimic human thinking ability while making a decision, so it is easy to
understand.
o The logic behind the decision tree can be easily understood because it shows a tree-like structure.

❖ Decision Tree Terminologies:


Root Node: Root node is from where the decision tree starts. It represents the entire dataset, which further
gets divided into two or more homogeneous sets.
Leaf Node: Leaf nodes are the final output node, and the tree cannot be segregated further after getting a
leaf node.
Splitting: Splitting is the process of dividing the decision node/root node into sub-nodes according to the
given conditions.
Branch/Sub Tree: A tree formed by splitting the tree.
Pruning: Pruning is the process of removing the unwanted branches from the tree.
Parent/Child node: The root node of the tree is called the parent node, and other nodes are called the child
nodes.

❖ How does the Decision Tree algorithm Work?


In a decision tree, for predicting the class of the given dataset, the algorithm starts from the root node of the
tree. This algorithm compares the values of root attribute with the record (real dataset) attribute and, based on
the comparison, follows the branch and jumps to the next node.
For the next node, the algorithm again compares the attribute value with the other sub-nodes and move further.
It continues the process until it reaches the leaf node of the tree. The complete process can be better understood
using the below algorithm:
o Step-1: Begin the tree with the root node, says S, which contains the complete dataset.
o Step-2: Find the best attribute in the dataset using Attribute Selection Measure (ASM).
o Step-3: Divide the S into subsets that contains possible values for the best attributes.
o Step-4: Generate the decision tree node, which contains the best attribute.
o Step-5: Recursively make new decision trees using the subsets of the dataset created in step -3.
Continue this process until a stage is reached where you cannot further classify the nodes and called
the final node as a leaf node.
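For reference, this step-by-step splitting is what a library implementation performs internally. A minimal sketch with scikit-learn is shown below (assumed installed; the toy dataset is invented), using entropy as the attribute selection measure discussed in the next section.

from sklearn.tree import DecisionTreeClassifier

# toy dataset: [age, income in thousands] -> buys the product (1 = yes, 0 = no)
X = [[25, 30], [35, 60], [45, 80], [20, 20], [50, 90], [30, 40]]
y = [0, 1, 1, 0, 1, 0]

# criterion="entropy" makes the tree pick attributes by information gain
tree = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X, y)
print(tree.predict([[40, 70]]))   # e.g. [1]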

❖ Attribute Selection Measures:
While implementing a decision tree, the main issue that arises is how to select the best attribute for the root node
and for the sub-nodes. To solve such problems there is a technique called Attribute Selection Measure,
or ASM. By this measurement, we can easily select the best attribute for the nodes of the tree. There
are two popular techniques for ASM, which are:
o Information Gain
o Gini Index

1. Information Gain:
o Information gain is the measurement of changes in entropy after the segmentation of a dataset based on an
attribute.
o It calculates how much information a feature provides us about a class.
o According to the value of information gain, we split the node and build the decision tree.
o A decision tree algorithm always tries to maximize the value of information gain, and a node/attribute
having the highest information gain is split first. It can be calculated using the below formula:
Information Gain = Entropy(S) - [(Weighted Avg) * Entropy(each feature)]
Entropy: Entropy is a metric to measure the impurity in a given attribute. It specifies the randomness in the data.
Entropy can be calculated as:
Entropy(S) = -P(yes) log2 P(yes) - P(no) log2 P(no)
Where,
o S = the set of samples
o P(yes) = probability of yes
o P(no) = probability of no

2. Gini Index:

o Gini index is a measure of impurity or purity used while creating a decision tree in the
CART(Classification and Regression Tree) algorithm.
o An attribute with the low Gini index should be preferred as compared to the high Gini index.
o It only creates binary splits, and the CART algorithm uses the Gini index to create binary splits.
o Gini index can be calculated using the below formula:
Gini Index = 1 - ∑j (Pj)²
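The measures above can be computed directly in a few lines. The sketch below is only an illustration for simple label lists; the 9-yes/5-no example data and the candidate split are made up.

from math import log2

def entropy(labels):
    # Entropy(S) = -sum over classes of P(class) * log2 P(class)
    n = len(labels)
    return -sum((labels.count(c) / n) * log2(labels.count(c) / n) for c in set(labels))

def gini(labels):
    # Gini Index = 1 - sum over classes of P(class)^2
    n = len(labels)
    return 1 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def information_gain(parent, subsets):
    # IG = Entropy(parent) - weighted average entropy of the subsets after the split
    n = len(parent)
    return entropy(parent) - sum(len(s) / n * entropy(s) for s in subsets)

labels = ["yes"] * 9 + ["no"] * 5
split  = [["yes"] * 6 + ["no"] * 2, ["yes"] * 3 + ["no"] * 3]
print(round(entropy(labels), 3))                    # 0.940
print(round(gini(labels), 3))                       # 0.459
print(round(information_gain(labels, split), 3))    # about 0.048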

❖ Pruning: Getting an Optimal Decision tree:


Pruning is a process of deleting the unnecessary nodes from a tree in order to get the optimal decision
tree.
A too-large tree increases the risk of overfitting, and a small tree may not capture all the important features of
the dataset. Therefore, a technique that decreases the size of the learning tree without reducing accuracy is
known as Pruning. There are mainly two types of tree pruning technology used:
o Cost Complexity Pruning
o Reduced Error Pruning.

❖ Advantages of the Decision Tree:

1. It is simple to understand as it follows the same process which a human follow while making any decision
in real-life.
2. It can be very useful for solving decision-related problems.
3. It helps to think about all the possible outcomes for a problem.
4. There is less requirement of data cleaning compared to other algorithms.

❖ Disadvantages of the Decision Tree:

1. The decision tree contains lots of layers, which makes it complex.


2. It may have an overfitting issue, which can be resolved using the Random Forest algorithm.
3. For more class labels, the computational complexity of the decision tree may increase.

Support Vector Machine Algorithm

Support Vector Machine or SVM is one of the most popular Supervised Learning algorithms, which is used for
Classification as well as Regression problems. However, primarily, it is used for Classification problems in Machine
Learning.
The goal of the SVM algorithm is to create the best line or decision boundary that can segregate n-dimensional
space into classes so that we can easily put the new data point in the correct category in the future. This best decision
boundary is called a hyperplane.
SVM chooses the extreme points/vectors that help in creating the hyperplane. These extreme cases are called
support vectors, and hence the algorithm is termed Support Vector Machine. Consider the below diagram, in which
two different categories are classified using a decision boundary or hyperplane:

Example: SVM can be understood with the example that we have used in the KNN classifier. Suppose we see a
strange cat that also has some features of dogs. If we want a model that can accurately identify whether it is a cat
or a dog, such a model can be created by using the SVM algorithm. We will first train our model with lots of
images of cats and dogs so that it can learn about the different features of cats and dogs, and then we test it with this
strange creature. The support vectors create a decision boundary between the two classes of data (cat and dog), and
the algorithm picks the extreme cases (support vectors) of cats and dogs. On the basis of the support vectors, it
will classify the creature as a cat. Consider the below diagram:

SVM algorithm can be used for Face detection, image classification, text categorization, etc.

❖ Types of SVM:
SVM can be of two types:
o Linear SVM: Linear SVM is used for linearly separable data, which means if a dataset can be classified
into two classes by using a single straight line, then such data is termed as linearly separable data, and
classifier is used called as Linear SVM classifier.
o Non-linear SVM: Non-Linear SVM is used for non-linearly separated data, which means if a dataset
cannot be classified by using a straight line, then such data is termed as non-linear data and classifier used
is called as Non-linear SVM classifier.

❖ Hyperplane and Support Vectors in the SVM algorithm:


Hyperplane: There can be multiple lines/decision boundaries to segregate the classes in n-dimensional space,
but we need to find out the best decision boundary that helps to classify the data points. This best boundary is
known as the hyperplane of SVM.
The dimensions of the hyperplane depend on the features present in the dataset, which means if there are 2
features (as shown in image), then hyperplane will be a straight line. And if there are 3 features, then hyperplane
will be a 2-dimension plane.
We always create a hyperplane that has a maximum margin, which means the maximum distance between the
hyperplane and the nearest data points of each class.

❖ Support Vectors:
The data points or vectors that are the closest to the hyperplane and which affect the position of the hyperplane are
termed as Support Vector. Since these vectors support the hyperplane, hence called a Support vector.

❖ How does SVM works?

Linear SVM:

The working of the SVM algorithm can be understood by using an example. Suppose we have a dataset that has
two tags (green and blue), and the dataset has two features x1 and x2. We want a classifier that can classify the
pair(x1, x2) of coordinates in either green or blue. Consider the below image:

As it is a 2-D space, we can easily separate these two classes just by using a straight line. But there can
be multiple lines that can separate these classes. Consider the below image:

Hence, the SVM algorithm helps to find the best line or decision boundary; this best boundary or region is
called as a hyperplane. SVM algorithm finds the closest point of the lines from both the classes. These
points are called support vectors. The distance between the vectors and the hyperplane is called as margin.
And the goal of SVM is to maximize this margin. The hyperplane with maximum margin is called
the optimal hyperplane.

Non-Linear SVM:

If data is linearly arranged, then we can separate it by using a straight line, but for non-linear data, we
cannot draw a single straight line. Consider the below image:

So to separate these data points, we need to add one more dimension. For linear data, we have used two
dimensions x and y, so for non-linear data, we will add a third dimension z. It can be calculated as:

z = x² + y²

By adding the third dimension, the sample space will become as below image:

So now, SVM will divide the datasets into classes in the following way. Consider the below image:

Since we are in 3-D space, the separating hyperplane looks like a plane parallel to the x-y plane. If we convert it back
to 2-D space with z=1, then it will become as:

Hence we get a circumference of radius 1 in the case of non-linear data.
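A small sketch of the trick described above, assuming NumPy and scikit-learn are available: points inside and outside a circle are not linearly separable in (x, y), but after adding the feature z = x² + y² a linear SVM separates them. The synthetic data and parameters are assumptions made for this illustration.

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(200, 2))                  # 2-D points (x, y)
y = (X[:, 0] ** 2 + X[:, 1] ** 2 < 1).astype(int)      # class 1 = inside the unit circle

# add the third feature z = x^2 + y^2, exactly as described above
z = (X[:, 0] ** 2 + X[:, 1] ** 2).reshape(-1, 1)
X3 = np.hstack([X, z])

clf = SVC(kernel="linear").fit(X3, y)   # a plain linear separator now works
print(clf.score(X3, y))                 # close to 1.0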

Market Basket Analysis in Data Mining

Market basket analysis is a data mining technique used by retailers to increase sales by better understanding
customer purchasing patterns. It involves analyzing large data sets, such as purchase history, to reveal product
groupings and products that are likely to be purchased together.
The adoption of market basket analysis was aided by the advent of electronic point-of-sale (POS) systems.
Compared to handwritten records kept by store owners, the digital records generated by POS systems made it easier
for applications to process and analyze large volumes of purchase data.
Implementation of market basket analysis requires a background in statistics and data science and some algorithmic
computer programming skills. For those without the needed technical skills, commercial, off-the-shelf tools exist.
One example is the Shopping Basket Analysis tool in Microsoft Excel, which analyzes transaction data contained in
a spreadsheet and performs market basket analysis. A transaction ID must relate to the items to be analyzed. The
Shopping Basket Analysis tool then creates two worksheets:
o The Shopping Basket Item Groups worksheet, which lists items that are frequently purchased together,
o And the Shopping Basket Rules worksheet shows how items are related (For example, purchasers of
Product A are likely to buy Product B).

❖ How does Market Basket Analysis Work?


Market Basket Analysis is modelled on Association rule mining, i.e., the IF {}, THEN {} construct. For
example, IF a customer buys bread, THEN he is likely to buy butter as well.
Association rules are usually represented as: {Bread} -> {Butter}
Some terminologies to familiarize yourself with Market Basket Analysis are:
o Antecedent: Items or 'itemsets' found within the data are antecedents. In simpler words, it's the IF
component, written on the left-hand side. In the above example, bread is the antecedent.
o Consequent: A consequent is an item or set of items found in combination with the antecedent. It's the
THEN component, written on the right-hand side. In the above example, butter is the consequent.

❖ Types of Market Basket Analysis:
Market Basket Analysis techniques can be categorized based on how the available data is utilized. Here are the
following types of market basket analysis in data mining, such as:
1. Descriptive market basket analysis: This type only derives insights from past data and is the most
frequently used approach. The analysis here does not make any predictions but rates the association
between products using statistical techniques. For those familiar with the basics of Data Analysis, this
type of modelling is known as unsupervised learning.
2. Predictive market basket analysis: This type uses supervised learning models like classification and
regression. It essentially aims to mimic the market to analyze what causes what to happen. Essentially,
it considers items purchased in a sequence to determine cross-selling. For example, buying an extended
warranty is more likely to follow the purchase of an iPhone. While it isn't as widely used as a
descriptive MBA, it is still a very valuable tool for marketers.
3. Differential market basket analysis: This type of analysis is beneficial for competitor analysis. It
compares purchase history between stores, between seasons, between two time periods, between
different days of the week, etc., to find interesting patterns in consumer behaviour. For example, it can
help determine why some users prefer to purchase the same product at the same price on Amazon vs
Flipkart. The answer can be that the Amazon reseller has more warehouses and can deliver faster, or
maybe something more profound like user experience.

❖ Algorithms associated with Market Basket Analysis:


In market basket analysis, association rules are used to predict the likelihood of products being purchased
together. Association rules count the frequency of items that occur together, seeking to find associations that
occur far more often than expected.
Algorithms that use association rules include AIS, SETM and Apriori. The Apriori algorithm is commonly cited
by data scientists in research articles about market basket analysis. It identifies frequent items in the database
and then evaluates their frequency as the datasets are expanded to larger sizes.
R's arules package is an open-source toolkit for association mining using the R programming language. This
package supports the Apriori algorithm, and related packages provide other mining algorithms, including
arulesNBMiner, opusminer, RKEEL and RSarules.
With the help of the Apriori Algorithm, we can further classify and simplify the item sets that the consumer
frequently buys. There are three components in APRIORI ALGORITHM:
o SUPPORT
o CONFIDENCE
o LIFT
SUPPORT
Support is calculated as the number of transactions containing the item(s) divided by the total number of transactions:
Support = freq(A, B) / N
support(pen) = transactions containing a pen / total transactions
i.e. support -> 500/5000 = 10 percent
CONFIDENCE
Confidence measures whether the product sells mainly on its own or through combined sales. It is calculated as the
combined transactions divided by the antecedent's individual transactions:

Confidence = freq(A, B) / freq(A)
Confidence = combined transactions / individual transactions
e.g. if 100 of the 500 pen transactions also contain item B, confidence -> 100/500 = 20 percent
LIFT
Lift is the ratio that tells how much more likely the consequent is bought when the antecedent is bought, compared
with its baseline popularity:
Lift = Confidence / Support(B)
e.g. if support(B) is also 10 percent, Lift -> 20/10 = 2
When the lift value is below 1, the combination is not frequently bought together by consumers. In this case the
lift is above 1, which shows that the probability of buying both things together is higher than would be expected
if the items were sold independently.
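These measures can be computed directly from a list of transactions. The sketch below uses a made-up set of six transactions; the item names and counts are purely illustrative.

transactions = [
    {"pen", "paper"}, {"pen", "notebook"}, {"pen", "paper", "ink"},
    {"notebook"}, {"paper"}, {"pen", "paper"},
]
N = len(transactions)

def support(*items):
    # fraction of all transactions that contain every given item
    return sum(1 for t in transactions if set(items) <= t) / N

def confidence(antecedent, consequent):
    # how often the consequent appears in transactions that contain the antecedent
    return support(antecedent, consequent) / support(antecedent)

def lift(antecedent, consequent):
    # > 1 means the pair is bought together more often than chance would suggest
    return confidence(antecedent, consequent) / support(consequent)

print(support("pen", "paper"))      # 3/6 = 0.5
print(confidence("pen", "paper"))   # 0.5 / (4/6) = 0.75
print(lift("pen", "paper"))         # 0.75 / (4/6) = 1.125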

❖ Examples of Market Basket Analysis:


Here are the following examples that explore Market Basket Analysis by market segment, such as:
o Retail: The most well-known MBA case study is Amazon.com. Whenever you view a product on
Amazon, the product page automatically recommends, "Items bought together frequently." It is
perhaps the simplest and most clean example of an MBA's cross-selling techniques.
Apart from e-commerce formats, MBA is also widely applicable to the in-store retail segment. Grocery
stores pay meticulous attention to product placement based and shelving optimization. For example,
you are almost always likely to find shampoo and conditioner placed very close to each other at the
grocery store. Walmart's infamous beer and diapers association anecdote is also an example of Market
Basket Analysis.
o Telecom: With the ever-increasing competition in the telecom sector, companies are paying close
attention to customers' services. For example, Telecom has now started to bundle TV and Internet
packages apart from other discounted online services to reduce churn.
o IBFS: Tracing credit card history is a hugely advantageous MBA opportunity for IBFS organizations.
For example, Citibank frequently employs sales personnel at large malls to lure potential customers
with attractive discounts on the go. They also associate with apps like Swiggy and Zomato to show
customers many offers they can avail of via purchasing through credit cards. IBFS organizations also
use basket analysis to determine fraudulent claims.
o Medicine: Basket analysis is used to determine comorbid conditions and symptom analysis in the
medical field. It can also help identify which genes or traits are hereditary and which are associated
with local environmental effects.

❖ Benefits of Market Basket Analysis:
The market basket analysis data mining technique has the following benefits, such as:

o Increasing market share: Once a company hits peak growth, it becomes challenging to determine new
ways of increasing market share. Market Basket Analysis can be used to put together demographic and
gentrification data to determine the location of new stores or geo-targeted ads.

o Behaviour analysis: Understanding customer behaviour patterns is a cornerstone of marketing. MBA can
be used anywhere from simple catalogue design to UI/UX.

o Optimization of in-store operations: MBA is not only helpful in determining what goes on the shelves
but also behind the store. Geographical patterns play a key role in determining the popularity or strength of
certain products, and therefore, MBA has been increasingly used to optimize inventory for each store or
warehouse.

o Campaigns and promotions: Not only is MBA used to determine which products go together but also
about which products form keystones in their product line.

o Recommendations: OTT platforms like Netflix and Amazon Prime benefit from MBA by understanding
what kind of movies people tend to watch frequently.

Artificial Neural Network Tutorial

The term "Artificial Neural Network" is derived from Biological neural networks that develop the structure of a
human brain. Similar to the human brain that has neurons interconnected to one another, artificial neural networks
also have neurons that are interconnected to one another in various layers of the networks. These neurons are known
as nodes.

The typical Artificial Neural Network looks something like the given figure.

Dendrites from Biological Neural Network represent inputs in Artificial Neural Networks, cell nucleus represents
Nodes, synapse represents Weights, and Axon represents Output.

❖ Relationship between Biological neural network and artificial neural network:

Biological Neural Network -> Artificial Neural Network

Dendrites -> Inputs

Cell nucleus -> Nodes

Synapse -> Weights

Axon -> Output

❖ The architecture of an artificial neural network:
To understand the architecture of an artificial neural network, we first have to understand what a
neural network consists of. A neural network consists of a large number of artificial neurons, termed
units, arranged in a sequence of layers. Let us look at the various types of layers
available in an artificial neural network.
Artificial Neural Network primarily consists of three layers:

Input Layer:
As the name suggests, it accepts inputs in several different formats provided by the programmer.
Hidden Layer:
The hidden layer lies between the input and output layers. It performs all the calculations needed to find hidden
features and patterns.
Output Layer:
The input goes through a series of transformations using the hidden layer, which finally results in output
that is conveyed using this layer.
The artificial neural network takes the inputs and computes the weighted sum of the inputs, including a bias.
This computation is represented in the form of a transfer function, commonly written as net = Σi wi xi + b.

The weighted total is then passed as input to an activation function to produce the output. The activation
function decides whether a node should fire or not; only the nodes that fire reach the output layer. There are
distinctive activation functions available, chosen according to the sort of task we are
performing.

❖ Advantages of Artificial Neural Network (ANN):
1. Parallel processing capability:
Artificial neural networks can perform more than one task simultaneously because computation is distributed
across many nodes.
2. Storing data on the entire network:
The information learned is stored across the whole network, not in a single database. The
disappearance of a few pieces of data in one place does not prevent the network from working.
3. Capability to work with incomplete knowledge:
After training, an ANN may produce output even with inadequate data. The loss of
performance here depends on the importance of the missing data.
4. Having a memory distribution:
For an ANN to be able to adapt, it is important to determine the examples and to train the network
according to the desired output by showing these examples to the network. The success of the
network is directly proportional to the chosen instances; if the problem cannot be shown to the network in all
its aspects, the network can produce false output.
5. Having fault tolerance:
Corruption of one or more cells of an ANN does not prevent it from generating output, and this feature makes
the network fault-tolerant.

❖ Disadvantages of Artificial Neural Network:


1. Assurance of proper network structure:
There is no particular guideline for determining the structure of artificial neural networks. The appropriate
network structure is accomplished through experience, trial, and error.
2. Unrecognized behavior of the network:
This is the most significant issue of ANN. When an ANN produces a solution, it does not provide insight
concerning why and how, which decreases trust in the network.
3. Hardware dependence:
Artificial neural networks need processors with parallel processing power, in accordance with their structure.
Therefore, the realization of the equipment depends on suitable hardware.

❖ Difficulty of showing the issue to the network:


ANNs can work with numerical data. Problems must be converted into numerical values before being
introduced to ANN. The presentation mechanism to be resolved here will directly impact the performance of the
network. It relies on the user's abilities.

❖ How do artificial neural networks work?
An Artificial Neural Network can be best represented as a weighted directed graph, where the artificial neurons
form the nodes. The associations between neuron outputs and neuron inputs can be viewed as directed
edges with weights. The Artificial Neural Network receives the input signal from the external source in the form
of a pattern or image, represented as a vector. Each input is then mathematically denoted by the notation
x(n) for every n-th input.

Afterward, each of the inputs is multiplied by its corresponding weight (these weights are the details
utilized by the artificial neural network to solve a specific problem). In general terms, these weights
represent the strength of the interconnection between neurons inside the artificial neural network.
All the weighted inputs are then summed inside the computing unit.
If the weighted sum is equal to zero, then bias is added to make the output non-zero or something else to
scale up to the system's response. Bias has the same input, and weight equals to 1. Here the total of
weighted inputs can be in the range of 0 to positive infinity. Here, to keep the response in the limits of the
desired value, a certain maximum value is benchmarked, and the total of weighted inputs is passed through
the activation function.
The activation function refers to the set of transfer functions used to achieve the desired output. There is a
different kind of the activation function, but primarily either linear or non-linear sets of functions.

❖ Types of Artificial Neural Network:
There are various types of Artificial Neural Networks (ANN). They are modelled on the neurons and network functions of the human brain, and an artificial neural network performs tasks in a similar way. The majority of artificial neural networks have some similarities with their more complex biological counterpart and are very effective at their expected tasks, for example segmentation or classification.

1. Feedback ANN:
In this type of ANN, the output is fed back into the network to accomplish the best-evolved results internally. As per the University of Massachusetts Lowell Centre for Atmospheric Research, feedback networks feed information back into themselves and are well suited to solving optimization problems. Internal system error correction utilizes feedback ANNs.

2. Feed-Forward ANN:
A feed-forward network is a basic neural network comprising an input layer, an output layer, and at least one layer of neurons. By assessing its output in relation to its input, the strength of the network can be judged from the group behaviour of the associated neurons, and the output is decided. The primary advantage of this network is that it learns to evaluate and recognize input patterns; a minimal sketch of a forward pass is given below.
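The following sketch shows how a signal moves through a feed-forward network in one direction only, from the input layer through one hidden layer to the output layer. It assumes NumPy is available, and the weights are random placeholders rather than trained values, so it illustrates only the flow of computation.

import numpy as np

rng = np.random.default_rng(0)

def forward(x, w_hidden, b_hidden, w_out, b_out):
    # Hidden layer: weighted sums plus bias, passed through a non-linear activation (tanh).
    hidden = np.tanh(x @ w_hidden + b_hidden)
    # Output layer: another weighted sum; signals only ever flow forward, never back.
    return hidden @ w_out + b_out

x = np.array([0.2, 0.7, 0.1])          # 3 input features
w_hidden = rng.normal(size=(3, 4))     # 3 inputs -> 4 hidden neurons
b_hidden = np.zeros(4)
w_out = rng.normal(size=(4, 1))        # 4 hidden neurons -> 1 output
b_out = np.zeros(1)

print(forward(x, w_hidden, b_hidden, w_out, b_out))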

UNIT – 5
NLP

It is a branch of AI that helps computers to understand, interpret and manipulate human language.

❖ Tasks that can be performed with NLP include: -


1. Translation.
2. Summarization.
3. Speech Recognition.
4. Sentiment Analysis.

❖ Components: -
1. Input sentence.
2. Morphological Processing (uses a vocabulary of words and expressions; divides the text into paragraphs, sentences, and words; see the sketch after this list).
3. Syntax analysis (Lexicon, Grammar).
4. Semantic Analysis (Semantic Rules).
5. Pragmatic Analysis (Contextual Information).
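To make the first stages above concrete, here is a minimal sketch assuming Python and plain string handling rather than a production NLP toolkit; the function names morphological_processing and syntax_analysis, and the toy lexicon, are purely illustrative.

import re

def morphological_processing(text):
    # Divide the text into sentences, then each sentence into word tokens.
    sentences = [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]
    return [re.findall(r"[A-Za-z']+", s) for s in sentences]

def syntax_analysis(tokens, lexicon):
    # A toy lexicon lookup: tag each word with its part of speech if known.
    return [(word, lexicon.get(word.lower(), "UNKNOWN")) for word in tokens]

lexicon = {"manya": "NOUN", "is": "VERB", "looking": "VERB",
           "for": "PREP", "a": "DET", "match": "NOUN", "she": "PRON", "hungry": "ADJ"}
for sentence in morphological_processing("Manya is looking for a match. She is hungry."):
    print(syntax_analysis(sentence, lexicon))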

Different issue involved in NLP

NLP is difficult because Ambiguity and Uncertainty exist in the language.

❖ Ambiguity:
There are the following three types of ambiguity -
1. Lexical Ambiguity:
Lexical ambiguity exists when a single word in a sentence has two or more possible meanings.
Example:
Manya is looking for a match.
In the above example, the word match means that either Manya is looking for a partner or Manya is looking for a match (a cricket match or some other match). The sketch after this list shows how such word senses can be listed programmatically.

2. Syntactic Ambiguity:
Syntactic ambiguity exists when the structure of a sentence allows two or more possible meanings.
Example:
I saw the girl with the binoculars.
In the above example, did I have the binoculars, or did the girl have the binoculars?

3. Referential Ambiguity:
Referential ambiguity exists when something is referred to using a pronoun and it is unclear what the pronoun refers to.
Example: Kiran went to Sunita. She said, "I am hungry."
In the above sentence, you do not know who is hungry, Kiran or Sunita.
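The dictionary senses behind lexical ambiguity can be inspected programmatically. The sketch below uses NLTK's WordNet corpus as an assumption (the original text does not mention NLTK); it simply prints the distinct senses stored for the word "match".

import nltk

nltk.download("wordnet", quiet=True)   # one-time corpus download
from nltk.corpus import wordnet as wn

# Each synset is one sense of the word; a resolver must pick the right one from context.
for synset in wn.synsets("match"):
    print(synset.name(), "-", synset.definition())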

What is an Expert System?

An expert system is a computer program that is designed to solve complex problems and to provide decision-making
ability like a human expert. It performs this by extracting knowledge from its knowledge base using the reasoning
and inference rules according to the user queries.
The expert system is a part of AI, and the first ES was developed in the year 1970, which was the first successful
approach of artificial intelligence. It solves the most complex issue as an expert by extracting the knowledge stored
in its knowledge base. The system helps in decision making for complex problems using both facts and heuristics
like a human expert. It is called so because it contains the expert knowledge of a specific domain and can solve any
complex problem of that particular domain. These systems are designed for a specific domain, such as medicine,
science, etc.
The performance of an expert system is based on the expert's knowledge stored in its knowledge base. The more
knowledge stored in the KB, the more that system improves its performance. One of the common examples of an ES
is a suggestion of spelling errors while typing in the Google search box.

[Block diagram of the working of an expert system: the user interface passes the user's query to the inference engine, which consults the knowledge base and returns the response.]

Below are some popular examples of the Expert System:

o DENDRAL: It was an artificial intelligence project that was made as a chemical analysis expert system. It
was used in organic chemistry to detect unknown organic molecules with the help of their mass spectra and
knowledge base of chemistry.
o MYCIN: It was one of the earliest backward chaining expert systems that was designed to find the bacteria
causing infections like bacteraemia and meningitis. It was also used for the recommendation of antibiotics
and the diagnosis of blood clotting diseases.
o PXDES: It is an expert system that is used to determine the type and level of lung cancer. To determine the disease, it takes a picture of the upper body, which appears as a shadow; this shadow identifies the type and degree of harm.
o CaDeT: The CaDet expert system is a diagnostic support system that can detect cancer at early stages.

❖ Characteristics of Expert System:
o High Performance: The expert system provides high performance for solving any type of complex
problem of a specific domain with high efficiency and accuracy.
o Understandable: It responds in a way that can be easily understood by the user. It can take input in human language and provides output in the same way.
o Reliable: It is highly reliable in generating efficient and accurate output.
o Highly responsive: ES provides the result for any complex query within a very short period of time.

❖ Components of Expert System:


An expert system mainly consists of three components:
o User Interface
o Inference Engine
o Knowledge Base

1. User Interface

With the help of a user interface, the expert system interacts with the user, takes queries as input in a readable format, and passes them to the inference engine. After getting the response from the inference engine, it displays the
output to the user. In other words, it is an interface that helps a non-expert user to communicate with the expert
system to find a solution.

2. Inference Engine (Rules of Engine)

o The inference engine is known as the brain of the expert system as it is the main processing unit of the
system. It applies inference rules to the knowledge base to derive a conclusion or deduce new information.
It helps in deriving an error-free solution of queries asked by the user.
o With the help of an inference engine, the system extracts the knowledge from the knowledge base.
o There are two types of inference engine:
o Deterministic Inference engine: The conclusions drawn from this type of inference engine are assumed to
be true. It is based on facts and rules.
o Probabilistic Inference engine: This type of inference engine contains uncertainty in its conclusions and is based on probability.

Inference engine uses the below modes to derive the solutions:


o Forward Chaining: It starts from the known facts and rules, and applies the inference rules to add their conclusions to the known facts (a toy sketch follows this list).
o Backward Chaining: It is a backward reasoning method that starts from the goal and works backward
to prove the known facts.
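Forward chaining can be sketched in a few lines of code. The following is only a minimal illustration, assuming Python, with facts represented as plain strings and rules as if-then pairs; real inference engines are far more elaborate.

def forward_chain(facts, rules):
    # Keep applying rules until no new fact can be derived.
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            # Fire a rule only when all of its conditions are already known facts.
            if set(conditions) <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

# Toy knowledge base: if-then rules of an illustrative medical-style domain.
rules = [
    (["fever", "cough"], "flu_suspected"),
    (["flu_suspected", "short_of_breath"], "see_doctor"),
]
print(forward_chain(["fever", "cough", "short_of_breath"], rules))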

3. Knowledge Base

o The knowledge base is a type of storage that stores knowledge acquired from the different experts of the particular domain. It is considered a large store of knowledge. The larger the knowledge base, the more precise the Expert System will be.
o It is similar to a database that contains information and rules of a particular domain or subject.
o One can also view the knowledge base as a collection of objects and their attributes. For example, a lion is an object, and its attributes are that it is a mammal, it is not a domestic animal, and so on.

Components of Knowledge Base:


o Factual Knowledge: The knowledge which is based on facts and accepted by knowledge engineers
comes under factual knowledge.
o Heuristic Knowledge: This knowledge is based on practice, the ability to guess, evaluation, and
experiences.

Knowledge Representation: It is used to formalize the knowledge stored in the knowledge base using the
If-else rules.

Knowledge Acquisition: It is the process of extracting, organizing, and structuring the domain knowledge, specifying the rules to acquire the knowledge from various experts, and storing that knowledge in the knowledge base.

❖ Why Expert System?
Before using any technology, we should understand why to use it, and the same holds for the ES. Since we already have human experts in every field, why do we need to develop a computer-based system? The points below describe the need for the ES:
1. No memory limitations: It can store as much data as required and can recall it at the time of its application, whereas human experts have limits on how much they can memorize and recall at any given time.
2. High Efficiency: If the knowledge base is updated with the correct knowledge, then it provides a
highly efficient output, which may not be possible for a human.
3. Expertise in a domain: There are lots of human experts in each domain, and they all have different skills and different experiences, so it is not easy to get a final output for the query. But if we put the knowledge gained from human experts into the expert system, then it provides an efficient output by combining all the facts and knowledge.
4. Not affected by emotions: These systems are not affected by human emotions such as fatigue, anger, depression, anxiety, etc. Hence the performance remains constant.
5. High security: These systems provide high security to resolve any query.
6. Considers all the facts: To respond to any query, it checks and considers all the available facts and
provides the result accordingly. But it is possible that a human expert may not consider some facts due
to any reason.
7. Regular updates improve the performance: If there is an issue in the result provided by the expert
systems, we can improve the performance of the system by updating the knowledge base.

❖ Capabilities of the Expert System:


Below are some capabilities of an Expert System:
o Advising: It is capable of advising a human being on queries within the domain of the particular ES.
o Provide decision-making capabilities: It provides the capability of decision making in any domain,
such as for making any financial decision, decisions in medical science, etc.
o Demonstrate a device: It is capable of demonstrating any new product, such as its features, specifications, and how to use it.
o Problem-solving: It has problem-solving capabilities.
o Explaining a problem: It is also capable of providing a detailed description of an input problem.
o Interpreting the input: It is capable of interpreting the input given by the user.
o Predicting results: It can be used for the prediction of a result.
o Diagnosis: An ES designed for the medical field is capable of diagnosing a disease without using
multiple components as it already contains various inbuilt medical tools.

❖ Advantages of Expert System:

o These systems are highly reproducible.


o They can be used for risky places where the human presence is not safe.
o Error possibilities are less if the KB contains correct knowledge.
o The performance of these systems remains steady as it is not affected by emotions, tension, or fatigue.
o They provide a very high speed to respond to a particular query.

❖ Limitations of Expert System:

o The response of the expert system may be wrong if the knowledge base contains wrong information.
o Like a human being, it cannot produce a creative output for different scenarios.
o Its maintenance and development costs are very high.
o Knowledge acquisition for designing such systems is very difficult.
o For each domain, we require a specific ES, which is one of the big limitations.
o It cannot learn from itself and hence requires manual updates.

❖ Applications of Expert System:

o In designing and manufacturing domain


It can be broadly used for designing and manufacturing physical devices such as camera lenses and
automobiles.
o In the knowledge domain
These systems are primarily used for publishing relevant knowledge to the users. Two popular ES used in this domain are an advisor and a tax advisor.
o In the finance domain
In the finance industries, it is used to detect any type of possible fraud or suspicious activity, and to advise bankers on whether they should provide loans for a business or not.
o In the diagnosis and troubleshooting of devices
The ES is used in medical diagnosis, which was the first area where these systems were applied.
o Planning and Scheduling
Expert systems can also be used for planning and scheduling particular tasks in order to achieve the goals of those tasks.

Robotics and Artificial Intelligence

Robotics is a separate branch within Artificial Intelligence that studies the creation of intelligent robots or machines. Robotics combines electrical engineering, mechanical engineering, and computer science & engineering, since robots have a mechanical construction and electrical components and are programmed with a programming language. Although Robotics and Artificial Intelligence have different objectives and applications, most people treat robotics as a subset of Artificial Intelligence (AI). Robot machines can look very similar to humans, and they can also perform like humans if enabled with AI.

❖ What is Artificial Intelligence?

Artificial Intelligence is defined as the branch of Computer Science & Engineering, which deals with creating
intelligent machines that perform like humans. Artificial Intelligence helps enable machines to sense, comprehend, act, and learn human-like activities. There are mainly 4 types of Artificial Intelligence: reactive
machines, limited memory, theory of mind, and self-awareness.

❖ What is a robot?

A robot is a machine that looks like a human and is capable of performing out-of-the-box actions and replicating certain human movements automatically by means of commands given to it through programming. Examples:
Drug Compounding Robot, Automotive Industry Robots, Order Picking Robots, Industrial Floor Scrubbers and
Sage Automation Gantry Robots, etc.

❖ Components of Robot:
A robot is constructed from several components, which are as follows:
o Actuators: Actuators are the devices that are responsible for moving and controlling a system or machine. They help achieve physical movement by converting energy such as electrical, hydraulic, or pneumatic energy. Actuators can create linear as well as rotary motion.
o Power Supply: It is an electrical device that supplies electrical power to an electrical load. The
primary function of the power supply is to convert electrical current to power the load.
o Electric Motors: These are the devices that convert electrical energy into mechanical energy and are
required for the rotational motion of the machines.
o Pneumatic Air Muscles: Air Muscles are soft pneumatic devices that are ideally best fitted for
robotics. They can contract and extend and operate by pressurized air filling a pneumatic bladder.
Whenever air is introduced, it can contract up to 40%.
o Muscle wires: These are made of a nickel-titanium alloy called Nitinol and are very thin. They can extend and contract when a specific amount of heat or electric current is supplied to them, and they can be formed and bent into different shapes when in their martensitic form. They can contract by 5% when electrical current passes through them.
o Piezo Motors and Ultrasonic Motors: Piezoelectric motors or Piezo motors are the electrical devices
that receive an electric signal and apply a directional force to an opposing ceramic plate. It helps a
robot to move in the desired direction. These are the best suited electrical motors for industrial robots.
o Sensors: They provide abilities such as seeing, hearing, touching, and sensing movement, much like humans. Sensors are the devices or machines that help detect events or changes in the environment and send data to the computer processor. These devices are usually combined with other electronic devices. Similar to human sense organs, electrical sensors play a crucial role in Artificial Intelligence & robotics: AI algorithms control robots by sensing the environment, and the sensors provide real-time information to the computer processors.

❖ Applications of Robotics:
Robotics has different application areas. Some of the important application domains of robotics are as follows:
o Robotics in defence sectors: The defence sector is undoubtedly one of the main parts of any country. Each country wants its defence system to be strong. Robots help to reach inaccessible and dangerous zones during war. DRDO has developed a robot named Daksh to destroy life-threatening objects safely. Robots help soldiers remain safe and are deployed by the military in combat scenarios.
Besides combat support, robots are also deployed in anti-submarine operations, fire support, battle
damage management, strike missions, and laying machines.
o Robotics in Medical sectors: Robots also help in various medical fields such as laparoscopy,
neurosurgery, orthopaedic surgery, disinfecting rooms, dispensing medication, and various other
medical domains.
o Robotics in Industrial Sector: Robots are used in various industrial manufacturing industries such as
cutting, welding, assembly, disassembly, pick and place for printed circuit boards, packaging &
labelling, palletizing, product inspection & testing, colour coating, drilling, polishing and handling the
materials.
Moreover, robotics technology increases productivity and profitability and reduces human effort, resulting in less physical strain and injury. The industrial robot has some important advantages,
which are as follows:
o Accuracy
o Flexibility
o Reduced labour charge
o Low noise operation
o Fewer production damages
o Increased productivity rate.
o Robotics in Entertainment: Over the last decade, the use of robots in entertainment has been continuously increasing. Robots are employed in the entertainment sector, such as movies, animation, games, and cartoons. Robots are very helpful where repetitive actions are required: a camera-wielding robot can shoot a movie scene as many times as needed without getting tired or frustrated. A big name, Disney, has launched hundreds of robots for the film industry.
o Robots in the mining industry: Robotics is very helpful for various mining applications such as
robotic dozing, excavation and haulage, robotic mapping & surveying, robotic drilling and explosive
handling, etc. A mining robot can navigate flooded passages on its own and use cameras and other sensors
to detect valuable minerals. Further, robots also help in excavation to detect gases and other materials
and keep humans safe from harm and injuries. The robot rock climbers are used for space exploration,
and underwater drones are used for ocean exploration.

❖ Natural Language Processing:


NLP (Natural Language Processing) can be used to give voice commands to AI robots. It creates a strong
human-robot interaction. NLP is a specific area of Artificial Intelligence that enables the communication
between humans and robots. Through the NLP technique, the robot can understand and reproduce human language. Some robots are equipped with NLP to the point that we can hardly differentiate between humans and robots. Similarly, in the health care sector, robots powered by Natural Language Processing may help physicians to observe disease details and automatically fill in electronic health records (EHRs). Besides recognizing human language, NLP can learn common usage, such as learning an accent and predicting how humans speak.

❖ What are the advantages of integrating Artificial Intelligence into robotics?


o A major advantage of artificially intelligent robots is social care. They can guide people, and especially come to the aid of older people, with chatbot-like social skills and advanced processors.

o Robotics also helps the agricultural industry through the development of AI-based robots. These robots reduce the farmer's workload.
o In the military industry, military bots can spy using speech and vision detectors, as well as save lives by replacing infantry.
o Robots are also employed in volcanoes, deep oceans, extremely cold places, or even in space, where humans normally can't survive.
o Robots are also used in the medical and healthcare industry, as they can perform complex surgeries that carry a higher risk of mistakes when done by humans; with a pre-set of instructions and added intelligence, AI-integrated robotics could greatly reduce the number of casualties.

❖ Difference in Robot System and AI Programs:


1. AI Programs:
They usually operate in computer-simulated worlds.
Input is generally given in the form of symbols and rules.
General-purpose or special-purpose computers are needed to operate them.
2. Robots:
Robots generally operate in the real physical world.
Inputs are given in the form of analogue signals or speech waveforms.
Special hardware with sensors and effectors is needed to operate them.

