
CS367-AI : Pre-Midterm Lab Report

Group Name: Logic


Amisha Lalwani (202251013), Dhwani Saliya (202251041), Lakshya Yadav (202251067), Prashant Bharti (202251102)
Indian Institute of Information Technology Vadodara (IIITV)

Abstract—In this report, we have addressed seven out of the eight assigned lab problems. The tasks include: Week 1: Modeling problems as state-space search challenges and solving them using BFS/DFS. This was achieved by implementing the missionaries and cannibals problem, as well as the rabbit leap problem. Week 2: Designing a graph search agent and understanding the use of a hash table and queue in state space search. This is done by implementing the Puzzle-8 problem and a plagiarism detection system using the A* search algorithm. Week 3: Understanding the use of a heuristic function for reducing the size of the search space. This we did by solving the game of marble solitaire and by writing programs for generating k-SAT problems and solving a set of uniform random 3-SAT problems for different combinations of m and n, and comparing their performance. Week 4: Non-deterministic Search — Simulated Annealing; for problems with large search spaces, randomized search becomes a meaningful option given partial/full information about the domain. Week 6: Three key objectives are pursued: determining the tolerable error for stored patterns in the provided code, formulating an energy function for the Eight-rook problem while selecting suitable weights, and solving a 10-city Traveling Salesman Problem (TSP) using a Hopfield network while estimating the necessary weight count. Through these endeavors, the study aims to evaluate the viability of Hopfield networks in real-world combinatorial optimization tasks, shedding light on their capabilities and limitations in practical problem-solving contexts. Week 7: We implemented a binary bandit using a stochastic reward system and developed a 10-armed bandit with non-stationary rewards to observe how an agent using a modified epsilon-greedy approach adapts in dynamic environments. The report highlights crucial elements of MENACE and the challenges of non-stationary rewards in RL, showcasing effective strategies for tackling these problems. Week 8: We explored Markov Decision Processes (MDPs) and their applications in real-world problems like the Gbike bicycle rental problem. We formulated the problem using states, actions, rewards, and transition probabilities and solved it using policy iteration. This involved evaluating the current policy and improving it iteratively to maximize long-term rewards. Optimizations such as truncated Poisson distributions and precomputed probabilities were implemented for computational efficiency.

Index Terms—State-space search, Breadth-First Search, Depth-First Search, AI Algorithms, Complexity Analysis, Simulated Annealing, Optimization, Markov Decision Process, Reinforcement Learning, Non-stationary rewards, Policy iteration, Value iteration, Discount factor, Transition probabilities, Reward function, Truncated Poisson distribution, Dynamic programming

I. WEEK 1: TO BE ABLE TO MODEL A GIVEN PROBLEM IN TERMS OF A STATE SPACE SEARCH PROBLEM AND SOLVE THE SAME USING BFS/DFS

A. Missionaries and Cannibals problem

The Missionaries and Cannibals problem is modeled as a state-space search problem where the goal is to transport three missionaries and three cannibals across a river using a boat that can hold up to two people. The search space consists of possible configurations of people on either side of the river. BFS ensures finding the optimal solution, while DFS may return a solution faster but does not guarantee optimality. (i) A state is represented in this problem as (M, C, B), where M is the number of missionaries on the left bank, C is the number of cannibals on the left bank, and B represents the boat's position (1 if the boat is on the left bank, 0 if it is on the right). The initial state is (3, 3, 1) and the goal state is (0, 0, 0). While defining the state transitions, we need to keep in mind that the number of cannibals should not exceed the number of missionaries. The possible state transitions can be achieved by moving 1 missionary and 1 cannibal, or 2 missionaries, or 2 cannibals, or 1 missionary, or 1 cannibal. There are about 32 possible states (4 × 4 × 2), because M and C each have four possibilities (0, 1, 2, 3) and B has only two (0, 1). (ii) Breadth-First Search (BFS) is a level-order traversal. We started from the initial state (3, 3, 1), explored all possible valid states reachable by one boat trip, and then continued expanding states until the goal state (0, 0, 0) was reached. BFS guarantees finding the shortest path if a solution exists, making it optimal for this problem. (iii) Depth-First Search (DFS) explores as far as possible along each branch before backtracking. It does not guarantee finding the optimal solution. We started from the initial state (3, 3, 1) and explored as deep as possible in one branch of possible states. If a solution was found, we returned it; otherwise, we backtracked. (iv) BFS guarantees the shortest path and is always optimal. DFS does not guarantee optimality: the solution obtained might have more steps than necessary, or it might explore unproductive branches before finding the goal. The time complexity of BFS was O(b^d), where b is the branching factor (the average number of successors of a state) and d is the depth of the shallowest goal state. BFS explores every state at a given depth level, making it slower in terms of execution time but ensuring optimality. The time complexity of DFS was O(b^m), where m is the maximum depth of the search tree. DFS can get lucky and find a solution quickly, but it may also explore a much larger portion of the tree if the solution lies deep. The space complexity of BFS is O(b^d), whereas that of DFS is O(b*m), where m is the maximum depth, so DFS is more space-efficient than BFS.
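To make the BFS formulation in (ii) concrete, the following is a minimal sketch of such a search over (M, C, B) states; the names and structure here are illustrative and not necessarily identical to the code in our repository:

from collections import deque

MOVES = [(1, 1), (2, 0), (0, 2), (1, 0), (0, 1)]  # (missionaries, cannibals) carried per boat trip

def is_valid(m, c):
    # No bank may have cannibals outnumbering missionaries (when missionaries are present).
    return (0 <= m <= 3 and 0 <= c <= 3 and
            (m == 0 or m >= c) and (3 - m == 0 or 3 - m >= 3 - c))

def bfs(start=(3, 3, 1), goal=(0, 0, 0)):
    frontier, visited = deque([(start, [start])]), {start}
    while frontier:
        (m, c, b), path = frontier.popleft()
        if (m, c, b) == goal:
            return path
        for dm, dc in MOVES:
            # Boat on the left bank carries people to the right, and vice versa.
            nm, nc = (m - dm, c - dc) if b == 1 else (m + dm, c + dc)
            nxt = (nm, nc, 1 - b)
            if is_valid(nm, nc) and nxt not in visited:
                visited.add(nxt)
                frontier.append((nxt, path + [nxt]))
    return None

print(bfs())  # shortest sequence of (M, C, B) states; the optimal plan uses 11 crossings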
B. Rabbit leap problem

In the rabbit leap problem, three east-bound rabbits stand in a line blocked by three west-bound rabbits. They are crossing a stream with stones placed in a line in the east-west direction, and there is one empty stone between them.
The rabbits can only move forward one or two steps. They can jump over one rabbit if the need arises, but not more than that. So we need to find out whether they can cross each other without stepping into the water. (i) The initial state is represented by (E, E, E, -, W, W, W) (three east-bound rabbits on the left, three west-bound on the right, and one empty stone in the middle). The goal state is represented by (W, W, W, -, E, E, E). The search space can be around 7!, but only a subset of it consists of valid moves. (ii) Using BFS, which is a level-order traversal, we search the states level by level until we eventually reach the goal state. The solution is implemented in the code whose GitHub repository is attached at the end. (iii) The DFS implementation will explore deeper into the search space first and will provide a sequence of states leading to the goal. However, it may not necessarily be optimal, as DFS can get stuck in deep paths without finding the shortest route. (iv) The comparison between DFS and BFS remains the same as that obtained in the missionaries and cannibals problem; the time and space complexity also remain the same, so the reader can refer to that part.
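A minimal sketch of a successor function for this state encoding (the names and details are illustrative; the repository code may differ):

def rabbit_successors(state):
    # state is a tuple such as ('E','E','E','-','W','W','W'); 'E' moves right, 'W' moves left.
    successors = []
    gap = state.index('-')
    for i, r in enumerate(state):
        step = gap - i  # displacement needed for this rabbit to land on the empty stone
        # 'E' rabbits may move +1 (slide) or +2 (jump over one rabbit); 'W' rabbits the mirror image.
        if (r == 'E' and step in (1, 2)) or (r == 'W' and step in (-1, -2)):
            nxt = list(state)
            nxt[gap], nxt[i] = nxt[i], '-'
            successors.append(tuple(nxt))
    return successors

print(rabbit_successors(('E', 'E', 'E', '-', 'W', 'W', 'W')))  # four legal moves: two slides, two jumps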
II. WEEK 2: TO DESIGN A GRAPH SEARCH AGENT AND UNDERSTAND THE USE OF A HASH TABLE AND QUEUE IN STATE SPACE SEARCH

A. In-lab Discussion: Puzzle-8 problem

Algorithm 1 Graph Search Algorithm
function GRAPHSEARCH(start_state, goal_state)
  Initialize the frontier as a queue with the start node
  Initialize an empty set for visited nodes: visited <- {}
  while frontier is not empty do
    currentNode <- remove the first node from frontier
    if currentNode is equal to goal_state then
      return Solution found: path to goal
    end if
    if currentNode is not in visited then
      Add currentNode to visited
      for each successor of currentNode do
        if successor is not in visited and not in frontier then
          Add successor to frontier
        end if
      end for
    end if
  end while
  return Failure: no solution exists
end function

You can view the flowchart at the following link: Click here to view the flowchart.

(i) Implementation details are provided in the following lines. The frontier is a collection of nodes that have been discovered but not yet explored. The method of selecting nodes from the frontier determines the type of search strategy:
• BFS: uses a queue (FIFO) to select nodes.
• DFS: uses a stack (LIFO) to select nodes.
• A*: uses a priority queue to select nodes based on the lowest cost (g + h).
A hash set stores already visited nodes to prevent revisiting them, ensuring the algorithm does not get stuck in cycles or redundant paths. For selecting nodes:
• For BFS, select the node from the front of the queue (FIFO).
• For DFS, select the node from the top of the stack (LIFO).
• For A*, select the node with the lowest cost (g + h) from the priority queue.
After selecting a node, we check whether it is the goal node. If yes, the algorithm reconstructs the path by backtracking through the node's parents and returns the solution. Otherwise, the agent generates neighboring nodes from the current state. Successors are generated based on the legal moves or transitions possible in the environment, and these successors are added to the frontier unless they have already been visited or are already in the frontier. If the goal is reached, we backtrack from the goal node to the start node using parent pointers to reconstruct the solution path.

This section outlines the key functions used to simulate the environment for the Puzzle-8 problem, which involves sliding tiles to reach a goal state.

GOAL_STATE = [1, 2, 3, 4, 5, 6, 7, 8, 0]

1) Node Class: Represents the current state of the puzzle, its parent node, and the cost (number of moves) to reach that state.

class Node:
    def __init__(self, state, parent=None, g=0):
        self.state = state
        self.parent = parent
        self.g = g

2) Utility Functions:
• is_solvable(state): checks if the puzzle is solvable by counting inversions.
• generate_random_state(): generates random states until a solvable configuration is found.
• get_empty_tile_index(state): returns the index of the empty tile (0).
• get_possible_moves(state): lists the possible moves for the empty tile based on its current position.

3) Successor Generation: generate_successors(node): produces successor nodes by swapping the empty tile with adjacent tiles, creating new valid configurations.

4) Display Function: display_puzzle(state): prints the current state of the puzzle in a 3x3 grid format for easy visualization.
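As an illustration, the successor-generation step described above can be sketched as follows; the helper bodies here are illustrative (they assume the Node class defined earlier) rather than the exact repository code:

def get_possible_moves(state):
    # Indices the blank (0) can swap with in a 3x3 layout.
    i = state.index(0)
    row, col = divmod(i, 3)
    moves = []
    if row > 0: moves.append(i - 3)  # tile above slides down into the blank
    if row < 2: moves.append(i + 3)  # tile below slides up
    if col > 0: moves.append(i - 1)  # tile on the left slides right
    if col < 2: moves.append(i + 1)  # tile on the right slides left
    return moves

def generate_successors(node):
    successors = []
    i = node.state.index(0)
    for j in get_possible_moves(node.state):
        new_state = node.state[:]
        new_state[i], new_state[j] = new_state[j], new_state[i]
        successors.append(Node(new_state, parent=node, g=node.g + 1))
    return successors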
(iii) Iterative Deepening Search is particularly useful in scenarios where the depth of the solution is unknown, such as puzzles (like the 8-puzzle), games, and certain AI problems requiring pathfinding in large search spaces.
It is an effective compromise between the memory efficiency of DFS and the completeness of BFS. It combines the benefits of both Depth-First Search (DFS) and Breadth-First Search (BFS) by using a depth-limited search that progressively increases the depth limit until a solution is found. (iv) Uniform Cost Search is a search algorithm that expands the least-cost node first, making it suitable for problems where all moves have the same cost. The uniform cost function initializes the search with the starting state. It maintains a priority queue (frontier) that orders nodes based on their cumulative cost. The algorithmic steps are: dequeue the least-cost node from the frontier; check if the current node's state matches the goal state; if so, invoke the backtracking function to retrieve the solution path; if not, generate the node's successors and add them to the frontier, provided they have not been visited.
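A short sketch of the Uniform Cost Search loop described above, assuming hashable states and a successors(state) helper that yields (next_state, step_cost) pairs (these names are ours, for illustration only):

import heapq
from itertools import count

def uniform_cost_search(start, goal_test, successors):
    tie = count()                      # tie-breaker so states never need to be comparable
    frontier = [(0, next(tie), start, [start])]
    explored = set()
    while frontier:
        g, _, state, path = heapq.heappop(frontier)
        if goal_test(state):
            return g, path             # cumulative cost and solution path
        if state in explored:
            continue
        explored.add(state)
        for nxt, step_cost in successors(state):
            if nxt not in explored:
                heapq.heappush(frontier, (g + step_cost, next(tie), nxt, path + [nxt]))
    return None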
B. Plagiarism detection problem

Plagiarism detection plays a vital role in academic and content-driven fields, where it's essential to spot similarities between documents. A common method for tackling this challenge is treating it as a sequence alignment problem. The objective is to compare two texts by calculating the edit distance—how many insertions, deletions, or substitutions are needed to convert one text into another. We use the A* search algorithm to solve the problem. Given two textual documents, the problem is to find the alignment that minimizes the edit distance, indicating the level of similarity between the documents. A lower edit distance signifies a higher likelihood of plagiarism. The edit distance between two sequences (strings) is the minimum number of operations (insertions, deletions, and substitutions) needed to convert one sequence into the other. This can be computed using dynamic programming or search algorithms like A*. The initial state corresponds to the start of both documents. The goal state is reached when all sentences in both documents are aligned. The plagiarism detection system uses the A* search algorithm to efficiently compute the edit distance between two documents. A* is a search algorithm that finds the least-cost path in a graph, making it suitable for sequence alignment tasks. In this case, the graph is represented by the various states of alignments between the two texts. g(n) represents the cost of aligning two substrings up to the current point (i.e., the edit distance so far). h(n) is the heuristic estimate of the remaining cost to align the rest of the substrings. The system explores different alignments, guided by this cost function, and finds the alignment with the smallest total cost. The heuristic used is the minimum of the remaining lengths of the two strings. This ensures that the estimate is optimistic (i.e., never overestimates the remaining cost).

Successor Generation: For each state (i.e., partial alignment of the two strings), the algorithm generates successor states by considering three possible actions. Insertion: insert a character into one string. Deletion: delete a character from one string. Substitution: substitute one character in the first string for a character in the second string. Each of these operations has a cost of 1, and the total cost is updated as the search progresses.

BFS and Solvability Check: In addition to A*, the BFS approach can be used for more straightforward text alignment, exploring all possible sequences without heuristic guidance. However, A* is more efficient as it prunes unnecessary paths using the heuristic function. We have implemented the algorithm for plagiarism detection. Here are some things which we kept in mind, and the output is also compared as per the test cases provided for the problem. Text preprocessing is critical in preparing the documents for plagiarism detection. The input documents are tokenized into sentences, normalized by converting to lowercase, and cleaned by removing punctuation. The edit distance function is already integrated in the A* search code. Based on the test cases provided, what we have understood goes as follows:

Case 1:
Input:
doc1 = "This is so beautiful. It is great to hear."
doc2 = "This is so beautiful. It is great to hear."
Output:
Sentence from doc1 | Sentence from doc2 | Edit Distance
This is so beautiful | This is so beautiful | 0
It is great to hear | It is great to hear | 0

Case 2:
Input:
doc1 = "This is so beautiful. It is great to hear."
doc2 = "This is incredibly beautiful. It is wonderful to listen."
Output:
Sentence from doc1 | Sentence from doc2 | Edit Distance
This is so beautiful | This is incredibly beautiful | 7
It is great to hear | It is wonderful to listen | 7

Case 3:
Input:
doc1 = "The quick brown fox."
doc2 = "A cat sat."
Output:
Sentence from doc1 | Sentence from doc2 | Edit Distance
The quick brown fox | A cat sat | 23

Case 4:
Input:
doc1 = "The cat is on the roof."
doc2 = "The dog is on the roof."
Output:
Sentence from doc1 | Sentence from doc2 | Edit Distance
The cat is on the roof | The dog is on the roof | 2

Let's understand what each case here means:
Case 1: Identical documents with zero edit distance.
Case 2: Slightly modified document with minor changes (synonyms, etc.) leading to a low edit distance.
Case 3: Completely different documents resulting in a high edit distance.
Case 4: Partial overlap where two sentences have some common words, resulting in a low edit distance.
And obviously, you can check out the code from our GitHub repository.
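For reference, the edit distance that the alignment search targets can also be computed directly with the classic dynamic-programming recurrence. The sketch below is illustrative; it may not reproduce the exact table values above, which depend on our preprocessing and on the granularity at which the A* implementation aligns the texts:

def edit_distance(s1, s2):
    m, n = len(s1), len(s2)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i                      # i deletions
    for j in range(n + 1):
        dp[0][j] = j                      # j insertions
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if s1[i - 1] == s2[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + sub)  # substitution / match
    return dp[m][n]

print(edit_distance("kitten", "sitting"))  # 3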
III. WEEK 3: TO UNDERSTAND THE USE OF HEURISTIC FUNCTION FOR REDUCING THE SIZE OF THE SEARCH SPACE. EXPLORE NON-CLASSICAL SEARCH ALGORITHMS FOR LARGE PROBLEMS.

A. Solving marble solitaire

Fig. 1. Initial configuration.

The initial configuration of the marble board features an arrangement where one space is vacant. The objective is to eliminate marbles by jumping over them, ultimately leaving just one marble in the center of the board. The initial state of the board is depicted in Fig. 1. The goal is to achieve a configuration with only one marble remaining at the center.

1) Priority Queue-Based Search with Path Cost: In a priority queue-based search, each state is maintained in a queue that prioritizes nodes according to their path cost, which is the cumulative cost incurred to reach that node. A typical approach to manage this is by utilizing a min-heap, facilitating efficient access to the state with the lowest cost.

2) Heuristic Functions with Justification: Heuristic Function 1: Count of Remaining Marbles. This heuristic simply tallies the number of marbles still present on the board. A lower count of remaining marbles indicates proximity to achieving the goal state, thereby effectively guiding the search. Heuristic Function 2: Distance to Center Position. This heuristic computes the cumulative distance of all marbles to the central position. Since the aim is to position a single marble at the center, minimizing the distance of all marbles is likely to expedite reaching the goal.

3) Best-First Search Algorithm: The best-first search employs a priority queue to navigate nodes based on their heuristic values, concentrating on the most promising nodes first.

4) A* Algorithm: The A* algorithm integrates the actual cost incurred to reach a node with the heuristic estimate of the cost to arrive at the goal from that node.

5) Comparison of Results from Various Search Algorithms: Analysis:
• Priority Queue Search: This algorithm effectively explores states based on path costs but may not leverage heuristics optimally.
• Best-First Search: Typically more efficient than pure path cost searches, yet it does not guarantee finding the optimal solution as it neglects the cost to reach a node.
• A* Algorithm: Generally regarded as the most effective method for pathfinding issues, A* strikes a balance between cost and heuristic, thereby ensuring both optimality and completeness.

B. K-SAT problem

The k-SAT problem is a key challenge in computational theory that concerns the satisfiability of boolean formulas expressed in conjunctive normal form (CNF). This report discusses the implementation of a program aimed at generating uniform random k-SAT problems and evaluating the performance of various search algorithms used to solve them. The algorithms examined include Hill Climbing, Beam Search, and Variable Neighborhood Descent.

The objective is to randomly generate k-SAT problems based on the parameters k (the number of literals in each clause), m (the number of clauses), and n (the number of distinct variables). Each clause is designed to contain distinct variables or their negations, resulting in instances that fall under the category of fixed clause length models of SAT, known as uniform random k-SAT problems. To accomplish this, we implemented a function that creates m clauses, each containing k distinct variables. This function ensures that the variables are either presented in their positive form or negated at random.
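A minimal sketch of such a generator (literals encoded as signed integers; names and encoding are illustrative, not necessarily those of our implementation):

import random

def generate_ksat(k, m, n):
    # Returns a list of m clauses; each clause has k distinct variables drawn from 1..n,
    # each negated with probability 0.5. A literal is encoded as +v or -v.
    clauses = []
    for _ in range(m):
        variables = random.sample(range(1, n + 1), k)
        clauses.append([v if random.random() < 0.5 else -v for v in variables])
    return clauses

random.seed(0)
formula = generate_ksat(k=3, m=20, n=10)
print(formula[:3])  # first three randomly generated clauses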
We implemented three search algorithms to solve the generated k-SAT problems:
• Hill Climbing: This algorithm initiates with a random assignment of truth values to the variables. It evaluates the current solution and iteratively improves it by flipping the value of one variable at a time, consistently moving to the neighboring solution that provides the best increase in satisfaction.
• Beam Search: This algorithm maintains a fixed number of the best solutions, determined by a defined beam width, and explores the neighbors of each solution. It retains only the most promising solutions for further exploration.
• Variable Neighborhood Descent: This algorithm explores multiple neighborhoods, making local improvements iteratively until no further enhancements can be achieved.
To guide the search algorithms, we implemented two heuristic functions:
• Heuristic Function 1: Counts the number of satisfied clauses.
• Heuristic Function 2: Calculates the number of unsatisfied clauses.

1) Hill Climbing Algorithm:
1) Initialize a random solution (assignment of truth values to variables).
2) Evaluate the solution (count the number of satisfied clauses).
3) While true:
   a) Generate neighbors by flipping the value of each variable one at a time.
   b) Evaluate each neighbor and find the best neighbor.
   c) If the best neighbor is better than the current solution:
      i) Update the current solution to the best neighbor.
      ii) Update the current evaluation score.
   d) If no neighbor improves the solution, return the current solution.

2) Beam Search Algorithm:
1) Initialize a list of solutions with a random solution (assignment of truth values).
2) While true:
   a) Create an empty list for new solutions.
   b) For each solution in the current list:
      i) Generate neighbors by flipping the value of each variable.
      ii) Add each neighbor to the new solutions list.
   c) Sort the new solutions based on their evaluation scores (number of satisfied clauses).
   d) Retain only the top w solutions (beam width).
   e) If the best solution satisfies all clauses, return it.

3) Variable Neighborhood Descent Algorithm:
1) Initialize a random solution (assignment of truth values to variables).
2) Evaluate the solution (count the number of satisfied clauses).
3) While true:
   a) Set a flag 'improved' to false.
   b) For each neighborhood:
      i) For each variable in the neighborhood:
         A) Generate a neighbor by flipping the variable's value.
         B) Evaluate the neighbor.
         C) If the neighbor is better than the current solution:
         D) Update the current solution to this neighbor.
         E) Update the evaluation score.
         F) Set 'improved' to true and break the loop.
      ii) If an improvement was made, break to start from the first neighborhood.
   c) If no improvement was made, return the current solution.
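As an example of how these procedures look in code, here is a compact sketch of the hill-climbing variant together with the clause-counting score (Heuristic Function 1). It is illustrative rather than our exact implementation:

import random

def num_satisfied(clauses, assignment):
    # assignment maps variable -> bool; a clause is satisfied if any literal evaluates to True.
    return sum(any((lit > 0) == assignment[abs(lit)] for lit in clause) for clause in clauses)

def hill_climb(clauses, n):
    # Start from a random assignment and greedily flip the single variable
    # whose flip yields the largest gain in satisfied clauses.
    assignment = {v: random.choice([True, False]) for v in range(1, n + 1)}
    current = num_satisfied(clauses, assignment)
    while True:
        best_var, best_score = None, current
        for v in range(1, n + 1):
            assignment[v] = not assignment[v]          # try flipping v
            score = num_satisfied(clauses, assignment)
            assignment[v] = not assignment[v]          # undo the flip
            if score > best_score:
                best_var, best_score = v, score
        if best_var is None:                           # local optimum reached
            return assignment, current
        assignment[best_var] = not assignment[best_var]
        current = best_score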
We conducted experiments with various combinations of m and n to assess the performance of the algorithms. Each algorithm was tested on several randomly generated k-SAT problems, measuring effectiveness based on the number of satisfied clauses.

We have outlined the methodology for generating uniform random 3-SAT problems and evaluating different search algorithms. The implementation of Hill Climbing, Beam Search, and Variable Neighborhood Descent, coupled with heuristic evaluations, provides insights into the effectiveness of each approach. The outputs from the experiments reveal critical performance metrics that can be analyzed to determine the most effective strategy for solving k-SAT problems.

IV. WEEK 4: NON-DETERMINISTIC SEARCH — SIMULATED ANNEALING. FOR PROBLEMS WITH LARGE SEARCH SPACES, RANDOMIZED SEARCH BECOMES A MEANINGFUL OPTION GIVEN PARTIAL/FULL INFORMATION ABOUT THE DOMAIN.

A. Traveling Salesman Problem

The Traveling Salesman Problem (TSP) is a well-known NP-hard problem. It involves determining the shortest possible route that visits each city exactly once and returns to the starting point, given a graph where nodes represent cities and edges represent the travel cost between them.

We implemented Simulated Annealing to solve the TSP with 20 tourist destinations across Rajasthan. This algorithm efficiently improves the route by iteratively exploring neighboring solutions and either accepting improvements or, occasionally, worse solutions to avoid getting stuck in local optima. The 20 tourist spots include: 1. Jaipur (Amber Fort) 2. Jaisalmer (Jaisalmer Fort) 3. Udaipur (City Palace) 4. Jodhpur (Mehrangarh Fort) 5. Mount Abu (Dilwara Temples) 6. Bikaner (Junagarh Fort) 7. Ajmer (Dargah Sharif) 8. Pushkar (Brahma Temple) 9. Ranthambore (National Park) 10. Alwar (Sariska Tiger Reserve) 11. Bundi (Taragarh Fort) 12. Chittorgarh (Chittorgarh Fort) 13. Bharatpur (Keoladeo National Park) 14. Kota (Seven Wonders Park) 15. Shekhawati (Frescoes) 16. Kumbhalgarh (Kumbhalgarh Fort) 17. Jhalawar (Jhalawar Fort) 18. Barmer (Barmer Fort) 19. Sikar (Khatu Shyam Ji) 20. Nathdwara (Shrinathji Temple).

Key steps in the process include:
Cost Calculation: The calculate_cost() function computes the total distance by summing up the travel distances between consecutive cities in the tour, and then adds the return distance to the starting city.
Neighbor Generation: The generate_neighbor() function randomly swaps two cities in the current tour, providing a new neighboring solution.
Acceptance Criterion: The acceptance rule follows the standard Simulated Annealing procedure, where new solutions are accepted if they reduce the total cost. Worse solutions are accepted probabilistically, depending on the temperature.
Cooling Schedule: The temperature decreases with each iteration using a cooling rate. We chose a rate of 0.995 to allow slow cooling and thorough exploration of the solution space.
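A condensed sketch of the annealing loop implied by these steps; the distance() argument plays the role of calculate_cost(), the two-city swap corresponds to generate_neighbor(), and parameter values other than the 0.995 cooling rate are illustrative:

import math, random

def simulated_annealing(tour, distance, T=10000.0, cooling_rate=0.995, iterations=20000):
    best = current = tour[:]
    best_cost = current_cost = distance(current)
    for _ in range(iterations):
        i, j = sorted(random.sample(range(len(current)), 2))
        neighbour = current[:]
        neighbour[i], neighbour[j] = neighbour[j], neighbour[i]   # swap two cities
        delta = distance(neighbour) - current_cost
        # Accept improvements always; accept worse tours with probability exp(-delta / T).
        if delta < 0 or random.random() < math.exp(-delta / T):
            current, current_cost = neighbour, current_cost + delta
        if current_cost < best_cost:
            best, best_cost = current[:], current_cost
        T *= cooling_rate
    return best, best_cost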
Additional steps involved:
1. Distance Matrix: A symmetric matrix representing the distances between all pairs of tourist spots, using real-world data.
2. Initial Solution: A random permutation of the cities forms the starting tour.
3. Neighbor Generation: The algorithm explores neighboring tours by swapping two cities in the current tour.
4. Acceptance Criterion: Even when a new tour has a higher cost, it may still be accepted based on the current temperature to avoid local optima.
5. Cooling Schedule: As iterations progress, the temperature reduces, decreasing the likelihood of accepting worse solutions.
6. Output: The best tour and its associated cost are reported.

The algorithm parameters include: Initial Temperature, which controls the likelihood of accepting worse solutions at the start; Cooling Rate, which governs the rate at which the temperature decreases; and Number of Iterations, which determines how many steps the algorithm performs before termination.

Results: After running the algorithm, we obtained an optimized tour of Rajasthan's tourist spots. The initial tour had a cost of 8305, which reduced to 4940 after several iterations, showing a successful optimization process.
The final results include both the best tour order and its associated cost:
• Best Tour: The order in which the cities are visited.
• Best Cost: The total distance of the optimal tour.
For VLSI-based Traveling Salesman Problems (TSP), we used the same code for the 5 problems; we only changed the file paths for the different datasets.

B. Problem Results

1) Problem 1: XQF131 (131 Points):
• Best tour length: 697.394
• Optimal tour length: 564
• Tour: 52, 44, 45, 53, 54, 46, 47, . . . , 130
• Graphical Output:
• This is a tour for the XQF131 VLSI instance. It has length 564.

2) Problem 2: XQG237 (237 Points):
• Best tour length: 1277.551
• Optimal tour length: 1019
• Tour: 153, 152, 151, 150, 149, . . . , 221
• Graphical Output:
• This is a tour for the XQG237 VLSI instance. It has length 1019.

3) Problem 3: PMA343 (342 Points):
• Best tour length: 2015.655
• Optimal tour length: 1368
• Tour: 55, 53, 49, 47, 43, 42, . . . , 324
• Graphical Output:
• This is a tour for the PMA343 VLSI instance. It has length 1368.
4) Problem 4: PKA379 (379 Points):
• Best tour length: 2030.461
• Optimal tour length: 1332
• Tour: 136, 137, 138, 139, 140, 143, . . . , 324
• Graphical Output:
• This is a tour for the PKA379 VLSI instance. It has length 1332.

5) Problem 5: BCL380 (380 Points):
• Best tour length: 2055.4938
• Optimal tour length: 1621
• Tour: 94, 82, 93, 88, 81, 92, . . . , 70
• Graphical Output:
• This is a tour for the BCL380 VLSI instance. It has length 1621.

C. Jigsaw Puzzle

The objective is to solve a jigsaw puzzle using an Artificial Intelligence (AI) technique known as Simulated Annealing. In this scenario, the puzzle consists of a scrambled image, where the pieces must be rearranged to restore the correct image. This problem can be framed as a state space search, with each state representing a distinct arrangement of the puzzle pieces. The aim is to discover the arrangement that reconstructs the original image.

The jigsaw puzzle can be broken down into the following components:
State Representation: Each state is defined by the current arrangement of the puzzle pieces. For an n × n puzzle, this can be represented as a list where each element corresponds to a specific piece.
Initial State: The initial state is the scrambled configuration of the puzzle pieces, which is loaded from the scrambled_lena.mat file.
Goal State: The goal state refers to the correct arrangement of the puzzle pieces, meaning that all pieces are in their intended positions to complete the image.
Actions: An action involves selecting two pieces from the current arrangement and swapping their positions. The algorithm performs such swaps repeatedly to explore various possible states.
Cost Function: The cost function quantifies how far a given arrangement is from the goal state. A simple approach would be to count the number of pieces that are misplaced, or sum the positional differences between the actual and desired locations of each piece.
Heuristic: A heuristic helps estimate how close a current state is to the goal. One straightforward heuristic might count the number of correctly placed pieces, or calculate the overall difference between the current and goal positions.

Simulated Annealing is a probabilistic search technique that draws inspiration from the annealing process in metallurgy. Here, it is employed to explore the state space by progressively improving the arrangement of the puzzle pieces. The algorithm begins with a high "temperature," allowing for broader exploration, including the acceptance of suboptimal moves. As the temperature decreases, the algorithm becomes more selective, focusing on refining the current solution.

Below is an outline of the algorithm:
Initial State: The algorithm begins with the scrambled puzzle.
Temperature: Initially, the temperature is set high, promoting wider exploration, even accepting states that increase the cost.
State Transition: At each step, two puzzle pieces are randomly swapped, generating a new state.
Acceptance Probability: The decision to accept a new state is based on the change in cost. Even if the new state has a higher cost (a worse arrangement), it might still be accepted with a probability that depends on the current temperature, encouraging exploration of the state space.
Cooling Schedule: The temperature gradually decreases according to a cooling rate, reducing the likelihood of accepting suboptimal states and focusing on improvement.
Termination: The process continues until the temperature falls below a certain threshold, or no further improvements can be made.

Results: Applying the simulated annealing algorithm to the jigsaw puzzle yields the following observations:
- The algorithm effectively explores a variety of puzzle arrangements and works towards minimizing the cost function.
- The initial temperature and cooling rate are crucial to the search process. A higher initial temperature facilitates exploration of a broader set of potential solutions, while a slower cooling rate allows the search to concentrate on local improvements.
- Swapping puzzle pieces is a simple and effective state transition method. The randomness involved ensures diverse exploration of the state space.
V. WEEK 6: TO UNDERSTAND THE WORKING OF HOPFIELD NETWORK AND USE IT FOR SOLVING SOME INTERESTING COMBINATORIAL PROBLEMS

A. Introduction

Hopfield networks, a subtype of Recurrent Artificial Neural Networks (RNNs) introduced by John Hopfield in 1982, represent the first instance of associative neural networks capable of producing emergent associative memory. Associative memory allows retrieval and completion of a memory using incomplete or noisy stimuli, akin to recalling a memory triggered by hearing a familiar song. Hopfield networks operate with nodes in two states (+1 or -1), favoring agreement between adjacent nodes' states towards a global energy minimum. Each neuron in the network is characterized by:
• Connections to other neurons with unique strengths.
• Activation computed from net input and connection weights.
• A bipolar output state derived from activation and a thresholding function.

PROBLEM 1: THE 8-ROOK PROBLEM

The 8-rook problem is a classic chess problem that involves placing eight rooks on a standard 8 × 8 chessboard in such a way that no two rooks threaten each other. In chess, a rook can move horizontally or vertically any number of squares, and it threatens any piece in its row or column.

Challenge

The challenge is to find a configuration where all eight rooks are placed on the board such that:
• No two rooks share the same row or column.
• Each row and column contains exactly one rook.

Initial Configuration

We start with the initial configuration:
[0 0 0 0 0 1 0 0]
[0 0 1 0 0 0 0 0]
[0 0 0 1 0 0 0 1]
[0 0 0 0 0 1 0 0]
[0 0 1 0 0 0 0 0]
[0 0 0 0 0 0 0 0]
[0 1 0 1 0 0 0 0]
[0 0 0 0 0 0 0 0]
where 1 represents the position of a rook on the 8 × 8 board.

Energy Function

The energy function for this problem is defined as the discrepancy from the ideal state where each row and each column contains exactly one rook. A lower energy corresponds to fewer conflicts.

Approach

The code:
• Defines a function to compute the energy of a given board configuration.
• Iterates through permutations of initial placements, shuffling them randomly.
• Searches for a configuration where the energy reaches zero, indicating a solution.
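A small sketch of such an energy function with unit weights (illustrative; the weights on the row and column terms can be chosen differently, as noted above):

import numpy as np

def rook_energy(board):
    # Penalize any deviation from exactly one rook per row and per column.
    row_sums = board.sum(axis=1)
    col_sums = board.sum(axis=0)
    return int(((row_sums - 1) ** 2).sum() + ((col_sums - 1) ** 2).sum())

solution = np.eye(8, dtype=int)   # one rook on each diagonal square
print(rook_energy(solution))      # 0 -> a conflict-free placement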
PROBLEM 2: PATTERN RECOGNITION AND RECONSTRUCTION

We implemented a Hopfield network to recognize and reconstruct patterns.

Approach
• Four patterns (D, J, C, M) were defined as matrices and stored in an array X.
• Hebb's rule was applied to compute the weight matrix W based on the outer product of each pattern with itself.
• A noisy starting pattern was iteratively updated until convergence, determined by the difference between successive iterations falling below a predefined threshold.
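A compact sketch of the storage and recall steps described above, using a synchronous update rule (our actual code may differ in update order and in the stopping threshold; patterns are assumed to be +1/-1 row vectors):

import numpy as np

def train_hopfield(patterns):
    # Hebb's rule: sum of outer products of the stored patterns, with a zero diagonal.
    n = patterns.shape[1]
    W = np.zeros((n, n))
    for p in patterns:
        W += np.outer(p, p)
    np.fill_diagonal(W, 0)
    return W

def recall(W, state, max_iters=100, tol=0):
    # Iterate until successive states differ in at most `tol` positions.
    for _ in range(max_iters):
        new_state = np.where(W @ state >= 0, 1, -1)
        if np.sum(new_state != state) <= tol:
            return new_state
        state = new_state
    return state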
Error Tolerance
• 25 random patterns were generated.
• Noise levels were introduced, and the average Hamming distance was computed between original and retrieved patterns.
• Results quantified the network's robustness to noise, with higher error tolerance indicating better performance.

PROBLEM 3: TRAVELING SALESMAN PROBLEM (TSP)

The Traveling Salesman Problem (TSP) is a combinatorial optimization problem where the goal is to find the shortest possible route that visits each city exactly once and returns to the original city. In this script, the TSP is approached using a neural network paradigm known as the Hopfield network. The script first defines a set of points representing city coordinates and generates permutations of these points. Then, it employs a customized energy calculation function within the framework of a Hopfield network to compute the energy associated with each permutation, considering distances between cities. By iteratively updating the network's state to minimize energy, akin to converging to a stable state in the network's dynamics, the script aims to find the permutation (route) with the minimum energy, which corresponds to the optimal TSP route. Through this iterative process of minimizing energy, the Hopfield network attempts to solve the TSP by converging towards the optimal solution represented by the permutation with the lowest energy.

We use an energy function E(x) within the framework of a Hopfield network to solve the Traveling Salesman Problem (TSP). This function penalizes configurations violating TSP constraints, like revisiting or skipping cities. By iterating through permutations of cities, the code computes the energy for each permutation, selecting the one with minimum energy as the optimal route. The Hopfield network iteratively adjusts neuron states to minimize this energy, converging to a stable minimum corresponding to the optimal route. This iterative process continues until convergence, gradually exploring permutations. Finally, the optimal route is extracted from the stable state of the network, indicating the order to visit cities and minimize total distance traveled.

CONCLUSION

The Hopfield network successfully addressed a variety of combinatorial problems:
• Associative memory retrieval and reconstruction.
• Constraint satisfaction in the 8-rook problem.
• Approximate solutions for TSP using energy minimization.

VI. WEEK 7: UNDERSTANDING EXPLOITATION-EXPLORATION IN A SIMPLE N-ARM BANDIT REINFORCEMENT LEARNING TASK, EPSILON-GREEDY ALGORITHM

We have tried to understand the concepts involved in reinforcement learning, and also tried to implement the epsilon-greedy algorithm for binary and multi-armed bandit problems. Another objective was to also understand the Matchbox Educable Naughts and Crosses Engine (MENACE) developed by Donald Michie. We implemented a binary bandit using a stochastic reward system and developed a 10-armed bandit with non-stationary rewards to observe how an agent using a modified epsilon-greedy approach adapts in dynamic environments.

Reinforcement learning (RL) is an area of machine learning where agents learn to make decisions by interacting with their environment, aiming to maximize cumulative rewards. Our focus lies in solving simple decision-making tasks using the epsilon-greedy algorithm, which balances exploration and exploitation. We analyze the MENACE system, an early RL-based game-playing engine for naughts and crosses, and implement bandit problems, a common testbed for studying RL algorithms. In the binary bandit problem, the agent chooses between two actions, each producing rewards that follow a probabilistic pattern. We apply the epsilon-greedy algorithm to balance exploration and exploitation, aiming to optimize the overall reward. Furthermore, we tackle a 10-armed bandit problem with non-stationary rewards, where the expected value of each action shifts over time. This creates the need for adjusting standard RL strategies to better track these changes. Through these implementations, we gain valuable insights into important RL principles, such as managing the exploration-exploitation balance and adapting to dynamic reward environments.

A. P1: MENACE (Matchbox Educable Naughts and Crosses Engine)

In MENACE, each game state is represented by a unique configuration of the tic-tac-toe board, with a matchbox dedicated to each specific state. This matchbox holds the beads that signify the possible actions available to the agent. The selection of actions is determined by the beads within the matchbox. The number of beads allocated to each action reflects its probability of being chosen; a greater bead count corresponds to a higher likelihood of selecting that action. After each game, the system updates the matchboxes according to the outcome. This reinforcement process strengthens the likelihood of selecting winning actions while decreasing the chances of choosing those that led to losses. The implementation also emphasizes several critical aspects. Initially, the system assigns an initial count of beads for each possible
action across all game states. Following the game, the agent updates the bead counts based on the results, ensuring that successful actions receive reinforcement. Furthermore, MENACE incorporates an exploration-exploitation tradeoff; it allows the agent to discover new actions by randomly selecting beads while simultaneously exploiting previously successful actions by reinforcing the beads associated with those actions.

B. P2: Binary Bandit with Epsilon-Greedy Algorithm

The epsilon-greedy algorithm is a fundamental approach for addressing the N-armed bandit problem, where the goal is to select actions that maximize long-term rewards. Picture having two slot machines (Bandit A and Bandit B) with unknown payout probabilities. This algorithm aids in deciding which machine to play to maximize winnings. The challenge is finding the right balance between exploration—trying out both machines—and exploitation—focusing on the machine that appears to yield better results. The epsilon-greedy algorithm addresses this balance using an epsilon (ε) parameter. With a probability of ε, the agent opts for a random machine, allowing for exploration. In contrast, with a probability of (1 − ε), the agent chooses the machine that has the highest estimated average reward based on previous outcomes. The estimated average reward serves as a historical record of each machine's performance. Initially, both machines are assigned an equal average reward (often set at 0.5, reflecting no prior knowledge). As the agent plays, the algorithm updates these average rewards based on the actual results (1 for a win and 0 for a loss). Over time, the machine with more wins will accumulate a higher estimated average reward, increasing its likelihood of being chosen for exploitation in future rounds. By strategically adjusting the exploration factor (ε) and continuously updating the estimated rewards through gameplay, the epsilon-greedy algorithm strives to identify the machine that consistently performs better, thereby maximizing long-term winnings.

C. P3: 10-armed bandit in which all ten mean-rewards start out equal and then take independent random walks

Let us say you're in a casino with ten slot machines instead of just two, and this time, the payout rates for each machine are not fixed—they change over time. This scenario presents a challenge known as non-stationary rewards. The epsilon-greedy algorithm we discussed earlier struggles to adapt to this situation. In the previous case, the payout rates were constant, allowing the algorithm to learn which machine was more rewarding based on past experiences. You may have noticed patterns, such as machine A providing more wins than machine B. However, in this new context, the situation is quite different. The machine that performs well today may not necessarily be the best choice tomorrow. For instance, machine A might be winning today, but tomorrow, machine C could take the lead! This is where the standard epsilon-greedy algorithm falls short. It relies on estimated average rewards derived from past outcomes. In a stationary environment, these estimated rewards serve as reliable indicators of future performance. However, in a non-stationary setting, past wins lose their significance as the reward dynamics continuously shift. Consequently, the algorithm may continue to exploit a machine that was once profitable but is no longer the top performer, resulting in missed opportunities and decreased overall winnings. To illustrate, think of dining at a restaurant with ten different dishes, each with varying tastes (the rewards). If the flavors of each dish change frequently, the epsilon-greedy algorithm would be akin to sticking with a dish you enjoyed yesterday, even if it's not the best choice today. This approach causes you to overlook the possibility of discovering other dishes that might be more appealing based on the current "menu."

D. P4: Evaluating a Modified Epsilon-Greedy Algorithm in a Non-Stationary 10-Armed Bandit Environment

Imagine returning to the casino with ten slot machines, this time equipped with a special tool—the modified epsilon-greedy algorithm! This enhanced strategy helps you tackle the fluctuating payout rates (non-stationary rewards) that posed challenges for the standard epsilon-greedy algorithm. A crucial element of this new approach is the forgetting factor (α), a value that ranges between 0 and 1. You can think of it as a dial that determines how much importance is given to past wins and losses when selecting which machine to play next. When the forgetting factor is set high (closer to 1), the modified algorithm emphasizes recent outcomes. Similar to how you would favor a machine that just paid out generously, the algorithm concentrates on actions that have yielded high rewards recently. This responsiveness enables it to quickly adapt to shifts in the reward environment. On the other hand, a low α (closer to 0) gives greater importance to historical wins, akin to the basic algorithm's approach. The exciting aspect of this modified algorithm is that it updates the estimated/calculated value of each machine (or arm) using a formula that incorporates the forgetting factor rather than just averaging the old estimate with the new reward. The formula is as follows:

NewCalcVal = (1 − α) × OldCalcVal + α × NewReward

This equation takes into account previous performance while allowing the new reward to have a stronger influence, depending on the value of α. A higher α places more emphasis on the new reward, enabling the estimated value to adapt more rapidly to changes in the actual payout of the machine.
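In code, this is the standard constant-step-size update combined with epsilon-greedy action selection. The sketch below is illustrative; bandit(a) stands for the environment's reward draw and is not defined here:

import random

def select_action(Q, epsilon=0.1):
    # Explore with probability epsilon, otherwise exploit the best current estimate.
    if random.random() < epsilon:
        return random.randrange(len(Q))
    return max(range(len(Q)), key=lambda a: Q[a])

def update_estimate(Q, action, reward, alpha=0.1):
    # Constant step size: Q <- (1 - alpha) * Q + alpha * reward, so recent rewards weigh more.
    Q[action] += alpha * (reward - Q[action])

Q = [0.0] * 10                     # ten arms, initial estimates
# inside the interaction loop (sketch):
# a = select_action(Q); r = bandit(a); update_estimate(Q, a, r)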
Let us say you’re in a casino with ten slot machines instead
This equation takes into account previous performance while
of just two, and this time, the payout rates for each machine
allowing the new reward to have a stronger influence, depend-
are not fixed—they change over time. This scenario presents
ing on the value of . A higher places more emphasis on
a challenge known as non-stationary rewards. The epsilon-
the new reward, enabling the estimated value to adapt more
greedy algorithm we discussed earlier struggles to adapt to this
rapidly to changes in the actual payout of the machine.
situation. In the previous case, the payout rates were constant,
allowing the algorithm to learn which machine was more VII. W EEK 8 : U NDERSTAND THE PROCESS OF
rewarding based on past experiences. You may have noticed SEQUENTIAL DECISION MAKING ( STOCHASTIC
patterns, such as machine A providing more wins than machine ENVIRONMENT ) AND THE CONNECTION WITH
B. However, in this new context, the situation is quite different. REINFORCEMENT LEARNING
The machine that performs well today may not necessarily
be the best choice tomorrow. For instance, machine A might P ROBLEM FOR I N -L AB D ISCUSSION
be winning today, but tomorrow, machine C could take the Suppose that an agent is situated in a 4 × 3 environment
lead! This is where the standard epsilon-greedy algorithm falls as shown in Figure 1. Beginning in the start state, it must
short. It relies on estimated average rewards derived from past choose an action at each time step. The interaction with the
outcomes. In a stationary environment, these estimated rewards environment terminates when the agent reaches one of the goal
serve as reliable indicators of future performance. However, states, marked +1 or −1. The environment is fully observable,
in a non-stationary setting, past wins lose their significance so the agent always knows where it is.
The agent can take the following actions in each state: Up, Down, Left, and Right. However, the environment is stochastic: the action taken achieves the intended effect with a probability of 0.8, but the rest of the time the agent moves at right angles to the intended direction with equal probabilities. If the agent bumps into a wall, it stays in the same square.
Rewards:
• Moving to any state (except terminal states): r(s) = −0.04.
• Moving to terminal states: r(s) = +1 or −1 respectively.
Task: Use Value Iteration to find the value function corresponding to the optimal policy. Repeat this for the following reward structures:
• r(s) = −2
• r(s) = 0.1
• r(s) = 0.02
• r(s) = 1

PROBLEM DESCRIPTION

The Gbike bicycle rental problem involves managing two locations where bicycles can be rented. The goal is to maximize daily profit while accounting for costs associated with moving bikes between locations and managing bike requests and returns. The problem is modeled as a Markov Decision Process (MDP).

Assumptions
• Bike rental requests and returns at each location follow a Poisson distribution.
• There is a maximum of 20 bikes at each location.
• A maximum of 5 bikes can be moved between locations overnight at a cost of INR 2 per bike.
• Revenue from renting a bike is INR 10.
• Parking space is limited. If more than 10 bikes are stored overnight at any location, an additional cost of INR 4 is incurred.
• An employee at the first location can shuttle one bike to the second location for free.
• The discount factor for future rewards is 0.9.

APPROACH AND LOGIC

Formulating the MDP
The Gbike problem is modeled as an MDP with the following components:
• States: A state is represented as s = (x, y), where x is the number of bikes at location 1 and y is the number of bikes at location 2.
• Actions: An action a is the number of bikes moved between locations overnight. Actions range from −5 to +5, where negative values indicate bikes moved from location 2 to location 1, and positive values indicate bikes moved from location 1 to location 2.
• Transition Probabilities: Transition probabilities are determined by the Poisson distribution of bike requests and returns. For a given state s and action a, the next state s′ depends on the number of rentals and returns.
• Reward Function: The immediate reward is calculated as:
R(s, a) = Rental Revenue − Moving Cost − Parking Cost
• Discount Factor: γ = 0.9 is used to weigh future rewards.

Steps in Solving the Problem

We solve the problem using the Policy Iteration algorithm. The algorithm alternates between policy evaluation and policy improvement.
1. Policy Evaluation: In this step, we evaluate the current policy π by solving the Bellman Expectation Equation:

V(s) = R(s, π(s)) + γ Σ_{s′} P(s′ | s, π(s)) V(s′)

Here, V(s) is the value of state s, R(s, π(s)) is the immediate reward for taking action π(s) in state s, and P(s′ | s, π(s)) is the probability of transitioning to state s′ from s.
2. Policy Improvement: In this step, we improve the policy by choosing actions that maximize the expected return for each state:

π′(s) = argmax_a [ R(s, a) + γ Σ_{s′} P(s′ | s, a) V(s′) ]

The policy is updated iteratively until it converges.
Optimizations
• Truncated Poisson Distribution: To reduce computation, the Poisson distribution is truncated at a reasonable threshold (e.g., 2 times the mean).
• Caching Probabilities: Poisson probabilities are precomputed and cached for reuse.
• Efficient Transition Calculations: Only states with non-zero transition probabilities are considered.

IMPLEMENTATION DETAILS

The implementation involves the following key steps:
1) Initialize the state space, value function, and policy.
2) Define the reward function, including revenue, moving costs, and parking costs.
3) Precompute Poisson probabilities for bike requests and returns.
4) Perform policy iteration:
   • Evaluate the current policy by iteratively updating the value function.
   • Improve the policy by selecting the best action for each state based on the updated value function.
5) Simulate the optimal policy to verify results.
In conclusion, the Gbike bicycle rental problem provides an excellent example of applying MDPs to a real-world decision-making scenario. By formulating the problem as an MDP and solving it using policy iteration, we can derive an optimal policy that maximizes daily profit while managing constraints such as moving costs and parking limitations.
VIII. CONCLUSION

In these lab sessions, we delved into various concepts of artificial intelligence and search algorithms, focusing on their application in solving complex problems. We explored state space search problems through different case studies, including the implementation of heuristic search techniques and simulated annealing.

During the initial weeks, we successfully tackled problems involving search algorithms, particularly in the context of the Marble Solitaire game. By implementing various search strategies such as Uniform Cost Search, Best-First Search, and A* Search, we enhanced our understanding of search algorithms and their practical applications.

In the latter part of the lab, we shifted our focus to the Jigsaw puzzle problem, where we applied the simulated annealing algorithm. This experience taught us how to formulate problems as state space searches and provided insight into the effectiveness of probabilistic algorithms in exploring complex solution spaces. The iterative nature of simulated annealing allowed us to grasp the balance between exploration and exploitation in search processes.

Over the last three weeks, we covered key AI and Reinforcement Learning concepts. Week 6 focused on heuristic search methods like A* for optimal decision-making. Week 7 introduced stochastic decision-making with Markov Decision Processes and the epsilon-greedy algorithm for balancing exploration and exploitation. Week 8 explored dynamic programming techniques such as value iteration and policy iteration for policy optimization, applying them to real-world problems like the Gbike system. These weeks provided a strong foundation in decision-making under uncertainty.

IX. LINK TO THE GITHUB

The complete code for all the labs can be found at the following link: https://github.com/Lalwaniamisha789/CS-307-Lab-Report.git.

X. SOFTWARE USED

Throughout the course of these lab assignments, various software tools and libraries were utilized to implement and analyze the algorithms developed for solving the lab problems, including the jigsaw puzzle problem. The primary programming language used was Python, complemented by the following libraries and tools:
• NumPy: For numerical operations and efficient array manipulations.
• SciPy: Specifically used for loading the scrambled image data from MATLAB files.
• Matplotlib: For visualizing the results of the algorithms through graphs.
• LaTeX: Employed for writing and formatting the final report.
• Git: Used for version control and maintaining a code repository.

REFERENCES

[1] Russell, S., & Norvig, P. (2010). Artificial Intelligence: A Modern Approach. Prentice Hall.
[2] Khemani, D. (2019). A First Course in Artificial Intelligence. McGraw Hill.
[3] Kirkpatrick, S., Gelatt, C. D., & Vecchi, M. P. (1983). Optimization by Simulated Annealing. Science, 220(4598), 671-680. https://doi.org/10.1126/science.220.4598.671
[4] NumPy Documentation. (n.d.). Retrieved from https://numpy.org/doc/stable/
[5] SciPy Documentation. (n.d.). Retrieved from https://docs.scipy.org/doc/scipy/reference/
[6] Matplotlib Documentation. (n.d.). Retrieved from https://matplotlib.org/stable/contents.html
[7] Project Jupyter. (n.d.). Retrieved from https://jupyter.org/
[8] Git Documentation. (n.d.). Retrieved from https://git-scm.com/doc
[9] GitHub Guides. (n.d.). Retrieved from https://guides.github.com/
[10] Multi-armed bandit.
[11] Sutton, R., & Barto, A. Reinforcement Learning: An Introduction, 2nd ed.
[12] https://people.csail.mit.edu/brooks/idocs/matchbox.pdf
