PROJECT
SUBMITTED BY:
ALI HUZAIFA (04072212012)
MUHAMMAD FAIZAN RABBANI (04072212012)
SUBMITTED TO:
DR. AYYAZ HUSSAIN
DATE: 10-06-2024
VOLCANO CROSSING PROBLEM
1. Introduction:
The Volcano Crossing problem is a classic example in reinforcement learning,
in which an agent navigates a grid-world environment containing hazardous cells
(volcanoes) and a rewarding cell (the exit). The objective is to find an optimal
policy that guides the agent safely to the exit while avoiding the volcanoes. In this report,
we explore the implementation and results of two solution methods: Value
Iteration and Policy Iteration.
2. Problem Description:
The Volcano Crossing environment consists of a grid-world with specified
dimensions and locations for the exit and volcanoes. The agent can move in four
directions: up, down, left, and right. Upon reaching the exit, the agent receives a
high reward, while entering a volcano incurs a penalty. The goal is to
maximize the cumulative reward collected while travelling from a starting position to the exit.
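The setup can be made concrete with a small sketch. The code below is illustrative only: the grid size is taken from the 5x5 output shown later, while the exit location, volcano locations, and reward values are assumptions rather than the exact settings used in the project.

# Illustrative grid-world setup (grid size matches the 5x5 output below;
# the exit cell, volcano cells, and reward values are assumptions).
GRID_SIZE = (5, 5)                              # rows x columns
EXIT = (4, 4)                                   # exit cell (assumed)
VOLCANOES = {(1, 2)}                            # volcano cells (assumed)
EXIT_REWARD = 100.0                             # reward for reaching the exit (assumed)
VOLCANO_PENALTY = -50.0                         # penalty for entering a volcano (assumed)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]    # up, down, left, right

def step(state, action):
    """Deterministic transition: return (next_state, reward) for one move."""
    r, c = state
    dr, dc = action
    nr = min(max(r + dr, 0), GRID_SIZE[0] - 1)  # clamp to stay inside the grid
    nc = min(max(c + dc, 0), GRID_SIZE[1] - 1)
    next_state = (nr, nc)
    if next_state == EXIT:
        return next_state, EXIT_REWARD
    if next_state in VOLCANOES:
        return next_state, VOLCANO_PENALTY
    return next_state, 0.0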
3. Value Iteration:
Value Iteration is an iterative algorithm used to compute the optimal value
function for each state in the grid-world. The process involves iteratively
updating the value function until convergence. Key components of Value
Iteration, illustrated in the sketch after this list, include:
One Step Lookahead: Evaluating the expected future rewards for each action
from a given state.
Value Function Update: Updating the value function based on the maximum
expected cumulative rewards.
Convergence: Terminating the algorithm when the value function stabilizes.
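A minimal Value Iteration sketch over the environment defined in Section 2 is shown below; the discount factor gamma and the convergence threshold theta are assumed hyperparameters, not necessarily the values used to produce the reported output.

import numpy as np

def one_step_lookahead(V, state, gamma):
    """Expected return of each action from `state` under the current value function."""
    q = np.zeros(len(ACTIONS))
    for i, action in enumerate(ACTIONS):
        next_state, reward = step(state, action)
        q[i] = reward + gamma * V[next_state]
    return q

def value_iteration(gamma=0.9, theta=1e-6):
    V = np.zeros(GRID_SIZE)
    while True:
        delta = 0.0
        for r in range(GRID_SIZE[0]):
            for c in range(GRID_SIZE[1]):
                if (r, c) == EXIT:                      # terminal state keeps value 0
                    continue
                best = one_step_lookahead(V, (r, c), gamma).max()
                delta = max(delta, abs(best - V[r, c]))
                V[r, c] = best                          # value function update
        if delta < theta:                               # convergence check
            break
    # Extract the greedy policy: index of the best action in each state
    policy = np.zeros(GRID_SIZE, dtype=int)
    for r in range(GRID_SIZE[0]):
        for c in range(GRID_SIZE[1]):
            policy[r, c] = int(np.argmax(one_step_lookahead(V, (r, c), gamma)))
    return V, policy

Each sweep applies the one-step lookahead to every state, keeps the best action value, and stops once no state value changes by more than theta.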
4. Policy Iteration:
Policy Iteration is an alternative approach that iteratively improves an initial
policy until convergence to the optimal policy. It consists of two main steps:
policy evaluation and policy improvement. The steps involved in Policy Iteration,
sketched after this list, are:
Policy Evaluation: Iteratively evaluating the value function for a given policy.
Policy Improvement: Updating the policy based on the current value function
to maximize rewards.
Convergence: Repeating policy evaluation and improvement until
convergence to an optimal policy.
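The sketch below illustrates these two steps; it reuses step, GRID_SIZE, ACTIONS, EXIT, and one_step_lookahead from the earlier sketches, and gamma and theta are again assumed hyperparameters.

import numpy as np

def policy_evaluation(policy, gamma=0.9, theta=1e-6):
    """Iteratively evaluate the value function of a fixed deterministic policy."""
    V = np.zeros(GRID_SIZE)
    while True:
        delta = 0.0
        for r in range(GRID_SIZE[0]):
            for c in range(GRID_SIZE[1]):
                if (r, c) == EXIT:                      # terminal state keeps value 0
                    continue
                next_state, reward = step((r, c), ACTIONS[policy[r, c]])
                v = reward + gamma * V[next_state]
                delta = max(delta, abs(v - V[r, c]))
                V[r, c] = v
        if delta < theta:
            return V

def policy_iteration(gamma=0.9):
    policy = np.zeros(GRID_SIZE, dtype=int)             # arbitrary initial policy
    while True:
        V = policy_evaluation(policy, gamma)            # policy evaluation
        stable = True
        for r in range(GRID_SIZE[0]):
            for c in range(GRID_SIZE[1]):
                best = int(np.argmax(one_step_lookahead(V, (r, c), gamma)))
                if best != policy[r, c]:                # policy improvement
                    policy[r, c] = best
                    stable = False
        if stable:                                      # policy no longer changes
            return V, policy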
5. Implementation:
The provided Python code implements both Value Iteration and Policy Iteration
algorithms to solve the Volcano Crossing problem. It includes functions for one-
step lookahead, value iteration, policy iteration, plotting the value function, and
the main function for execution.
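As a rough outline of that structure, the driver below ties the earlier sketches together; the function names and plotting details are illustrative and may differ from the actual code.

import matplotlib.pyplot as plt

def plot_value_function(V, title):
    """Display the value function as a heat map over the grid."""
    plt.imshow(V, cmap="viridis")
    plt.colorbar(label="state value")
    plt.title(title)
    plt.show()

def main():
    V_vi, policy_vi = value_iteration()
    print("Final Value Function after Value Iteration:")
    print(V_vi)
    print("Optimal Policy:")
    print(policy_vi)
    plot_value_function(V_vi, "Value Iteration")

    V_pi, policy_pi = policy_iteration()
    plot_value_function(V_pi, "Policy Iteration")

if __name__ == "__main__":
    main()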
6. Output:
Final Value Function after Value Iteration:
[[ 0.54557601 1.14858108 2.41806543 5.09066406 10.7171875 ]
[ 1.14858108 2.41806543 5.09066406 10.7171875 22.5625 ]
[ 2.41806543 5.09066406 10.7171875 22.5625 47.5 ]
[ 5.09066406 10.7171875 22.5625 47.5 100. ]
[ 10.7171875 22.5625 47.5 100. 0. ]]
Optimal Policy:
[[1 1 1 1 1]
[1 1 3 1 1]
[1 1 1 1 1]
[1 1 1 1 1]
[3 3 3 3 0]]
7. Results:
After running both algorithms on the Volcano Crossing problem, we obtained
the following results:
Final Value Function: The optimal value function obtained after convergence
of Value Iteration.
Optimal Policy: The policy obtained after convergence of Policy Iteration,
indicating the best action to take in each state.
8. Conclusion:
In conclusion, both Value Iteration and Policy Iteration offer effective solutions to
the Volcano Crossing problem, enabling the agent to navigate the environment
safely while maximizing cumulative rewards. These algorithms serve as
foundational techniques in reinforcement learning and decision-making
problems, providing insights into optimal policy selection and value function
estimation. Further experimentation and analysis can enhance understanding and
application in various domains.