PROJECT
SUBMITTED BY:
ALI HUZAIFA (04072212012)
MUHAMMAD FAIZAN RABBANI (04072212012)
SUBMITTED TO:
DR. AYYAZ HUSSAIN
DATE: 10-06-2024
VOLCANO CROSSING PROBLEM
1. Introduction:
The Volcano Crossing problem is a classic example in reinforcement learning,
in which an agent navigates a grid-world environment containing hazardous cells
(volcanoes) and a rewarding cell (the exit). The objective is to find an optimal
policy that guides the agent safely to the exit while avoiding the volcanoes. In this report,
we explore the implementation and results of two solution methods: Value
Iteration and Policy Iteration.
2. Problem Description:
The Volcano Crossing environment consists of a grid-world with specified
dimensions and locations for the exit and volcanoes. The agent can move in four
directions: up, down, left, and right. Upon reaching the exit, the agent receives a
high reward, while entering a volcano incurs a penalty. The goal is to
maximize the cumulative reward collected while travelling from a starting position to the exit.
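The setup can be made concrete with a small sketch. The code below is illustrative only: the grid size is taken from the 5x5 output shown later, while the exit location, volcano locations, and reward values are assumptions rather than the exact settings used in the project.

# Illustrative grid-world setup (grid size matches the 5x5 output below;
# the exit cell, volcano cells, and reward values are assumptions).
GRID_SIZE = (5, 5)                              # rows x columns
EXIT = (4, 4)                                   # exit cell (assumed)
VOLCANOES = {(1, 2)}                            # volcano cells (assumed)
EXIT_REWARD = 100.0                             # reward for reaching the exit (assumed)
VOLCANO_PENALTY = -50.0                         # penalty for entering a volcano (assumed)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]    # up, down, left, right

def step(state, action):
    """Deterministic transition: return (next_state, reward) for one move."""
    r, c = state
    dr, dc = action
    nr = min(max(r + dr, 0), GRID_SIZE[0] - 1)  # clamp to stay inside the grid
    nc = min(max(c + dc, 0), GRID_SIZE[1] - 1)
    next_state = (nr, nc)
    if next_state == EXIT:
        return next_state, EXIT_REWARD
    if next_state in VOLCANOES:
        return next_state, VOLCANO_PENALTY
    return next_state, 0.0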
3. Value Iteration:
Value Iteration is an iterative algorithm used to compute the optimal value
function for each state in the grid-world. The process involves iteratively
updating the value function until convergence. Key components of Value
Iteration, illustrated in the sketch after this list, include:
One Step Lookahead: Evaluating the expected future rewards for each action
from a given state.
Value Function Update: Updating the value function based on the maximum
expected cumulative rewards.
Convergence: Terminating the algorithm when the value function stabilizes.
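A minimal Value Iteration sketch over the environment defined in Section 2 is shown below; the discount factor gamma and the convergence threshold theta are assumed hyperparameters, not necessarily the values used to produce the reported output.

import numpy as np

def one_step_lookahead(V, state, gamma):
    """Expected return of each action from `state` under the current value function."""
    q = np.zeros(len(ACTIONS))
    for i, action in enumerate(ACTIONS):
        next_state, reward = step(state, action)
        q[i] = reward + gamma * V[next_state]
    return q

def value_iteration(gamma=0.9, theta=1e-6):
    V = np.zeros(GRID_SIZE)
    while True:
        delta = 0.0
        for r in range(GRID_SIZE[0]):
            for c in range(GRID_SIZE[1]):
                if (r, c) == EXIT:                      # terminal state keeps value 0
                    continue
                best = one_step_lookahead(V, (r, c), gamma).max()
                delta = max(delta, abs(best - V[r, c]))
                V[r, c] = best                          # value function update
        if delta < theta:                               # convergence check
            break
    # Extract the greedy policy: index of the best action in each state
    policy = np.zeros(GRID_SIZE, dtype=int)
    for r in range(GRID_SIZE[0]):
        for c in range(GRID_SIZE[1]):
            policy[r, c] = int(np.argmax(one_step_lookahead(V, (r, c), gamma)))
    return V, policy

Each sweep applies the one-step lookahead to every state, keeps the best action value, and stops once no state value changes by more than theta.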
4. Policy Iteration:
Policy Iteration is an alternative approach that iteratively improves an initial
policy until convergence to the optimal policy. It consists of two main steps:
policy evaluation and policy improvement. The steps involved in Policy Iteration,
sketched after this list, are:
Policy Evaluation: Iteratively evaluating the value function for a given policy.
Policy Improvement: Updating the policy based on the current value function
to maximize rewards.
Convergence: Repeating policy evaluation and improvement until
convergence to an optimal policy.
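The sketch below illustrates these two steps; it reuses step, GRID_SIZE, ACTIONS, EXIT, and one_step_lookahead from the earlier sketches, and gamma and theta are again assumed hyperparameters.

import numpy as np

def policy_evaluation(policy, gamma=0.9, theta=1e-6):
    """Iteratively evaluate the value function of a fixed deterministic policy."""
    V = np.zeros(GRID_SIZE)
    while True:
        delta = 0.0
        for r in range(GRID_SIZE[0]):
            for c in range(GRID_SIZE[1]):
                if (r, c) == EXIT:                      # terminal state keeps value 0
                    continue
                next_state, reward = step((r, c), ACTIONS[policy[r, c]])
                v = reward + gamma * V[next_state]
                delta = max(delta, abs(v - V[r, c]))
                V[r, c] = v
        if delta < theta:
            return V

def policy_iteration(gamma=0.9):
    policy = np.zeros(GRID_SIZE, dtype=int)             # arbitrary initial policy
    while True:
        V = policy_evaluation(policy, gamma)            # policy evaluation
        stable = True
        for r in range(GRID_SIZE[0]):
            for c in range(GRID_SIZE[1]):
                best = int(np.argmax(one_step_lookahead(V, (r, c), gamma)))
                if best != policy[r, c]:                # policy improvement
                    policy[r, c] = best
                    stable = False
        if stable:                                      # policy no longer changes
            return V, policy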
5. Implementation:
The provided Python code implements both Value Iteration and Policy Iteration
algorithms to solve the Volcano Crossing problem. It includes functions for one-
step lookahead, value iteration, policy iteration, plotting the value function, and
the main function for execution.
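As a rough outline of that structure, the driver below ties the earlier sketches together; the function names and plotting details are illustrative and may differ from the actual code.

import matplotlib.pyplot as plt

def plot_value_function(V, title):
    """Display the value function as a heat map over the grid."""
    plt.imshow(V, cmap="viridis")
    plt.colorbar(label="state value")
    plt.title(title)
    plt.show()

def main():
    V_vi, policy_vi = value_iteration()
    print("Final Value Function after Value Iteration:")
    print(V_vi)
    print("Optimal Policy:")
    print(policy_vi)
    plot_value_function(V_vi, "Value Iteration")

    V_pi, policy_pi = policy_iteration()
    plot_value_function(V_pi, "Policy Iteration")

if __name__ == "__main__":
    main()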
6. Output:
Final Value Function after Value Iteration:
[[ 0.54557601 1.14858108 2.41806543 5.09066406 10.7171875 ]
[ 1.14858108 2.41806543 5.09066406 10.7171875 22.5625 ]
[ 2.41806543 5.09066406 10.7171875 22.5625 47.5 ]
[ 5.09066406 10.7171875 22.5625 47.5 100. ]
[ 10.7171875 22.5625 47.5 100. 0. ]]
Optimal Policy:
[[1 1 1 1 1]
[1 1 3 1 1]
[1 1 1 1 1]
[1 1 1 1 1]
[3 3 3 3 0]]
7. Results:
After running both algorithms on the Volcano Crossing problem, we obtained
the following results:
Final Value Function: The optimal value function obtained after convergence
of Value Iteration.
Optimal Policy: The policy obtained after convergence of Policy Iteration,
indicating the best action to take in each state.
8. Conclusion:
In conclusion, both Value Iteration and Policy Iteration offer effective solutions to
the Volcano Crossing problem, enabling the agent to navigate the environment
safely while maximizing cumulative rewards. These algorithms serve as
foundational techniques in reinforcement learning and decision-making
problems, providing insights into optimal policy selection and value function
estimation. Further experimentation and analysis can enhance understanding and
application in various domains.