
Ex. No: 4
Date:

Q-LEARNING

Aim:

To find the optimal path in an environment using the Q-learning algorithm.

Procedure:
1. Initialize the Q-learning parameters.
2. Create and initialize the Q-table.
3. Define the state transition matrix and rewards.
4. Implement Q-learning with an epsilon-greedy policy (the update rule is written out below).
5. Test the learned policy by printing the Q-table and finding the optimal path.
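
Step 4 applies the standard one-step Q-learning update after every transition. With learning rate α, discount factor γ, reward r, and next state s', the rule used in the program below is:

Q(s, a) ← Q(s, a) + α · [ r + γ · max_a' Q(s', a') − Q(s, a) ]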

Program:
import numpy as np

# Define the Q-learning parameters
num_states = 6
num_actions = 2
learning_rate = 0.1
discount_factor = 0.9
num_episodes = 1000

# Initialize the Q-table with zeros
Q = np.zeros((num_states, num_actions))

# Define the state transition matrix and rewards
T = np.array([[-1, 2], [0, 3], [1, 4], [2, 5], [3, -1], [4, 5]])
rewards = np.array([[-1, -1], [-1, -1], [-1, -1], [-1, -1], [-1, 100], [10, 100]])

# Q-learning algorithm
for episode in range(num_episodes):
    state = 0  # Initial state is 0
    while state != 5:  # Continue until reaching the goal (state 5)
        # Choose an action epsilon-greedily (exploration vs. exploitation)
        epsilon = 0.1
        if np.random.rand() < epsilon:
            action = np.random.choice(num_actions)
        else:
            action = np.argmax(Q[state, :])

        # Take the chosen action and observe the next state and reward
        new_state = T[state, action]
        reward = rewards[state, action]

        # Update the Q-value using the Q-learning update rule
        Q[state, action] += learning_rate * (reward + discount_factor * np.max(Q[new_state, :]) - Q[state, action])

        # Move to the next state
        state = new_state

# Print the learned Q-table
print("Learned Q-table:")
print(Q)

# Test the learned policy
state = 0
path = [state]
while state != 5:
    action = np.argmax(Q[state, :])
    new_state = T[state, action]
    path.append(new_state)
    state = new_state

print("Optimal path:", path)


Output:
Learned Q-table:
[[8.98952915e+02 6.65250406e+02]
 [3.94719551e+02 9.61253887e+00]
 [1.07808528e+02 8.48844034e+02]
 [1.99000000e-01 3.95632623e+01]
 [3.22192736e+02 9.99481917e+02]
 [8.96535389e+02 9.99953717e+02]]
Optimal path: [0, -1, 5]
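
Note: because the epsilon-greedy policy explores at random, the exact Q-values printed above vary from run to run; the test loop, however, always terminates once the goal state 5 is reached.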

Result:

Thus, the optimal path in the given environment was found using the Q-learning algorithm.
