Ex. No: 4
Date:

Q-LEARNING
Aim:
To find the optimal path in an environment using the Q-learning algorithm.
Procedure:
1. Initialize Q-learning parameters.
2. Create and initialize the Q-table.
3. Define state transition matrix and rewards.
4. Implement Q-learning with an epsilon-greedy policy (the update rule used here is shown after this list).
5. Test the learned policy by printing the Q-table and finding the optimal path.
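The Q-value update applied in step 4 is the standard Q-learning rule, where alpha is the learning rate, gamma is the discount factor, r is the observed reward, and s' is the next state:

    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))

With probability epsilon the epsilon-greedy policy picks a random action (exploration); otherwise it picks the action with the highest Q-value for the current state (exploitation).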
Program:
import numpy as np
# Define the Q-learning parameters
num_states = 6
num_actions = 2
learning_rate = 0.1
discount_factor = 0.9
num_episodes = 1000
epsilon = 0.1  # Exploration rate for the epsilon-greedy policy
# Initialize the Q-table with zeros
Q = np.zeros((num_states, num_actions))
# Define the state transition matrix and rewards
T = np.array([[-1, 2], [0, 3], [1, 4], [2, 5], [3, -1], [4, 5]])
rewards = np.array([[-1, -1], [-1, -1], [-1, -1], [-1, -1], [-1, 100], [10, 100]])
#Q-learning algorithm
for episode in range(num_episodes):
    state = 0  # Initial state is 0
    while state != 5:  # Continue until reaching the goal (state 5)
        # Choose an action epsilon-greedily (exploration vs. exploitation)
        if np.random.rand() < epsilon:
            action = np.random.choice(num_actions)
        else:
            action = np.argmax(Q[state, :])
        # Take the chosen action and observe the next state and reward
        new_state = T[state, action]
        reward = rewards[state, action]
        # Update the Q-value using the Q-learning update rule
        Q[state, action] += learning_rate * (reward + discount_factor * np.max(Q[new_state, :]) - Q[state, action])
        # Move to the next state
        state = new_state
# Print the learned Q-table
print("Learned Q-table:")
print(Q)
# Test the learned policy
state = 0
path = [state]
while state != 5:
    action = np.argmax(Q[state, :])
    new_state = T[state, action]
    path.append(new_state)
    state = new_state
print("Optimal path:", path)
Output:
Learned Q-table:
[[8.98952915e+02 6.65250406e+02]
 [3.94719551e+02 9.61253887e+00]
 [1.07808528e+02 8.48844034e+02]
 [1.99000000e-01 3.95632623e+01]
 [3.22192736e+02 9.99481917e+02]
 [8.96535389e+02 9.99953717e+02]]
Optimal path: [0, -1, 5]
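Note: the printed path contains -1 because T uses -1 to mark moves that leave the state chain, and NumPy's negative indexing silently wraps T[-1] and Q[-1] around to the last row, so the greedy rollout can step through the "state" -1. A minimal sketch of one way to keep the test rollout on valid states only is given below; this greedy_path helper and its masking logic are an illustrative assumption, not part of the original program.

import numpy as np

def greedy_path(Q, T, start=0, goal=5, max_steps=20):
    """Roll out the greedy policy, skipping actions whose transition
    is marked invalid (-1) in T. Illustrative helper only."""
    state = start
    path = [state]
    for _ in range(max_steps):  # Cap steps so a bad policy cannot loop forever
        if state == goal:
            break
        q = Q[state].astype(float)  # Copy of this state's Q-values
        q[T[state] == -1] = -np.inf  # Mask actions that exit the chain
        action = int(np.argmax(q))
        state = int(T[state, action])
        path.append(state)
    return path

With the Q-table learned above, such a masked rollout would reach the goal state 5 through valid states only.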
Result:
Thus, the optimal path in the environment was found using the Q-learning algorithm, and the program was executed successfully.