Q-Learning — a simplistic overview
Let’s say that a robot has to cross a maze and reach the end
point. There are mines, and the robot can only move one tile
at a time. If the robot steps onto a mine, the robot is dead.
The robot has to reach the end point in the shortest time
possible.
The scoring/reward system is as below:
1. The robot loses 1 point at each step. This is done so that the
robot takes the shortest path and reaches the goal as fast as
possible.
2. If the robot steps on a mine, the point loss is 100 and the game
ends.
3. If the robot gets power ⚡️, it gains 1 point.
4. If the robot reaches the end goal, the robot gets 100 points.
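These rules are straightforward to write down as a lookup. For example (a sketch; the key names are illustrative, not something the rest of the article uses):

```python
# Reward scheme from the rules above (key names are illustrative)
REWARDS = {
    "step": -1,    # every move costs 1 point, so shorter paths score better
    "power": +1,   # picking up power gains 1 point
    "mine": -100,  # stepping on a mine ends the game
    "goal": +100,  # reaching the end point
}
```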
Now, the obvious question is: how do we train the robot to reach the end goal via the shortest path without stepping on a mine?
So, how do we solve this?
Introducing the Q-Table
Q-Table is just a fancy name for a simple lookup table where we calculate the maximum expected future reward for each action at each state. Basically, this table will guide us to the best action at each state.
There are four possible actions at each non-edge tile. When the robot is in a state, it can move up, down, left, or right.
So, let’s model this environment in our Q-Table.
In the Q-Table, the columns are the actions and the rows are
the states.
Each Q-table score will be the maximum expected future
reward that the robot will get if it takes that action at that
state. This is an iterative process, as we need to improve the
Q-Table at each iteration.
But the questions are:
How do we calculate the values of the Q-table?
Are the values available or predefined?
To learn each value of the Q-table, we use the Q-Learning
algorithm.
Mathematics: the Q-Learning algorithm
Q-function
The Q-function uses the Bellman equation and takes two
inputs: state (s) and action (a).
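Written out in the standard notation (the discount factor γ and the expectation are part of the usual formulation; the article does not define them elsewhere):

$$Q(s_t, a_t) = \mathbb{E}\left[R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \dots \mid s_t, a_t\right]$$

In words: the Q-value of taking action a in state s is the expected total discounted reward the robot collects from that point onwards.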
Using the above function, we get the values of Q for the cells
in the table.
When we start, all the values in the Q-table are zeros.
There is an iterative process of updating the values. As we
start to explore the environment, the Q-function gives us
better and better approximations by continuously updating the
Q-values in the table.
Now, let’s understand how the updating takes place.
Introducing the Q-learning algorithm process
Each of the colored boxes is one step. Let’s understand each
of these steps in detail.
Step 1: initialize the Q-Table
We will first build a Q-table. There are n columns, where n = number of actions, and m rows, where m = number of states. We will initialise the values at 0.
In our robot example, we have four actions (a=4) and five
states (s=5). So we will build a table with four columns and
five rows.
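As a minimal sketch, assuming NumPy (the article does not name a library), the initialisation could look like this:

```python
import numpy as np

n_actions = 4   # up, down, left, right
n_states = 5    # the five states in the robot example

# One row per state, one column per action, every entry starts at 0
q_table = np.zeros((n_states, n_actions))
print(q_table)
```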
Steps 2 and 3: choose and perform an action
This combination of steps runs for an undefined amount of time: it repeats until we stop the training, or until the training loop terminates as defined in the code.
We will choose an action (a) in the state (s) based on the Q-
Table. But, as mentioned earlier, when the episode initially
starts, every Q-value is 0.
So now the concept of exploration and exploitation trade-off
comes into play. This article has more details.
We’ll use something called the epsilon greedy strategy.
In the beginning, the epsilon rate will be higher. The robot will explore the environment and randomly choose actions.
The logic behind this is that the robot does not know anything
about the environment.
As the robot explores the environment, the epsilon rate
decreases and the robot starts to exploit the environment.
During the process of exploration, the robot progressively
becomes more confident in estimating the Q-values.
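A sketch of the epsilon greedy choice, assuming NumPy and an illustrative decay schedule (neither is fixed by the article):

```python
import numpy as np

def choose_action(q_table, state, epsilon):
    """Epsilon-greedy: explore with probability epsilon, otherwise exploit."""
    if np.random.rand() < epsilon:
        # Explore: pick one of the actions at random
        return np.random.randint(q_table.shape[1])
    # Exploit: pick the action with the highest Q-value for this state
    return int(np.argmax(q_table[state]))

# Toy usage: 5 states x 4 actions; epsilon starts high and decays each episode
q_table = np.zeros((5, 4))
epsilon = 1.0
for episode in range(10):
    action = choose_action(q_table, state=0, epsilon=epsilon)
    epsilon = max(0.01, epsilon * 0.99)  # explore less as the robot learns
```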
For the robot example, there are four actions to choose from:
up, down, left, and right. We are starting the training now —
our robot knows nothing about the environment. So the robot
chooses a random action, say right.
We can now update the Q-values for being at the start and
moving right using the Bellman equation.
Steps 4 and 5: evaluate
Now we have taken an action and observed an outcome and a reward. We need to update the function Q(s,a).
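The update itself is the standard Q-learning form of the Bellman equation (α is the learning rate, γ the discount factor; the article leaves their exact values to the code):

$$Q(s, a) \leftarrow Q(s, a) + \alpha\left[R(s, a) + \gamma \max_{a'} Q(s', a') - Q(s, a)\right]$$

Here s' is the state the robot lands in after taking action a, and R(s, a) is the reward it observes there.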
In the case of the robot game, to reiterate, the scoring/reward structure is:
step = -1
power = +1
mine = -100
end = +100
We will repeat this again and again until the learning is
stopped. In this way the Q-Table will be updated.
Python implementation of Q-Learning
The concept and code implementation are explained in my
video.
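As a rough, self-contained sketch of how the pieces above fit together (the 3x3 grid layout, the hyperparameter values, and the helper names are illustrative assumptions, not the article's or the video's code):

```python
import numpy as np

# Toy 3x3 grid maze, flattened to states 0..8. The layout is an assumption
# for illustration: start = 0 (top left), goal = 8 (bottom right),
# mine = 4 (centre), power = 5.
ROWS, COLS = 3, 3
N_STATES, N_ACTIONS = ROWS * COLS, 4    # actions: 0=up, 1=down, 2=left, 3=right
MINE, POWER, GOAL = 4, 5, 8

def step(state, action):
    """Apply one move and return (next_state, reward, done) per the scoring rules."""
    row, col = divmod(state, COLS)
    if action == 0:
        row = max(row - 1, 0)
    elif action == 1:
        row = min(row + 1, ROWS - 1)
    elif action == 2:
        col = max(col - 1, 0)
    else:
        col = min(col + 1, COLS - 1)
    next_state = row * COLS + col
    reward, done = -1, False             # each step costs 1 point
    if next_state == MINE:
        reward, done = -100, True        # stepping on a mine ends the game
    elif next_state == POWER:
        reward += 1                      # power tile gives +1
    elif next_state == GOAL:
        reward, done = 100, True         # reaching the end gives +100
    return next_state, reward, done

alpha, gamma, epsilon = 0.1, 0.9, 1.0    # learning rate, discount, exploration

# Step 1: initialise the Q-table to zeros
q_table = np.zeros((N_STATES, N_ACTIONS))

for episode in range(500):
    state, done = 0, False
    while not done:
        # Steps 2 and 3: epsilon-greedy choice, then perform the action
        if np.random.rand() < epsilon:
            action = np.random.randint(N_ACTIONS)
        else:
            action = int(np.argmax(q_table[state]))
        next_state, reward, done = step(state, action)
        # Steps 4 and 5: Bellman update of Q(s, a)
        q_table[state, action] += alpha * (
            reward + gamma * np.max(q_table[next_state]) - q_table[state, action]
        )
        state = next_state
    epsilon = max(0.01, epsilon * 0.99)  # explore less as training progresses

print(np.round(q_table, 1))
```

After enough episodes, reading off the argmax along each row of the table gives the action the robot should take in each state.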
Subscribe to my YouTube channel ADL for more AI videos.
At last…let us recap
Q-Learning is a value-based reinforcement learning algorithm
which is used to find the optimal action-selection policy using
a Q function.
Our goal is to maximize the value function Q.
The Q table helps us to find the best action for each state.
It helps to maximize the expected reward by selecting the best
of all possible actions.
Q(state, action) returns the expected future reward of that
action at that state.
This function can be estimated using Q-Learning, which
iteratively updates Q(s,a) using the Bellman equation.
Initially we explore the environment and update the Q-Table.
When the Q-Table is ready, the agent will start to exploit the
environment and start taking better actions.
SARSA is an on-policy algorithm: in the current state S, an action A is taken, the agent gets a reward R, ends up in the next state S1, and takes action A1 in S1. The tuple (S, A, R, S1, A1) gives the acronym SARSA.
It is called an on-policy algorithm because it updates its Q-values using the action actually taken by the current policy.
SARSA vs Q-learning
The difference between these two algorithms is that SARSA chooses its next action following the same current policy and updates its Q-values using that action, whereas Q-learning uses the greedy action, that is, the action that gives the maximum Q-value in the next state, so its update follows the greedy (optimal) policy regardless of what the agent actually does next.
Basically, in SARSA the Q-value is updated taking into account the action A1 actually performed in the state S1, as opposed to Q-learning, where the action with the highest Q-value in the next state S1 is used to update the Q-table.
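A minimal side-by-side sketch of the two update rules (α and γ are the usual learning rate and discount factor; the variable names are illustrative):

```python
import numpy as np

alpha, gamma = 0.1, 0.9
q_learning = np.zeros((5, 4))   # one table per algorithm: 5 states x 4 actions
q_sarsa = np.zeros((5, 4))

# Suppose the agent is in state s, takes action a, gets reward r, lands in s1,
# and (under its current epsilon-greedy policy) will take action a1 in s1.
s, a, r, s1, a1 = 0, 3, -1, 1, 2

# Q-learning (off-policy): bootstrap from the greedy action in s1,
# whatever the agent actually does next.
q_learning[s, a] += alpha * (r + gamma * np.max(q_learning[s1]) - q_learning[s, a])

# SARSA (on-policy): bootstrap from a1, the action the current policy
# actually takes in s1. This is the (S, A, R, S1, A1) tuple in action.
q_sarsa[s, a] += alpha * (r + gamma * q_sarsa[s1, a1] - q_sarsa[s, a])
```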