Q-LEARNING
CO-3
AIM
To familiarize students with the concepts of reinforcement learning and Q-learning, including
the Q-function, temporal difference learning, and the exploration/exploitation trade-off
INSTRUCTIONAL OBJECTIVES
This session is designed to:
1. Introduce the Q-function and its relation to the optimal policy and value function
2. Explain the Q-learning update rule and temporal difference (TD) learning
LEARNING OUTCOMES
At the end of this session, you should be able to:
1. Define the Q-function and derive the optimal policy from it
2. Apply the Q-learning update rule to observed state-action transitions
3. Explain the exploration/exploitation trade-off
4. Describe extensions such as experience replay, prioritized sweeping, and function approximation
Q-Function
• One approach to RL is then to try to estimate V*(s).
• However, this approach requires you to know r(s,a) and δ(s,a).
• This is unrealistic in many real problems. What is the reward if a robot is
exploring Mars and decides to take a right turn?
• Fortunately, we can circumvent this problem by exploring and
experiencing how the world reacts to our actions; we need to learn r and δ.
• We want a function that directly scores state-action pairs, i.e., tells us which
action to take in a given state. We call this Q(s,a).
• Given Q(s,a) it is now trivial to execute the optimal policy, without
knowing r(s,a) and δ(s,a). We have:
π*(s) = argmax_a Q(s,a)
V*(s) = max_a Q(s,a)
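A minimal sketch of how the optimal policy and value function fall out of a Q-table, assuming a small tabular problem with Q stored as a NumPy array indexed by state and action (the numbers are illustrative):

import numpy as np

# Hypothetical Q-table: rows are states, columns are actions.
Q = np.array([[ 0.0, 72.0,  81.0],
              [66.0, 81.0, 100.0]])

def greedy_policy(Q, s):
    # pi*(s) = argmax_a Q(s, a): the action with the highest Q-value.
    return int(np.argmax(Q[s]))

def value(Q, s):
    # V*(s) = max_a Q(s, a): the value of the best action in state s.
    return float(np.max(Q[s]))

print(greedy_policy(Q, 1), value(Q, 1))   # -> 2 100.0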
Example II
Check that
π*(s) = argmax_a Q(s,a)
V*(s) = max_a Q(s,a)
Q-Learning
• The definition Q(s,a) = r(s,a) + γ V*(δ(s,a)) still depends on r(s,a) and δ(s,a).
• However, imagine the robot is exploring its environment, trying new actions as it
goes.
• At every step it receives some reward r, and it observes the environment change
into a new state s' for action a.
• How can we use these observations (s, a, s', r) to learn a model?
• Q-learning uses the update Q̂(s,a) ← r + γ max_{a'} Q̂(s',a'), where s' = s_{t+1} is
the state reached after taking action a in state s (see the sketch below).
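A minimal sketch of the tabular Q-learning update for a deterministic environment; the table sizes and the example transition are illustrative assumptions, not part of the original slides:

import numpy as np

n_states, n_actions = 6, 4
gamma = 0.9                              # discount factor, as in the worked example
Q = np.zeros((n_states, n_actions))      # Q-hat table, initialised to zero

def q_update(s, a, r, s_next):
    # Deterministic Q-learning update: Q(s,a) <- r + gamma * max_a' Q(s', a')
    Q[s, a] = r + gamma * np.max(Q[s_next])

# Apply the update to one observed transition (s, a, s', r):
q_update(s=0, a=2, r=0.0, s_next=1)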
Q-Learning
• This equation continually updates the estimate of Q at state s so that it is consistent with the estimate
of Q at state s', one step in the future: temporal difference (TD) learning.
• Note that s' is closer to the goal, and hence more “reliable”, but it is still an estimate itself.
• Updating estimates based on other estimates is called bootstrapping.
• We do an update after each state-action pair. I.e., we are learning online!
• We are learning useful things about explored state-action pairs. These are typically most
useful because they are likely to be encountered again.
• Under suitable conditions, these updates can actually be proved to converge to the real
answer.
Example Q-Learning
Q̂(s1, a_right) ← r + γ max_{a'} Q̂(s2, a')
              = 0 + 0.9 × max{66, 81, 100}
              = 90
Q-learning propagates Q-estimates 1-step backwards
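A one-line check of the arithmetic in this example (the successor Q-values 66, 81, and 100 are the ones shown on the slide):

r, gamma = 0.0, 0.9
successor_q = [66.0, 81.0, 100.0]        # Q-hat(s2, a') for the available actions a'
print(r + gamma * max(successor_q))      # -> 90.0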
Exploration / Exploitation
• It is very important that the agent does not simply follow the current
policy when learning Q (off-policy learning). The reason is that you may
get stuck in a suboptimal solution, i.e., there may be better solutions
out there that you have never seen.
• Hence it is good to try new things now and then, e.g. by choosing actions with
Boltzmann (softmax) probabilities
P(a|s) ∝ e^{Q̂(s,a)/T}
If T is large there is lots of exploring; if T is small the agent follows the current
policy. One can decrease T over time.
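A minimal sketch of Boltzmann (softmax) action selection over a tabular Q; the temperature schedule and table sizes are illustrative assumptions:

import numpy as np

def softmax_action(Q, s, T):
    # P(a|s) proportional to exp(Q(s,a)/T); larger T means more exploration.
    prefs = Q[s] / T
    prefs -= prefs.max()                 # subtract max for numerical stability
    probs = np.exp(prefs)
    probs /= probs.sum()
    return np.random.choice(len(probs), p=probs)

Q = np.zeros((6, 4))
T = 1.0
for step in range(1000):
    a = softmax_action(Q, s=0, T=T)
    T = max(0.05, T * 0.995)             # illustrative decay: explore less over time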
Improvements
• One can trade off memory and computation by caching the observed
transitions (s, a, s', r). After a while, as Q(s', a') has changed,
you can “replay” the update (see the sketch below).
• One can actively search for state-action pairs for which Q(s,a)
is expected to change a lot (prioritized sweeping).
• One can do updates along the sampled path much further back
than just one step (TD(λ) learning).
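A minimal sketch of replaying cached transitions, reusing the tabular q_update from the earlier sketch; the buffer size and batch size are illustrative assumptions:

import random

buffer = []                              # cached transitions (s, a, r, s')

def remember(s, a, r, s_next, max_size=10000):
    buffer.append((s, a, r, s_next))
    if len(buffer) > max_size:
        buffer.pop(0)

def replay(q_update, batch_size=32):
    # Re-apply the Q-learning update to stored transitions; because Q(s', a')
    # has changed since they were recorded, replaying propagates the new values.
    for s, a, r, s_next in random.sample(buffer, min(batch_size, len(buffer))):
        q_update(s, a, r, s_next)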
Extensions
• To deal with stochastic environments, we need to maximize the
expected future discounted reward: E[ r_t + γ r_{t+1} + γ² r_{t+2} + ... ].
• Often the state space is too large to deal with all states. In this case we
need to learn a function approximation: Q(s,a) ≈ f_θ(s,a).
• Neural networks with back-propagation have been quite successful here.
• For instance, TD-Gammon is a backgammon program that plays at expert level: its state space is very
large, it was trained by playing against itself, it uses a neural network to approximate the value function,
and it uses TD(λ) for learning.
More on Function Approximation
• For instance, a linear function: Q_θ(s,a) = Σ_k θ_k φ_k(s,a).
• The features φ_k are fixed measurements of the state (e.g., the number of stones
on the board).
• We only learn the parameters θ.
• Update rule (start in state s, take action a, observe reward r, and end
up in state s'):
θ_k ← θ_k + α [ r + γ max_{a'} Q_θ(s',a') − Q_θ(s,a) ] φ_k(s,a)
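A minimal sketch of this linear-approximation update, assuming a hypothetical feature function phi(s, a) returning a fixed-length NumPy vector and an illustrative learning rate:

import numpy as np

n_features = 8
theta = np.zeros(n_features)             # learned parameters
alpha, gamma = 0.1, 0.9                  # illustrative learning rate and discount

def phi(s, a):
    # Hypothetical fixed feature map; replace with real state-action features.
    rng = np.random.default_rng(abs(hash((s, a))) % (2**32))
    return rng.standard_normal(n_features)

def q_value(s, a):
    return theta @ phi(s, a)

def linear_q_update(s, a, r, s_next, actions):
    global theta
    # theta_k += alpha * [r + gamma * max_a' Q(s',a') - Q(s,a)] * phi_k(s,a)
    td_error = r + gamma * max(q_value(s_next, a2) for a2 in actions) - q_value(s, a)
    theta = theta + alpha * td_error * phi(s, a)

linear_q_update(s=0, a=1, r=0.0, s_next=1, actions=range(4))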
Conclusion
• Reinforcement learning addresses a very broad and relevant question:
• How can we learn to survive in our environment?
• We have looked at Q-learning, which simply learns from experience.
• No model of the world is needed.
• We made simplifying assumptions: e.g., state of the world only
depends on last state and action. This is the Markov assumption. The
model is called a Markov Decision Process (MDP).
• We assumed deterministic dynamics and a deterministic reward function, but the real world
is stochastic.
• There are many extensions to speed up learning.
• There have been many successful real-world applications.
Applications of Reinforcement Learning
• Robotics for industrial automation.
• Business strategy planning
• Machine learning and data processing
• Personalized training systems that adapt instruction and materials
to the needs of individual students
• Aircraft control and robot motion control
• Traffic Light Control
• A robot cleaning a room and recharging its battery
• Robot-soccer
• How to invest in shares
• Modeling the economy through rational agents
• Learning how to fly a helicopter
• Scheduling planes to their destinations
THANK YOU
TEAM ML