
AI-832 Reinforcement Learning

Instructor: Dr. Zuhair Zafar

Lecture # 19: Off-Policy Learning, Importance Sampling


Recap

• What is GLIE?

• What is the SARSA algorithm for on-policy control? (Its one-step update is restated after this list.)

• What is the difference between the SARSA and TD learning algorithms?

• What is n-step SARSA?

• What is Forward view of SARSA(𝜆)?

• What is Backward view of SARSA(𝜆)?
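
As a quick aid for the SARSA questions above, here is the standard one-step SARSA update (α is the step size and γ the discount factor; this is the usual notation, not one fixed by these slides):

```latex
% One-step SARSA update (on-policy TD control).
% The name comes from the quintuple it uses: (S_t, A_t, R_{t+1}, S_{t+1}, A_{t+1}).
Q(S_t, A_t) \leftarrow Q(S_t, A_t)
  + \alpha \left[ R_{t+1} + \gamma\, Q(S_{t+1}, A_{t+1}) - Q(S_t, A_t) \right]
```

Unlike TD(0) prediction, which updates state values V(S_t), SARSA updates action values using the action actually taken next, which is what makes it on-policy.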


SARSA(𝜆) Gridworld Example
Today’s Agenda

• Off-Policy Learning

• Importance Sampling

Off-Policy Learning

Off-policy methods evaluate or improve a target policy that is different from the behavior policy used to generate the data, so an agent can learn about one way of behaving while actually following another.

Importance Sampling

Importance sampling is a general technique for estimating expected values under one distribution given samples from another. We apply it to off-policy learning by weighting returns according to the relative probability of their trajectories occurring under the target and behavior policies; this weight is called the importance-sampling ratio.
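
Concretely, for the segment of a trajectory from time t up to episode termination at time T, the ratio takes the following standard form (π denotes the target policy and b the behavior policy; the slides do not fix this notation, so it is assumed here):

```latex
% Importance-sampling ratio of the trajectory segment from t to T-1
% under target policy \pi versus behavior policy b:
\rho_{t:T-1} \;=\; \prod_{k=t}^{T-1} \frac{\pi(A_k \mid S_k)}{b(A_k \mid S_k)}
```

Weighting a return G_t by ρ_{t:T−1} corrects its expectation from v_b(s) to v_π(s), which is what the Monte Carlo and TD variants below exploit. This requires coverage: b must give positive probability to every action that π can take.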
Importance Sampling for Off-Policy Monte Carlo
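
The slide content itself is not reproduced here, so below is a minimal sketch of one standard realization of this idea: every-visit Monte Carlo prediction with weighted importance sampling, estimating v_π from complete episodes generated by a behavior policy. The function and argument names (off_policy_mc_v, target_pi, behavior_b) and the trajectory format are illustrative assumptions, not the lecture's code.

```python
from collections import defaultdict

def off_policy_mc_v(episodes, target_pi, behavior_b, gamma=1.0):
    """Weighted-importance-sampling MC estimate of v_pi.

    episodes: trajectories [(state, action, reward), ...] generated by b.
    target_pi(s, a), behavior_b(s, a): action probabilities under each policy.
    Assumes coverage: behavior_b(s, a) > 0 wherever target_pi(s, a) > 0.
    """
    V = defaultdict(float)  # value estimates
    C = defaultdict(float)  # cumulative importance weights per state

    for episode in episodes:
        G, W = 0.0, 1.0  # return and importance-sampling ratio
        # Walk the episode backwards so G and W build up incrementally.
        for state, action, reward in reversed(episode):
            G = gamma * G + reward
            # v_pi(s) needs the ratio from time t onward, so fold in
            # step t's ratio before using W.
            W *= target_pi(state, action) / behavior_b(state, action)
            if W == 0.0:
                break  # zero-weight samples add nothing under weighted IS
            C[state] += W
            V[state] += (W / C[state]) * (G - V[state])
    return dict(V)
```

Weighted importance sampling (dividing by the cumulative weight C rather than the visit count) is biased but has far lower variance than the ordinary average, which is why it is usually preferred in practice.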
Importance Sampling for Off-Policy TD
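
Again as a sketch rather than the lecture's exact formulation: because TD(0) bootstraps after a single step, each update needs only the one-step ratio ρ_t = π(A_t|S_t)/b(A_t|S_t) instead of a full-trajectory product, which is the key variance advantage over the Monte Carlo version. The names and the (state, action, reward, next_state, done) tuple format are illustrative assumptions.

```python
from collections import defaultdict

def off_policy_td0_v(episodes, target_pi, behavior_b, alpha=0.1, gamma=1.0):
    """Per-step importance-sampled TD(0) estimate of v_pi from
    transitions generated by the behavior policy b."""
    V = defaultdict(float)
    for episode in episodes:
        for state, action, reward, next_state, done in episode:
            # Only the action actually taken needs reweighting, because
            # the TD target bootstraps from V(next_state) after one step.
            rho = target_pi(state, action) / behavior_b(state, action)
            td_target = reward + (0.0 if done else gamma * V[next_state])
            V[state] += alpha * rho * (td_target - V[state])
    return dict(V)
```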
