AI-832 Reinforcement Learning
Instructor: Dr. Zuhair Zafar
Lecture # 19: Off-Policy Learning, Importance Sampling
Recap
• What is GLIE?
• What is the SARSA algorithm for on-policy control?
• What is the difference between SARSA and the TD learning algorithm?
• What is n-step SARSA?
• What is the forward view of SARSA(𝜆)?
• What is the backward view of SARSA(𝜆)?
SARSA(𝝀) Gridworld Example
Today’s Agenda
• Off-Policy Learning
• Importance Sampling
Off-Policy Learning
Importance Sampling
Importance sampling is a general technique for estimating expected values under one
distribution given samples from another.
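As a minimal sketch of this idea outside of reinforcement learning, the snippet below estimates the mean of a target distribution p using only samples drawn from a different distribution q, reweighting each sample by p(x)/q(x). The function names (`importance_sampling_estimate`, `normal_pdf`) and the choice of two Gaussians are illustrative assumptions, not part of the lecture.

```python
import math
import random


def normal_pdf(mean, std):
    """Return the density function of a normal distribution N(mean, std^2)."""
    def pdf(x):
        z = (x - mean) / std
        return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))
    return pdf


def importance_sampling_estimate(f, p_pdf, q_pdf, q_sampler, n, seed=0):
    """Estimate E_p[f(X)] from n samples drawn from q (hypothetical helper)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = q_sampler(rng)            # sample from q, not from p
        weight = p_pdf(x) / q_pdf(x)  # importance weight p(x) / q(x)
        total += weight * f(x)
    return total / n


# Assumed example: target p = N(1, 1), sampling distribution q = N(0, 2^2).
# The true value of E_p[X] is 1.
estimate = importance_sampling_estimate(
    f=lambda x: x,
    p_pdf=normal_pdf(1.0, 1.0),
    q_pdf=normal_pdf(0.0, 2.0),
    q_sampler=lambda rng: rng.gauss(0.0, 2.0),
    n=100_000,
)
print(estimate)
```

Here q is deliberately wider than p, so every region where p has mass is sampled; the estimate is only reliable when q(x) > 0 wherever p(x) f(x) is nonzero.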
We apply importance sampling to off-policy learning by weighting returns according to
the relative probability of their trajectories occurring under the target and behavior
policies, called the importance-sampling ratio.
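The ratio can be written out explicitly. The notation below is an assumption borrowed from the standard textbook treatment (target policy π, behavior policy b, transition probabilities p, episode ending at time T); the slides may use different symbols. The key observation is that the environment's transition probabilities appear in both numerator and denominator and cancel, so the ratio depends only on the two policies:

```latex
\rho_{t:T-1}
  = \frac{\prod_{k=t}^{T-1} \pi(A_k \mid S_k)\, p(S_{k+1} \mid S_k, A_k)}
         {\prod_{k=t}^{T-1} b(A_k \mid S_k)\, p(S_{k+1} \mid S_k, A_k)}
  = \prod_{k=t}^{T-1} \frac{\pi(A_k \mid S_k)}{b(A_k \mid S_k)},
\qquad
\mathbb{E}_b\!\left[ \rho_{t:T-1}\, G_t \mid S_t = s \right] = v_\pi(s)
```

This requires coverage: b(a|s) > 0 wherever π(a|s) > 0, otherwise the ratio is undefined for actions the target policy might take.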
Importance Sampling for Off-Policy Monte Carlo
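A rough sketch of how this could look in code, assuming ordinary (unnormalized) importance sampling with every-visit averaging: returns generated by the behavior policy are multiplied by the product of policy ratios along the rest of the episode before being averaged. The function `off_policy_mc_evaluate`, the episode format (lists of `(state, action, reward)` tuples), and the one-state example are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def off_policy_mc_evaluate(episodes, target_prob, behavior_prob, gamma=1.0):
    """Estimate v_pi from episodes generated by the behavior policy.

    episodes: list of episodes, each a list of (state, action, reward).
    target_prob / behavior_prob: functions (state, action) -> probability.
    """
    weighted_return_sum = defaultdict(float)
    visit_count = defaultdict(int)
    for episode in episodes:
        g = 0.0    # return G_t, built up backwards from the end
        rho = 1.0  # importance-sampling ratio from step t to the end
        for state, action, reward in reversed(episode):
            g = reward + gamma * g
            rho *= target_prob(state, action) / behavior_prob(state, action)
            weighted_return_sum[state] += rho * g
            visit_count[state] += 1
    return {s: weighted_return_sum[s] / visit_count[s]
            for s in weighted_return_sum}


# Assumed toy problem: one state, two actions, episodes of length one.
# Behavior policy is uniform; target policy always picks "right" (reward 1).
def behavior(state, action):
    return 0.5


def target(state, action):
    return 1.0 if action == "right" else 0.0


episodes = [[("s", "left", 0.0)], [("s", "right", 1.0)]]
values = off_policy_mc_evaluate(episodes, target, behavior)
print(values)
```

Because the ratio is a product over the whole remainder of the episode, its variance can grow quickly with episode length, which is one motivation for the TD variant.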
Importance Sampling for Off-Policy TD
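For TD(0), one common formulation needs only a single importance-sampling correction per step: the TD target R + γV(S') is weighted by π(A|S)/b(A|S) for the action actually taken. The sketch below assumes that formulation; the function `off_policy_td0_update`, its parameters, and the toy policies are hypothetical and not taken from the slides.

```python
def off_policy_td0_update(V, state, action, reward, next_state,
                          target_prob, behavior_prob,
                          alpha=0.1, gamma=1.0, terminal=False):
    """Apply one importance-weighted TD(0) update to the value table V."""
    # Single-step ratio for the action chosen by the behavior policy.
    rho = target_prob(state, action) / behavior_prob(state, action)
    # Terminal transitions bootstrap from zero.
    bootstrap = 0.0 if terminal else V.get(next_state, 0.0)
    td_target = rho * (reward + gamma * bootstrap)
    old_value = V.get(state, 0.0)
    V[state] = old_value + alpha * (td_target - old_value)
    return V[state]


# Assumed toy problem: uniform behavior policy, target always picks "right".
def behavior(state, action):
    return 0.5


def target(state, action):
    return 1.0 if action == "right" else 0.0


V = {}
first = off_policy_td0_update(V, "s", "right", 1.0, None, target, behavior,
                              alpha=0.5, terminal=True)
second = off_policy_td0_update(V, "s", "left", 0.0, None, target, behavior,
                               alpha=0.5, terminal=True)
print(first, second)
```

Compared with the Monte Carlo version, the correction here involves one ratio rather than a product over the episode, so the policies only need to be similar over a single step and the variance is typically much lower.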