Dhanalakshmi Srinivasan University
Tiruchirappalli - 621112
Course Code: 24AML506
Course Name: Reinforcement Learning
L-T-P-C: 3-0-0-3
Pre-requisite: Fundamentals of Machine Learning, Probability & Linear Algebra
Syllabus Version: V1.0
Course Objectives
To introduce the fundamental concepts of reinforcement learning and its differences from supervised and unsupervised learning.
To understand the mathematical foundations of decision-making under uncertainty.
To explore model-free and model-based reinforcement learning methods.
To implement RL algorithms in real-world applications such as robotics, games, and recommendation systems.
To provide exposure to advanced RL concepts including policy gradient methods and deep reinforcement learning.
Course Outcomes
Understand the basics of reinforcement learning and Markov Decision Processes (MDPs).
Apply model-free prediction and control methods for sequential decision-making.
Implement dynamic programming, Monte Carlo, and temporal-difference learning algorithms.
Compare different RL methods and evaluate their performance.
Apply advanced RL techniques to solve real-world problems using function approximation and deep learning.
Unit 1 Introduction to Reinforcement Learning 9 Hours
Introduction to Reinforcement Learning – Differences between RL, Supervised, and Unsupervised Learning – Elements of RL: Agent, Environment, Rewards, States, Actions, Policy – Exploration vs. Exploitation Trade-off – Applications of Reinforcement Learning in AI, Robotics, and Games.
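The agent-environment loop and the exploration vs. exploitation trade-off listed above can be sketched with an ε-greedy agent on a made-up 3-armed bandit (the arm reward probabilities and variable names below are purely illustrative):

```python
# Minimal agent-environment loop: an epsilon-greedy agent learning
# action-value estimates on a hypothetical 3-armed bandit.
import random

random.seed(0)

TRUE_MEANS = [0.2, 0.5, 0.8]   # hidden success probability of each arm (made up)
EPSILON = 0.1                  # probability of exploring a random arm

def pull(arm):
    """Environment: return a stochastic 0/1 reward for the chosen action."""
    return 1.0 if random.random() < TRUE_MEANS[arm] else 0.0

q = [0.0] * 3   # agent's action-value estimates
n = [0] * 3     # pull counts per arm

for t in range(2000):
    if random.random() < EPSILON:                    # explore
        arm = random.randrange(3)
    else:                                            # exploit current best estimate
        arm = max(range(3), key=lambda a: q[a])
    r = pull(arm)
    n[arm] += 1
    q[arm] += (r - q[arm]) / n[arm]                  # incremental sample average

best = max(range(3), key=lambda a: q[a])
print("estimated best arm:", best)   # almost surely arm 2 after 2000 pulls
```

With ε = 0 the agent can lock onto a suboptimal arm forever; the small exploration rate is what lets its estimates converge to the true best action.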
Unit 2 Markov Decision Processes (MDPs) 9 Hours
Introduction to Sequential Decision Problems – Markov Property and Markov Chains – Markov Decision Process: States, Actions, Rewards, Transition Probabilities – Value Functions: State-Value Function, Action-Value Function – Bellman Equations.
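The value functions and Bellman expectation equation above can be illustrated on a tiny, made-up 2-state MDP under a fixed random policy (states, actions, and rewards below are invented for the sketch):

```python
# Iteratively solving the Bellman expectation equation
#   V(s) = sum_a pi(a|s) * sum_s' P(s'|s,a) * [R(s,a,s') + gamma * V(s')]
# on a hypothetical 2-state MDP.

GAMMA = 0.9

# P[s][a] -> list of (next_state, probability, reward); all values illustrative.
P = {
    0: {"stay": [(0, 1.0, 0.0)], "go": [(1, 1.0, 1.0)]},
    1: {"stay": [(1, 1.0, 2.0)], "go": [(0, 1.0, 0.0)]},
}
# Uniform random policy: pi(a|s) = 0.5 for both actions.
PI = {s: {"stay": 0.5, "go": 0.5} for s in P}

V = {0: 0.0, 1: 0.0}
for _ in range(500):                       # repeated Bellman backups converge
    V = {
        s: sum(
            PI[s][a] * sum(p * (r + GAMMA * V[s2]) for s2, p, r in P[s][a])
            for a in P[s]
        )
        for s in P
    }

# Action values follow from V: Q(s, a) = sum_s' P(s'|s,a) * (R + gamma * V(s'))
Q = {
    (s, a): sum(p * (r + GAMMA * V[s2]) for s2, p, r in P[s][a])
    for s in P for a in P[s]
}
print(V, Q)
```

For this MDP the fixed point works out to V(0) = 7.25 and V(1) = 7.75, and the action values expose which action the policy should prefer in each state.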
Unit 3 Dynamic Programming and Monte Carlo Methods 9 Hours
Policy Evaluation, Policy Iteration, Value Iteration – Limitations of Dynamic Programming – Monte Carlo Prediction and Control – On-Policy and Off-Policy Learning.
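The dynamic-programming side of this unit can be sketched with value iteration on a made-up 3-state chain (states, actions, and rewards are illustrative, not from any standard benchmark):

```python
# Value iteration: V(s) <- max_a sum_s' P(s'|s,a) * [R + gamma * V(s')],
# then read the greedy policy off the converged values.

GAMMA = 0.9
THETA = 1e-8   # convergence threshold on the largest value change

# P[s][a] -> list of (next_state, probability, reward); state 2 is terminal.
P = {
    0: {"left": [(0, 1.0, 0.0)], "right": [(1, 1.0, 0.0)]},
    1: {"left": [(0, 1.0, 0.0)], "right": [(2, 1.0, 1.0)]},
    2: {},                                  # terminal: no actions available
}

V = {s: 0.0 for s in P}
while True:
    delta = 0.0
    for s in P:
        if not P[s]:
            continue                        # terminal state keeps value 0
        v_new = max(
            sum(p * (r + GAMMA * V[s2]) for s2, p, r in P[s][a]) for a in P[s]
        )
        delta = max(delta, abs(v_new - V[s]))
        V[s] = v_new
    if delta < THETA:
        break

policy = {
    s: max(P[s], key=lambda a: sum(p * (r + GAMMA * V[s2]) for s2, p, r in P[s][a]))
    for s in P if P[s]
}
print(V, policy)
```

Here the optimal values are V(1) = 1 and V(0) = 0.9 (the goal reward discounted once), and the greedy policy moves right in both non-terminal states. Monte Carlo methods, by contrast, estimate such values from sampled episodes without the transition model P.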
Unit 4 Temporal-Difference Learning 9 Hours
Temporal-Difference (TD) Prediction – SARSA (State-Action-Reward-State-Action) Algorithm – Q-Learning Algorithm – Eligibility Traces and n-Step TD Methods – Comparison of MC, DP, and TD Methods.
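The Q-learning update above can be sketched on a hypothetical 4-state corridor environment (the dynamics and reward scheme are invented for the example):

```python
# Tabular Q-learning: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).
# Reaching the last corridor state ends the episode with reward +1.
import random

random.seed(1)
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2
N_STATES, ACTIONS = 4, ("left", "right")

def step(s, a):
    """Deterministic corridor dynamics; returns (next_state, reward, done)."""
    s2 = max(s - 1, 0) if a == "left" else s + 1
    if s2 == N_STATES - 1:
        return s2, 1.0, True
    return s2, 0.0, False

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

for episode in range(500):
    s, done = 0, False
    while not done:
        if random.random() < EPSILON:                 # epsilon-greedy behaviour policy
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda x: Q[(s, x)])
        s2, r, done = step(s, a)
        target = r if done else r + GAMMA * max(Q[(s2, x)] for x in ACTIONS)
        Q[(s, a)] += ALPHA * (target - Q[(s, a)])     # off-policy TD update
        s = s2

greedy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES - 1)}
print(greedy)
```

Because the target uses max over next actions rather than the action actually taken, this is off-policy; replacing the max with the behaviour policy's next action turns the same loop into (on-policy) SARSA.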
Unit 5 Advancements in Reinforcement Learning 9 Hours
Function Approximation in RL – Policy Gradient Methods: REINFORCE, Actor-Critic Methods – Deep Reinforcement Learning: Deep Q-Networks (DQN), Policy Gradients with Neural Networks – Applications in Robotics, Finance, Healthcare, and Autonomous Systems – Challenges and Future Trends in Reinforcement Learning.
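The REINFORCE update listed above can be sketched on a one-step, two-action problem with a softmax policy (the reward probabilities and learning rate below are illustrative assumptions, and a real implementation would add a baseline and a neural-network policy):

```python
# REINFORCE: after each episode, theta <- theta + lr * G * grad log pi(a; theta),
# here with a softmax policy over two actions and one-step episodes.
import math
import random

random.seed(0)
LR = 0.1
REWARD_PROB = [0.2, 0.8]     # hidden success probability of each action (made up)

theta = [0.0, 0.0]           # policy parameters: one logit per action

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

for episode in range(3000):
    probs = softmax(theta)
    # Sample an action from the current stochastic policy.
    a = 0 if random.random() < probs[0] else 1
    G = 1.0 if random.random() < REWARD_PROB[a] else 0.0   # episode return
    # For a softmax policy, grad log pi(a) wrt theta_k = 1[k == a] - pi(k).
    for k in range(2):
        theta[k] += LR * G * ((1.0 if k == a else 0.0) - probs[k])

probs = softmax(theta)
print(probs)                 # policy now strongly prefers the better action
```

Actor-critic methods replace the Monte Carlo return G with a learned value estimate to reduce variance, and DQN instead approximates Q(s, a) with a neural network trained on the TD target from Unit 4.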
Total: 45 Hours
Text Books:
1. Richard S. Sutton and Andrew G. Barto, Reinforcement Learning: An Introduction, 2nd Edition, MIT Press, 2018.
2. Csaba Szepesvári, Algorithms for Reinforcement Learning, Morgan & Claypool, 2010.
Reference Books:
1. Marco Wiering and Martijn van Otterlo (Eds.), Reinforcement Learning: State of the Art, Springer, 2012.
2. Maxim Lapan, Deep Reinforcement Learning Hands-On, Packt Publishing, 2nd Edition, 2020.
3. Ian Goodfellow, Yoshua Bengio, Aaron Courville, Deep Learning, MIT Press, 2016 (for background on neural networks).