Foundations
Start here. Core concepts: agents, environments, and the exploration-exploitation tradeoff.
Introduction to Reinforcement Learning
Understand the core concepts of RL: agents, environments, rewards, and the learning loop
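As a taste of the learning loop, here is a minimal sketch of one episode of agent-environment interaction. It assumes the Gymnasium package and its reset/step interface; CartPole-v1 and the random policy are illustrative placeholders, not the lesson's example.

```python
# Minimal agent-environment loop, assuming the Gymnasium API (illustrative).
import gymnasium as gym

env = gym.make("CartPole-v1")            # any environment works here
obs, info = env.reset(seed=0)
total_reward, done = 0.0, False
while not done:
    action = env.action_space.sample()   # stand-in for a learned policy
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward
    done = terminated or truncated
print(f"episode return: {total_reward}")
```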
Multi-Armed Bandits
Master the exploration-exploitation tradeoff in the simplest RL setting
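A minimal sketch of epsilon-greedy action selection on a Bernoulli bandit, the workhorse example for this tradeoff; the arm probabilities, epsilon, and step count below are illustrative choices.

```python
import random

def run_bandit(true_probs, epsilon=0.1, steps=1000, seed=0):
    """Epsilon-greedy on a Bernoulli multi-armed bandit."""
    rng = random.Random(seed)
    counts = [0] * len(true_probs)    # pulls per arm
    values = [0.0] * len(true_probs)  # running mean reward per arm
    total = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:                      # explore
            arm = rng.randrange(len(true_probs))
        else:                                           # exploit best estimate
            arm = max(range(len(true_probs)), key=lambda a: values[a])
        reward = 1.0 if rng.random() < true_probs[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean
        total += reward
    return values, total

print(run_bandit([0.2, 0.5, 0.75]))
```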
Contextual Bandits
Learn to make personalized decisions based on context features
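One classic algorithm for this setting is LinUCB, sketched below with a separate ridge-regression model per arm; the class shape and alpha value are illustrative, not the lesson's exact implementation.

```python
import numpy as np

class LinUCB:
    """Disjoint LinUCB: one linear reward model per arm, with a UCB bonus."""
    def __init__(self, n_arms, dim, alpha=1.0):
        self.alpha = alpha
        self.A = [np.eye(dim) for _ in range(n_arms)]    # ridge Gram matrices
        self.b = [np.zeros(dim) for _ in range(n_arms)]  # reward-weighted sums

    def select(self, context):
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b                      # per-arm coefficient estimate
            bonus = self.alpha * np.sqrt(context @ A_inv @ context)
            scores.append(theta @ context + bonus) # optimism under uncertainty
        return int(np.argmax(scores))

    def update(self, arm, context, reward):
        self.A[arm] += np.outer(context, context)
        self.b[arm] += reward * context
```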
Q-Learning Foundations
Value-based methods from TD learning through deep Q-networks.
Introduction to TD Learning
Learn how TD methods combine the best of Monte Carlo and Dynamic Programming
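The heart of the lesson is the TD(0) update, which bootstraps from the current value estimate instead of waiting for a full return. A minimal sketch with illustrative alpha and gamma:

```python
def td0_update(V, state, reward, next_state, alpha=0.1, gamma=0.99):
    """One TD(0) step: move V(s) toward the bootstrapped target r + gamma*V(s')."""
    td_error = reward + gamma * V[next_state] - V[state]
    V[state] += alpha * td_error
    return td_error

V = {"A": 0.0, "B": 0.0}
td0_update(V, "A", reward=1.0, next_state="B")
print(V)  # {'A': 0.1, 'B': 0.0}
```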
Q-Learning Basics
Master the foundational algorithm for learning optimal action values from experience
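The core of the algorithm is a one-line update: bootstrap from the greedy value of the next state, regardless of which action the agent actually takes next. A minimal tabular sketch (state names and hyperparameters are illustrative):

```python
from collections import defaultdict

def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.99):
    """Off-policy Q-learning update toward r + gamma * max_a' Q(s', a')."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

Q = defaultdict(float)
q_update(Q, s=0, a="right", r=1.0, s_next=1, actions=["left", "right"])
print(Q[(0, "right")])  # 0.1
```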
Exploration vs Exploitation
Balance discovery with exploitation using principled strategies such as epsilon-greedy and UCB
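Epsilon-greedy is the simplest such strategy (see the bandit sketch above); UCB1 is a more principled one, adding an exploration bonus that shrinks as an arm is tried more often. A minimal sketch:

```python
import math

def ucb1_select(counts, values, t, c=2.0):
    """UCB1: pick the arm maximizing mean reward plus an exploration bonus.

    counts[a] = pulls of arm a, values[a] = mean reward, t = total pulls so far.
    """
    for arm, n in enumerate(counts):
        if n == 0:          # try every arm once before trusting the estimates
            return arm
    return max(range(len(counts)),
               key=lambda a: values[a] + math.sqrt(c * math.log(t) / counts[a]))
```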
Deep Q-Networks
Scale Q-learning with neural networks, experience replay, and target networks
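Two of those ingredients fit in a few lines. Below is a sketch, assuming PyTorch, of a uniform replay buffer plus the TD loss computed against a frozen target network; the network architectures and hyperparameters are left out and illustrative.

```python
import random
from collections import deque

import torch
import torch.nn.functional as F

class ReplayBuffer:
    """Uniform experience replay: breaks correlation between consecutive steps."""
    def __init__(self, capacity=10_000):
        self.buffer = deque(maxlen=capacity)

    def push(self, s, a, r, s_next, done):
        self.buffer.append((s, a, r, s_next, done))

    def sample(self, batch_size):
        s, a, r, s2, d = zip(*random.sample(self.buffer, batch_size))
        return (torch.stack(s), torch.tensor(a),
                torch.tensor(r, dtype=torch.float32),
                torch.stack(s2), torch.tensor(d, dtype=torch.float32))

def dqn_loss(online, target, batch, gamma=0.99):
    """TD loss against a frozen target net; sync it periodically with
    target.load_state_dict(online.state_dict())."""
    s, a, r, s2, done = batch
    q = online(s).gather(1, a.unsqueeze(1)).squeeze(1)    # Q(s, a)
    with torch.no_grad():
        q_next = target(s2).max(dim=1).values             # max_a' Q_target(s', a')
        td_target = r + gamma * (1.0 - done) * q_next
    return F.mse_loss(q, td_target)
```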
Q-Learning Applications
Apply Q-learning to real-world problems in games, robotics, and finance
Q-Learning Frontiers
Explore the limits of Q-learning and preview what comes next
Policy Gradient Methods
Learn policies directly with gradient ascent. From REINFORCE to PPO.
Introduction to Policy-Based Methods
Discover a fundamentally different approach: learning policies directly instead of value functions
The Policy Gradient Theorem and REINFORCE
Master the fundamental theorem that enables learning policies through gradient ascent
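The resulting algorithm, REINFORCE, weights each action's log-probability by the return that followed it. A minimal sketch assuming PyTorch, where log_probs are the 0-dim tensors collected while sampling one episode:

```python
import torch

def discounted_returns(rewards, gamma=0.99):
    """G_t = r_t + gamma * G_{t+1}, computed backward over one episode."""
    G, out = 0.0, []
    for r in reversed(rewards):
        G = r + gamma * G
        out.append(G)
    return out[::-1]

def reinforce_loss(log_probs, rewards, gamma=0.99):
    """REINFORCE ascends E[G_t * grad log pi(a_t|s_t)]; minimize the negative.
    Subtracting a baseline from the returns would reduce variance."""
    returns = torch.tensor(discounted_returns(rewards, gamma))
    return -(torch.stack(log_probs) * returns).sum()
```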
Actor-Critic Methods
Combine the best of policy gradients and value-based learning for stable, efficient training
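In the one-step version, the critic's TD error doubles as the advantage estimate for the actor. A minimal sketch of a single-transition loss, assuming PyTorch; the 0.5 critic weight is an illustrative choice.

```python
import torch

def actor_critic_loss(log_prob, value, reward, next_value, gamma=0.99):
    """One-step actor-critic: the critic's TD error is the actor's advantage."""
    td_target = reward + gamma * next_value.detach()
    advantage = (td_target - value).detach()   # no actor gradient into the critic
    actor_loss = -log_prob * advantage         # policy-gradient term
    critic_loss = (td_target - value).pow(2)   # value-regression term
    return actor_loss + 0.5 * critic_loss
```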
PPO and Trust Region Methods
Master one of the most widely used deep RL algorithms and understand why it works
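PPO's key idea fits in a few lines: clip the probability ratio between the new and old policy so a single update cannot move the policy too far. A minimal sketch of the clipped surrogate loss, assuming PyTorch:

```python
import torch

def ppo_clip_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """PPO clipped surrogate: keep the policy ratio inside [1-eps, 1+eps]."""
    ratio = torch.exp(new_log_probs - old_log_probs)
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    # Pessimistic min: clipping removes the incentive to over-shoot the update.
    return -torch.min(ratio * advantages, clipped * advantages).mean()
```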
Policy Gradient Methods in Practice
Apply policy gradient methods to real-world challenges in robotics, RLHF, and beyond