Foundations
Start here. Core concepts: agents, environments, and the exploration-exploitation tradeoff.
What is Reinforcement Learning?
The big picture: what RL is, where it came from, and where you see it today
The RL Framework
The building blocks: agents, environments, states, actions, rewards, and policies
Getting Started
The algorithm landscape, your roadmap, and your first hands-on demo
Markov Decision Processes
The mathematical framework for sequential decision making.
Dynamic Programming
Optimal solutions when the environment model is known.
Bandit Problems
Simple decision problems that isolate the exploration-exploitation tradeoff.
Temporal Difference Learning
Learning value functions from experience without a model.
Deep Reinforcement Learning
Scaling RL with neural networks: from DQN to modern architectures.
Policy Gradient Methods
Learn policies directly with gradient ascent. From REINFORCE to PPO.
Introduction to Policy Gradients
A fundamentally different approach: learning policies directly
REINFORCE
The foundational policy gradient algorithm
Actor-Critic Methods
Combining policy and value learning for stability
Proximal Policy Optimization
The most popular deep RL algorithm in practice
Advanced Topics
Model-based RL, multi-agent systems, offline RL, and RLHF.
Model-Based RL
Learning world models for sample-efficient planning
Multi-Agent RL
When multiple agents learn and interact together
Offline RL
Learning from logged data without environment interaction
RL for Language Models
From RLHF to reasoning: how RL transforms language models
ML Concepts
Machine learning fundamentals used throughout RL.