Chapters

Progressive lessons that build from foundations to advanced topics. Each chapter includes intuition, math, code, and exercises.

Foundations

Start here

Core concepts: agents, environments, and the exploration-exploitation tradeoff.

Markov Decision Processes

The mathematical framework for sequential decision making.

Dynamic Programming

Computing optimal solutions when the environment model is known.

Bandit Problems

Simple decision problems that isolate the exploration-exploitation tradeoff.

Temporal Difference Learning

Learning value functions from experience without a model.

Deep Reinforcement Learning

Scaling RL with neural networks: from DQN to modern architectures.

Policy Gradient Methods

Learning policies directly with gradient ascent: from REINFORCE to PPO.

Advanced Topics

Model-based RL, multi-agent systems, offline RL, and RLHF.

ML Concepts

Machine learning fundamentals used throughout RL.

Content Status

📝 AI Generated — Pending review
Editor Reviewed — Approved by editor
👥 Community Reviewed — Incorporates feedback
🔒 Verified — Code tested, demos working