Foundations
Start here. Core concepts: agents, environments, and the exploration-exploitation tradeoff.
Introduction to Reinforcement Learning
Understand the core concepts of RL: agents, environments, rewards, and the learning loop
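As a taste of the learning loop, here is a minimal sketch of one episode of agent-environment interaction. It assumes the Gymnasium package and its reset/step interface; CartPole-v1 and the random policy are illustrative placeholders, not the lesson's example.

```python
# Minimal agent-environment loop, assuming the Gymnasium API (illustrative).
import gymnasium as gym

env = gym.make("CartPole-v1")            # any environment works here
obs, info = env.reset(seed=0)
total_reward, done = 0.0, False
while not done:
    action = env.action_space.sample()   # stand-in for a learned policy
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward
    done = terminated or truncated
print(f"episode return: {total_reward}")
```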
Multi-Armed Bandits
Master the exploration-exploitation tradeoff in the simplest RL setting
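A minimal sketch of epsilon-greedy action selection on a Bernoulli bandit, the workhorse example for this tradeoff; the arm probabilities, epsilon, and step count below are illustrative choices.

```python
import random

def run_bandit(true_probs, epsilon=0.1, steps=1000, seed=0):
    """Epsilon-greedy on a Bernoulli multi-armed bandit."""
    rng = random.Random(seed)
    counts = [0] * len(true_probs)    # pulls per arm
    values = [0.0] * len(true_probs)  # running mean reward per arm
    total = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:                      # explore
            arm = rng.randrange(len(true_probs))
        else:                                           # exploit best estimate
            arm = max(range(len(true_probs)), key=lambda a: values[a])
        reward = 1.0 if rng.random() < true_probs[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean
        total += reward
    return values, total

print(run_bandit([0.2, 0.5, 0.75]))
```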
Contextual Bandits
Learn to make personalized decisions based on context features
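One classic algorithm for this setting is LinUCB, sketched below with a separate ridge-regression model per arm; the class shape and alpha value are illustrative, not the lesson's exact implementation.

```python
import numpy as np

class LinUCB:
    """Disjoint LinUCB: one linear reward model per arm, with a UCB bonus."""
    def __init__(self, n_arms, dim, alpha=1.0):
        self.alpha = alpha
        self.A = [np.eye(dim) for _ in range(n_arms)]    # ridge Gram matrices
        self.b = [np.zeros(dim) for _ in range(n_arms)]  # reward-weighted sums

    def select(self, context):
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b                      # per-arm coefficient estimate
            bonus = self.alpha * np.sqrt(context @ A_inv @ context)
            scores.append(theta @ context + bonus) # optimism under uncertainty
        return int(np.argmax(scores))

    def update(self, arm, context, reward):
        self.A[arm] += np.outer(context, context)
        self.b[arm] += reward * context
```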
Q-Learning Foundations
Value-based methods from TD learning through deep Q-networks.
Introduction to TD Learning
Learn how TD methods combine the best of Monte Carlo and Dynamic Programming
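The heart of the lesson is the TD(0) update, which bootstraps from the current value estimate instead of waiting for a full return. A minimal sketch with illustrative alpha and gamma:

```python
def td0_update(V, state, reward, next_state, alpha=0.1, gamma=0.99):
    """One TD(0) step: move V(s) toward the bootstrapped target r + gamma*V(s')."""
    td_error = reward + gamma * V[next_state] - V[state]
    V[state] += alpha * td_error
    return td_error

V = {"A": 0.0, "B": 0.0}
td0_update(V, "A", reward=1.0, next_state="B")
print(V)  # {'A': 0.1, 'B': 0.0}
```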
Q-Learning Basics
Master the foundational algorithm for learning optimal action values from experience
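The core of the algorithm is a one-line update: bootstrap from the greedy value of the next state, regardless of which action the agent actually takes next. A minimal tabular sketch (state names and hyperparameters are illustrative):

```python
from collections import defaultdict

def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.99):
    """Off-policy Q-learning update toward r + gamma * max_a' Q(s', a')."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

Q = defaultdict(float)
q_update(Q, s=0, a="right", r=1.0, s_next=1, actions=["left", "right"])
print(Q[(0, "right")])  # 0.1
```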
Exploration vs Exploitation
Balance discovery with exploitation using principled strategies such as epsilon-greedy and UCB
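Epsilon-greedy is the simplest such strategy (see the bandit sketch above); UCB1 is a more principled one, adding an exploration bonus that shrinks as an arm is tried more often. A minimal sketch:

```python
import math

def ucb1_select(counts, values, t, c=2.0):
    """UCB1: pick the arm maximizing mean reward plus an exploration bonus.

    counts[a] = pulls of arm a, values[a] = mean reward, t = total pulls so far.
    """
    for arm, n in enumerate(counts):
        if n == 0:          # try every arm once before trusting the estimates
            return arm
    return max(range(len(counts)),
               key=lambda a: values[a] + math.sqrt(c * math.log(t) / counts[a]))
```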
Deep Q-Networks
Scale Q-learning with neural networks, experience replay, and target networks
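Two of those ingredients fit in a few lines. Below is a sketch, assuming PyTorch, of a uniform replay buffer plus the TD loss computed against a frozen target network; the network architectures and hyperparameters are left out and illustrative.

```python
import random
from collections import deque

import torch
import torch.nn.functional as F

class ReplayBuffer:
    """Uniform experience replay: breaks correlation between consecutive steps."""
    def __init__(self, capacity=10_000):
        self.buffer = deque(maxlen=capacity)

    def push(self, s, a, r, s_next, done):
        self.buffer.append((s, a, r, s_next, done))

    def sample(self, batch_size):
        s, a, r, s2, d = zip(*random.sample(self.buffer, batch_size))
        return (torch.stack(s), torch.tensor(a),
                torch.tensor(r, dtype=torch.float32),
                torch.stack(s2), torch.tensor(d, dtype=torch.float32))

def dqn_loss(online, target, batch, gamma=0.99):
    """TD loss against a frozen target net; sync it periodically with
    target.load_state_dict(online.state_dict())."""
    s, a, r, s2, done = batch
    q = online(s).gather(1, a.unsqueeze(1)).squeeze(1)    # Q(s, a)
    with torch.no_grad():
        q_next = target(s2).max(dim=1).values             # max_a' Q_target(s', a')
        td_target = r + gamma * (1.0 - done) * q_next
    return F.mse_loss(q, td_target)
```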
Q-Learning Applications
Apply Q-learning to real-world problems in games, robotics, and finance
Q-Learning Frontiers
Explore the limits of Q-learning and preview what comes next
Policy Gradient Methods
Learn policies directly with gradient ascent. From REINFORCE to PPO.
Introduction to Policy-Based Methods
Discover a fundamentally different approach: learning policies directly instead of value functions
The Policy Gradient Theorem and REINFORCE
Master the fundamental theorem that enables learning policies through gradient ascent
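The resulting algorithm, REINFORCE, weights each action's log-probability by the return that followed it. A minimal sketch assuming PyTorch, where log_probs are the 0-dim tensors collected while sampling one episode:

```python
import torch

def discounted_returns(rewards, gamma=0.99):
    """G_t = r_t + gamma * G_{t+1}, computed backward over one episode."""
    G, out = 0.0, []
    for r in reversed(rewards):
        G = r + gamma * G
        out.append(G)
    return out[::-1]

def reinforce_loss(log_probs, rewards, gamma=0.99):
    """REINFORCE ascends E[G_t * grad log pi(a_t|s_t)]; minimize the negative.
    Subtracting a baseline from the returns would reduce variance."""
    returns = torch.tensor(discounted_returns(rewards, gamma))
    return -(torch.stack(log_probs) * returns).sum()
```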
Actor-Critic Methods
Combine the best of policy gradients and value-based learning for stable, efficient training
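In the one-step version, the critic's TD error doubles as the advantage estimate for the actor. A minimal sketch of a single-transition loss, assuming PyTorch; the 0.5 critic weight is an illustrative choice.

```python
import torch

def actor_critic_loss(log_prob, value, reward, next_value, gamma=0.99):
    """One-step actor-critic: the critic's TD error is the actor's advantage."""
    td_target = reward + gamma * next_value.detach()
    advantage = (td_target - value).detach()   # no actor gradient into the critic
    actor_loss = -log_prob * advantage         # policy-gradient term
    critic_loss = (td_target - value).pow(2)   # value-regression term
    return actor_loss + 0.5 * critic_loss
```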
PPO and Trust Region Methods
Master one of the most widely used deep RL algorithms and understand why it works
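PPO's key idea fits in a few lines: clip the probability ratio between the new and old policy so a single update cannot move the policy too far. A minimal sketch of the clipped surrogate loss, assuming PyTorch:

```python
import torch

def ppo_clip_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """PPO clipped surrogate: keep the policy ratio inside [1-eps, 1+eps]."""
    ratio = torch.exp(new_log_probs - old_log_probs)
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    # Pessimistic min: clipping removes the incentive to over-shoot the update.
    return -torch.min(ratio * advantages, clipped * advantages).mean()
```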
Policy Gradient Methods in Practice
Apply policy gradient methods to real-world challenges in robotics, RLHF, and beyond