Foundations • Part 1 of 2

The RL Landscape

Model-free vs model-based, value vs policy methods

The Algorithm Zoo

RL algorithms come in many flavors. Here’s a map to help you navigate.

Major algorithm families
📊
Value-Based

Learn how good states and actions are, then derive the policy from those values

• Q-Learning
• DQN
• SARSA
🎯
Policy-Based

Learn the policy directly, without an explicit value function

• REINFORCE
• PPO
• TRPO
🎭
Actor-Critic

Learn both value and policy together

• A2C / A3C
• SAC
• TD3
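To make the value-based idea concrete, here is a minimal sketch of a tabular Q-learning update on a hypothetical environment (the state/action counts and the single transition are made up for illustration): we learn Q(s, a), and the policy is simply whatever action has the highest learned value.

```python
import numpy as np

# Hypothetical tiny environment: 4 states, 2 actions.
n_states, n_actions = 4, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma = 0.1, 0.99  # learning rate, discount factor

def q_learning_update(s, a, r, s_next):
    """One Q-learning step: nudge Q(s,a) toward r + gamma * max_a' Q(s',a')."""
    target = r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (target - Q[s, a])

def greedy_policy(s):
    """Value-based methods derive the policy from the values."""
    return int(Q[s].argmax())

# One made-up transition: from state 0, action 1 earned reward 1.0, landed in state 2.
q_learning_update(s=0, a=1, r=1.0, s_next=2)
```

After many such updates, `greedy_policy` reads the best action straight out of the table: no separate policy is ever stored, which is exactly what distinguishes this family from policy-based methods.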

Model-Free vs. Model-Based

Model-Free

Learn directly from experience, without modeling the environment

• Simpler and more flexible
• Needs more samples
Model-Based

Learn a model of the environment and use it for planning

• More sample-efficient
• Model errors can compound
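A toy sketch of the model-based side of this contrast, with made-up deterministic dynamics: the agent first estimates a transition model from random experience, then plans inside that learned model with value iteration, never touching the real environment again. (All states, actions, and rewards here are hypothetical.)

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 3, 2

# True dynamics, unknown to the agent: next state and reward for each (s, a).
true_next = np.array([[1, 2], [2, 0], [0, 1]])
reward = np.array([[0.0, 1.0], [0.0, 0.0], [1.0, 0.0]])

# Step 1: learn a model from experience (here, simple transition counts).
counts = np.zeros((n_states, n_actions, n_states))
for _ in range(200):
    s, a = rng.integers(n_states), rng.integers(n_actions)
    counts[s, a, true_next[s, a]] += 1
P_hat = counts / counts.sum(axis=2, keepdims=True)  # estimated P(s' | s, a)

# Step 2: plan inside the learned model with value iteration:
# V(s) = max_a [ r(s,a) + gamma * sum_s' P_hat(s'|s,a) V(s') ]
V = np.zeros(n_states)
for _ in range(100):
    V = (reward + 0.9 * P_hat @ V).max(axis=1)
```

Because planning reuses the same experience many times, this is more sample-efficient than model-free learning; but if `P_hat` were wrong, every planning step would compound that error.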

On-Policy vs. Off-Policy

On-Policy

Learn from actions the agent is currently taking. Must collect new data as policy changes.

Examples: SARSA, PPO
Off-Policy

Learn from actions taken by any policy. Can reuse old data.

Examples: Q-Learning, DQN, SAC
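The on-policy/off-policy split shows up directly in the learning target. A small sketch with hypothetical numbers: SARSA (on-policy) bootstraps from the action its current, possibly exploring, policy actually takes next, while Q-learning (off-policy) bootstraps from the best available action, regardless of what the behavior policy did.

```python
import numpy as np

gamma = 0.99
r = 1.0
Q_next = np.array([0.2, 0.8, 0.5])  # made-up Q-values for the next state
a_next = 0  # suppose the exploring policy actually picked action 0

# On-policy (SARSA): target follows the behavior policy's chosen action.
sarsa_target = r + gamma * Q_next[a_next]

# Off-policy (Q-learning): target assumes greedy behavior from here on.
q_learning_target = r + gamma * Q_next.max()
```

Whenever the policy explores, the two targets differ, which is why off-policy methods can learn from any policy's data (and reuse old experience), while on-policy methods must keep collecting fresh data as the policy changes.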
💡Don't Memorize This

You don’t need to understand these categories now. We’ll build up to each one through the book. This map is here so you know where we’re going.

ℹ️Next Up

Head to Try It Yourself to see an RL agent in action with an interactive demo.