The RL Landscape | Getting Started

The Algorithm Zoo

RL algorithms come in many flavors. Here’s a map to help you navigate.

Major algorithm families

📊

Value-Based

Learn how good states or actions are, then derive the policy from those values

• Q-Learning

• DQN

• SARSA
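To make the value-based idea a little more concrete, here is a minimal sketch of tabular Q-learning. The 4-state corridor, the `step` helper, and the hyperparameters are all made up for illustration; none of them come from this book. The point is the shape of the method: learn action values first, then read the policy off them.

```python
import random

N_STATES = 4          # hypothetical corridor: states 0..3, state 3 is the goal
ACTIONS = (0, 1)      # 0 = move left, 1 = move right

def step(state, action):
    """Toy deterministic dynamics: reward 1.0 only when the goal is reached."""
    next_state = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done

def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]   # q[state][action]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy behaviour: mostly greedy, sometimes random.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[state][a])
            next_state, reward, done = step(state, action)
            # Q-learning update: move Q(s, a) toward reward + gamma * max_a' Q(s', a').
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q

q_table = q_learning()
# The policy is derived from the values: pick the best-valued action in each state.
greedy_policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
print(greedy_policy)
```

In this toy setting the greedy policy should end up choosing "right" in every non-goal state, since that is the only way to reach the reward.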

🎯

Policy-Based

Learn the policy directly instead of deriving it from a value function (in practice, PPO and TRPO usually add a learned value baseline)

• REINFORCE

• PPO

• TRPO
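As a rough illustration of learning a policy directly, the sketch below applies a REINFORCE-style update to a two-armed bandit. The arm payoffs, the learning rate, and the softmax parameterization are assumptions chosen for the example. Notice that no value table appears anywhere: the policy parameters are nudged toward actions that were followed by reward.

```python
import math
import random

def softmax(prefs):
    """Turn action preferences into probabilities."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]

def reinforce_bandit(steps=2000, lr=0.1, seed=0):
    rng = random.Random(seed)
    true_means = [0.2, 0.8]    # hypothetical payoff probability of each arm
    prefs = [0.0, 0.0]         # policy parameters: one preference per arm
    for _ in range(steps):
        probs = softmax(prefs)
        action = 0 if rng.random() < probs[0] else 1
        reward = 1.0 if rng.random() < true_means[action] else 0.0
        # For a softmax policy, d/d prefs[k] of log pi(action) = 1[k == action] - probs[k].
        for k in range(len(prefs)):
            grad_log = (1.0 if k == action else 0.0) - probs[k]
            prefs[k] += lr * reward * grad_log   # gradient ascent on expected reward
    return softmax(prefs)

final_probs = reinforce_bandit()
print(final_probs)
```

With these made-up payoffs, the learned policy should put most of its probability on the second arm. Real implementations typically subtract a baseline from the reward to reduce variance; this sketch leaves that out to keep the core update visible.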

🎭

Actor-Critic

Learn a policy (the actor) and a value function (the critic) together

• A2C / A3C

• SAC

• TD3
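Here is one way the actor and critic can fit together, sketched as a one-step actor-critic on a made-up 4-state corridor. The environment, the sigmoid policy, and the learning rates are illustrative assumptions, and this is far simpler than A2C, SAC, or TD3. The critic's TD error plays two roles: it corrects the value estimate and it tells the actor whether the action turned out better or worse than expected.

```python
import math
import random

N_STATES = 4
GOAL = N_STATES - 1   # hypothetical corridor: start at 0, reward at state 3

def step(state, action):
    """Toy dynamics: action 1 moves right, action 0 moves left."""
    next_state = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    done = next_state == GOAL
    return next_state, (1.0 if done else 0.0), done

def prob_right(pref):
    """Sigmoid policy: probability of choosing 'right' given a preference."""
    return 1.0 / (1.0 + math.exp(-pref))

def actor_critic(episodes=300, actor_lr=0.5, critic_lr=0.2, gamma=0.9, seed=0):
    rng = random.Random(seed)
    prefs = [0.0] * N_STATES    # actor: preference for 'right' in each state
    values = [0.0] * N_STATES   # critic: state-value estimate V(s)
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            p_right = prob_right(prefs[state])
            action = 1 if rng.random() < p_right else 0
            next_state, reward, done = step(state, action)
            # Critic: one-step TD error, bootstrapping from V(next_state).
            bootstrap = 0.0 if done else gamma * values[next_state]
            td_error = reward + bootstrap - values[state]
            values[state] += critic_lr * td_error
            # Actor: policy-gradient step, weighted by the critic's TD error.
            grad_log = (1.0 - p_right) if action == 1 else -p_right
            prefs[state] += actor_lr * td_error * grad_log
            state = next_state
    return prefs, values

prefs, values = actor_critic()
print([round(prob_right(p), 2) for p in prefs[:GOAL]])
```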

Model-Free

Learn directly from experience without modeling the environment

✓ Simpler, more flexible

✗ Needs more samples

Model-Based

Learn a model of the environment, use it for planning

✓ More sample efficient

✗ Model errors can compound
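The model-based idea can be sketched in two steps: record what the environment does, then plan against that record instead of the real world. The corridor environment, `learn_model`, and `plan` below are hypothetical names invented for this example, and the "model" is just a lookup table that only works because the toy world is deterministic.

```python
import random

N_STATES, GOAL = 4, 3   # hypothetical corridor: states 0..3, reward at state 3
ACTIONS = (0, 1)        # 0 = left, 1 = right

def step(state, action):
    """The real environment (unknown to the planner)."""
    next_state = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    return next_state, (1.0 if next_state == GOAL else 0.0)

def learn_model(n_samples=200, seed=0):
    """Record the observed (next_state, reward) for each (state, action) tried."""
    rng = random.Random(seed)
    model = {}
    for _ in range(n_samples):
        s = rng.randrange(GOAL)        # sample a non-terminal state
        a = rng.choice(ACTIONS)
        model[(s, a)] = step(s, a)     # deterministic world: one sample is enough
    return model

def plan(model, gamma=0.9, sweeps=50):
    """Value iteration that queries only the learned model, not the environment."""
    values = [0.0] * N_STATES
    for _ in range(sweeps):
        for s in range(GOAL):
            backups = [r + gamma * values[s2]
                       for (ms, _a), (s2, r) in model.items() if ms == s]
            if backups:
                values[s] = max(backups)
    return values

model = learn_model()
values = plan(model)
print([round(v, 2) for v in values])
```

Once the model exists, the 50 planning sweeps cost no further real experience, which is where the sample efficiency comes from. The flip side is that `plan` trusts the model completely, so a wrong entry in `model` would propagate into every value that depends on it.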

On-Policy

Learn from data generated by the policy that is currently being improved. The agent must collect fresh data whenever the policy changes.

Examples: SARSA, PPO

Off-Policy

Learn from data generated by any policy, including older versions of the agent. Old data can be reused.

Examples: Q-Learning, DQN, SAC
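One place the on-policy versus off-policy split shows up is in the update target. The sketch below contrasts the standard SARSA and Q-learning targets for a single transition; the function names and the numbers are made up for illustration.

```python
def sarsa_target(reward, gamma, q_next, next_action):
    """On-policy: bootstrap from the action the agent actually takes next."""
    return reward + gamma * q_next[next_action]

def q_learning_target(reward, gamma, q_next):
    """Off-policy: bootstrap from the best next action, whatever was actually taken."""
    return reward + gamma * max(q_next)

q_next = [0.2, 1.0]       # hypothetical action values in the next state
exploratory_action = 0    # suppose the agent explores and picks the worse action

on_policy = sarsa_target(0.0, 0.9, q_next, exploratory_action)
off_policy = q_learning_target(0.0, 0.9, q_next)
print(on_policy, off_policy)
```

The SARSA target depends on what the current policy did next, so it is tied to data from that policy. The Q-learning target ignores the next action entirely, which is why transitions collected by an older or different policy can still be used.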

💡 Don't Memorize This

You don’t need to understand these categories yet. We’ll build up to each one over the course of the book. This map is here so you know where we’re going.

ℹ️ Next Up

Head to Try It Yourself to see an RL agent in action with an interactive demo.