The Algorithm Zoo
RL algorithms come in many flavors. Here’s a map to help you navigate.
Value-based: learn how good states and actions are, then derive the policy from those values.
Policy-based: learn the policy directly, without value functions.
Actor-critic: learn both a value function and a policy together.
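To make the first two families concrete, here is a minimal sketch for a single state with three actions. The numbers and the names `q_values`, `policy_logits`, `greedy_action`, and `softmax_policy` are hypothetical, chosen only for illustration. In a real agent, both sets of numbers would be learned.

```python
import math

# Hypothetical numbers for one state with three actions.
q_values = [1.0, 2.5, 0.3]        # value-based: estimated value of each action
policy_logits = [0.2, 1.1, -0.4]  # policy-based: learned action preferences


def greedy_action(q):
    """Value-based: derive the policy by picking the highest-valued action."""
    return max(range(len(q)), key=lambda a: q[a])


def softmax_policy(logits):
    """Policy-based: the parameters define action probabilities directly."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


chosen = greedy_action(q_values)       # index of the best action under q_values
probs = softmax_policy(policy_logits)  # a probability for each action
```

An actor-critic agent would keep both pieces: something like `policy_logits` to choose actions (the actor) and a value estimate to judge them (the critic).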
Model-Free vs. Model-Based
Model-free: learn directly from experience, without modeling the environment.
Model-based: learn a model of the environment, then use it for planning.
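The difference can be sketched with a handful of recorded transitions. Everything here is an assumption made for illustration: the states, the rewards, the step size `alpha`, and a deterministic environment, which lets the model be a plain lookup table.

```python
from collections import defaultdict

# Hypothetical experience: (state, action, reward, next_state).
experience = [
    ("s0", "left", 0.0, "s0"),
    ("s0", "right", 1.0, "s1"),
    ("s0", "right", 1.0, "s1"),
]

# Model-free: nudge a value estimate toward each observed reward.
# Nothing about how the environment works is stored.
alpha = 0.5  # assumed step size
q = defaultdict(float)
for state, action, reward, _next_state in experience:
    q[(state, action)] += alpha * (reward - q[(state, action)])

# Model-based: first record what the environment does (assumed deterministic).
model = {}
for state, action, reward, next_state in experience:
    model[(state, action)] = (reward, next_state)


def plan_one_step(state, actions):
    """Pick the action whose predicted reward is highest, without acting."""
    def predicted_reward(action):
        reward, _next_state = model.get((state, action), (float("-inf"), None))
        return reward
    return max(actions, key=predicted_reward)


best = plan_one_step("s0", ["left", "right"])
```

The model-free agent ends up with numbers in `q` and nothing else. The model-based agent can ask "what would happen if" by querying `model`, which is what planning means here.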
On-Policy vs. Off-Policy
On-policy: learn from the actions the agent's current policy takes. New data must be collected whenever the policy changes.
Off-policy: learn from actions taken by any policy, so old data can be reused.
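One common way to see the distinction is to compare the learning targets of two classic tabular algorithms, SARSA (on-policy) and Q-learning (off-policy). The sketch below uses hypothetical values. The function names and the discount `gamma` are illustrative, not taken from any library.

```python
def sarsa_target(reward, next_q, next_action, gamma=0.99):
    """On-policy: bootstrap from the action the agent actually takes next."""
    return reward + gamma * next_q[next_action]


def q_learning_target(reward, next_q, gamma=0.99):
    """Off-policy: bootstrap from the best next action, whichever was taken."""
    return reward + gamma * max(next_q)


# Hypothetical value estimates for the three actions in the next state.
next_q = [0.0, 1.0, 4.0]

# The agent explored and took action 1, which is not the best action.
on_policy = sarsa_target(1.0, next_q, next_action=1, gamma=0.9)
off_policy = q_learning_target(1.0, next_q, gamma=0.9)
```

The SARSA target depends on `next_action`, so it is tied to whatever policy produced the data. The Q-learning target ignores that choice, which is why transitions gathered by an older policy can still be reused.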
You don’t need to understand these categories yet. We’ll build up to each one over the course of the book. This map is here so you know where we’re going.
Head to Try It Yourself to see an RL agent in action with an interactive demo.