Chapter 1.2

Getting Started

The algorithm landscape, your roadmap, and your first hands-on demo


What You'll Learn

  • Survey the landscape of RL algorithms: value-based, policy-based, and actor-critic
  • Understand the difference between model-free and model-based methods
  • Know the roadmap for the rest of this book
  • Experience the RL loop hands-on with an interactive demo

Chapter Overview

You’ve learned what RL is and the framework behind it. Now it’s time to see the bigger picture: the different families of algorithms you’ll learn, and where they fit in the landscape. Then, you’ll get hands-on experience with your first interactive demo.

Your Learning Journey

Part 1: Foundations
You are here! Understanding the basics of RL.
Part 2: Bandit Problems
The simplest RL setting: choosing between options with uncertain rewards.
Part 3: Q-Learning Foundations
Value-based methods: TD learning, Q-learning, and Deep Q-Networks.
Part 4: Policy Gradient Methods
Policy-based methods: REINFORCE, actor-critic, and PPO.

Each part builds on the previous. By the end, you’ll understand the algorithms behind modern AI systems like ChatGPT and AlphaGo.

💡Ready to Get Hands-On?

After surveying the landscape, head to Try It Yourself for an interactive demo where you’ll see an RL agent navigate a GridWorld.
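To make the demo concrete before you click through, here is a minimal sketch of the kind of loop the GridWorld demo runs. The 4×4 grid, start and goal positions, and reward scheme are illustrative assumptions, not the demo's exact code; the agent here follows a purely random policy, since learning algorithms come in later chapters.

```python
import random

# A hypothetical 4x4 GridWorld: the agent starts at (0, 0) and must reach (3, 3).
GOAL = (3, 3)
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

def step(state, action):
    """Apply an action, clipping moves at the grid edges; reward +1 at the goal, else 0."""
    dr, dc = ACTIONS[action]
    r = min(max(state[0] + dr, 0), 3)
    c = min(max(state[1] + dc, 0), 3)
    next_state = (r, c)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward

random.seed(0)
state = (0, 0)
total_reward, steps = 0.0, 0
while state != GOAL:                       # the RL loop: observe -> act -> reward -> new state
    action = random.choice(list(ACTIONS))  # a random policy; learning comes later
    state, reward = step(state, action)
    total_reward += reward
    steps += 1

print(f"Reached {state} in {steps} steps, total reward {total_reward}")
```

A random walk eventually stumbles onto the goal; the point of the algorithms in Parts 2–4 is to reach it in far fewer steps.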

Foundations Summary

Key Takeaways

  • Reinforcement learning is learning from interaction: take actions, observe rewards, improve behavior
  • The RL loop: Agent observes state → chooses action → receives reward → new state → repeat
  • Unlike supervised learning, RL learns from delayed, sparse, evaluative feedback, not correct labels
  • The goal is to maximize cumulative reward over time, not just immediate reward
  • The exploration-exploitation tradeoff is fundamental: exploit what you know vs. explore to learn more
  • Policies tell agents what to do; value functions tell agents how good states are
  • RL powers game-playing AI, robotics, recommendations, LLMs, and more
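The exploration-exploitation tradeoff in the takeaways above can be sketched in a few lines with an epsilon-greedy strategy on a two-armed bandit (the subject of Part 2). The payout probabilities and epsilon value are made-up numbers for illustration.

```python
import random

random.seed(1)

# Two slot machines with unknown payout probabilities (hidden from the agent).
true_probs = [0.3, 0.7]   # hypothetical values for illustration
estimates = [0.0, 0.0]    # the agent's running estimate of each arm's value
counts = [0, 0]
epsilon = 0.1             # fraction of the time the agent explores

for t in range(5000):
    if random.random() < epsilon:
        arm = random.randrange(2)              # explore: try a random arm
    else:
        arm = estimates.index(max(estimates))  # exploit: pick the best-looking arm
    reward = 1.0 if random.random() < true_probs[arm] else 0.0
    counts[arm] += 1
    estimates[arm] += (reward - estimates[arm]) / counts[arm]  # incremental average

print("pulls per arm:", counts)
print("value estimates:", [round(e, 2) for e in estimates])
```

With epsilon set to 0, the agent can lock onto whichever arm paid off first and never discover the better one; the small exploration rate is what lets the estimates converge toward the true payout probabilities.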

Exercises

Conceptual Questions

1. List three everyday examples of reinforcement learning.
Human or animal, not computer. For each, identify the agent, environment, states, actions, and rewards.
2. Why can’t we use supervised learning for game playing?
What specific challenges does RL address that supervised learning cannot?
3. What’s the risk if an agent only exploits and never explores?
Give a concrete example where this would lead to poor performance.

Think About It

4. Frame a problem you face as RL
Think of a repeated decision you make (commute route, workout routine, study schedule). What would be the state, actions, and rewards? What makes it hard to optimize?
5. When is RL overkill?
Some problems don’t need RL—supervised learning or even simple rules work fine. What characteristics of a problem make RL the right choice?
Next Chapter: Multi-Armed Bandits