Chapter 1.2

Getting Started

The algorithm landscape, your roadmap, and your first hands-on demo


What You'll Learn

  • Survey the landscape of RL algorithms: value-based, policy-based, and actor-critic
  • Understand the difference between model-free and model-based methods
  • Know the roadmap for the rest of this book
  • Experience the RL loop hands-on with an interactive demo

Chapter Overview

You’ve learned what RL is and the framework behind it. Now it’s time to see the bigger picture: the different families of algorithms you’ll learn, and where they fit in the landscape. Then, you’ll get hands-on experience with your first interactive demo.

Your Learning Journey

Part 1: Foundations
You are here! Understanding the basics of RL.
Part 2: Bandit Problems
The simplest RL setting: choosing between options with uncertain rewards.
Part 3: Q-Learning Foundations
Value-based methods: TD learning, Q-learning, and Deep Q-Networks.
Part 4: Policy Gradient Methods
Policy-based methods: REINFORCE, actor-critic, and PPO.

Each part builds on the previous. By the end, you’ll understand the algorithms behind modern AI systems like ChatGPT and AlphaGo.

💡Ready to Get Hands-On?

After surveying the landscape, head to Try It Yourself for an interactive demo where you’ll see an RL agent navigate a GridWorld.
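To make the demo concrete before you click through, here is a minimal sketch of the kind of loop the GridWorld demo runs. The 4×4 grid, start and goal positions, and reward scheme are illustrative assumptions, not the demo's exact code; the agent here follows a purely random policy, since learning algorithms come in later chapters.

```python
import random

# A hypothetical 4x4 GridWorld: the agent starts at (0, 0) and must reach (3, 3).
GOAL = (3, 3)
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

def step(state, action):
    """Apply an action, clipping moves at the grid edges; reward +1 at the goal, else 0."""
    dr, dc = ACTIONS[action]
    r = min(max(state[0] + dr, 0), 3)
    c = min(max(state[1] + dc, 0), 3)
    next_state = (r, c)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward

random.seed(0)
state = (0, 0)
total_reward, steps = 0.0, 0
while state != GOAL:                       # the RL loop: observe -> act -> reward -> new state
    action = random.choice(list(ACTIONS))  # a random policy; learning comes later
    state, reward = step(state, action)
    total_reward += reward
    steps += 1

print(f"Reached {state} in {steps} steps, total reward {total_reward}")
```

A random walk eventually stumbles onto the goal; the point of the algorithms in Parts 2–4 is to reach it in far fewer steps.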

Foundations Summary

Key Takeaways

  • Reinforcement learning is learning from interaction: take actions, observe rewards, improve behavior
  • The RL loop: Agent observes state → chooses action → receives reward → new state → repeat
  • Unlike supervised learning, RL learns from delayed, sparse, evaluative feedback, not correct labels
  • The goal is to maximize cumulative reward over time, not just immediate reward
  • The exploration-exploitation tradeoff is fundamental: exploit what you know vs. explore to learn more
  • Policies tell agents what to do; value functions tell agents how good states are
  • RL powers game-playing AI, robotics, recommendations, LLMs, and more
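The exploration-exploitation tradeoff in the takeaways above can be sketched in a few lines with an epsilon-greedy strategy on a two-armed bandit (the subject of Part 2). The payout probabilities and epsilon value are made-up numbers for illustration.

```python
import random

random.seed(1)

# Two slot machines with unknown payout probabilities (hidden from the agent).
true_probs = [0.3, 0.7]   # hypothetical values for illustration
estimates = [0.0, 0.0]    # the agent's running estimate of each arm's value
counts = [0, 0]
epsilon = 0.1             # fraction of the time the agent explores

for t in range(5000):
    if random.random() < epsilon:
        arm = random.randrange(2)              # explore: try a random arm
    else:
        arm = estimates.index(max(estimates))  # exploit: pick the best-looking arm
    reward = 1.0 if random.random() < true_probs[arm] else 0.0
    counts[arm] += 1
    estimates[arm] += (reward - estimates[arm]) / counts[arm]  # incremental average

print("pulls per arm:", counts)
print("value estimates:", [round(e, 2) for e in estimates])
```

With epsilon set to 0, the agent can lock onto whichever arm paid off first and never discover the better one; the small exploration rate is what lets the estimates converge toward the true payout probabilities.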

Exercises

Conceptual Questions

1. List three everyday examples of reinforcement learning.
Human or animal, not computer. For each, identify the agent, environment, states, actions, and rewards.
2. Why can’t we use supervised learning for game playing?
What specific challenges does RL address that supervised learning cannot?
3. What’s the risk if an agent only exploits and never explores?
Give a concrete example where this would lead to poor performance.

Think About It

4. Frame a problem you face as RL
Think of a repeated decision you make (commute route, workout routine, study schedule). What would be the state, actions, and rewards? What makes it hard to optimize?
5. When is RL overkill?
Some problems don’t need RL—supervised learning or even simple rules work fine. What characteristics of a problem make RL the right choice?
Next Chapter: Multi-Armed Bandits