Chapter 121

Function Approximation

Scaling RL to large state spaces with learned representations

Prerequisites:

Function Approximation in Reinforcement Learning

What You'll Learn

  • Explain why tabular methods fail in large or continuous state spaces
  • Describe the function approximation approach to RL
  • Implement linear function approximation for value estimation
  • Understand the deadly triad and its implications
  • Explain how neural networks enable deep RL

Our Q-learning agent mastered a 4x4 grid. But what about a robot navigating a room? With continuous position (x, y) and orientation, there are infinite states. We can’t have a table entry for every possible configuration.

We need a way to generalize.

Why Function Approximation?

In tabular RL, we stored a value for every state (or state-action pair). This works for small, discrete problems. But real-world problems often have:

  • Continuous states: Position, velocity, angles
  • High-dimensional observations: Images with millions of pixels
  • Combinatorially large spaces: The game tree of chess has more nodes than there are atoms in the observable universe

Function approximation lets us represent value functions compactly and generalize across similar states.
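To see why naive discretization is not a way out, consider a quick back-of-the-envelope calculation (the bin counts below are illustrative, not from the text): the table size grows exponentially in the number of state dimensions.

```python
# Illustrative only: how big a table gets if we naively discretize
# a continuous state into a fixed number of bins per dimension.
def table_size(bins_per_dim: int, num_dims: int) -> int:
    """Number of entries in a tabular value function after discretization."""
    return bins_per_dim ** num_dims

# A robot pose (x, y, heading) at 100 bins per dimension:
print(table_size(100, 3))  # 1,000,000 entries already
# Add (vx, vy, angular velocity) and the table becomes hopeless:
print(table_size(100, 6))  # 10^12 entries
```

This exponential blow-up, often called the curse of dimensionality, is exactly what generalization across similar states lets us sidestep.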

Chapter Overview

This chapter bridges tabular RL and deep RL, introducing the core ideas that make modern RL algorithms work:

The Core Idea

📖Function Approximation

Instead of storing $Q(s,a)$ for every state-action pair, we learn parameters $\mathbf{w}$ such that $\hat{Q}(s,a;\mathbf{w}) \approx Q^*(s,a)$. Similar states automatically get similar values.

The key insight is that we can use any function approximator (linear models, neural networks, decision trees) to represent our value function. The choice of approximator determines:

  • What patterns can be captured
  • How efficiently we learn
  • Whether training is stable
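As a concrete sketch of the simplest choice, the linear case: represent each state-action pair by a feature vector $\phi(s,a)$ and estimate $\hat{Q}(s,a;\mathbf{w}) = \mathbf{w}^\top \phi(s,a)$. The feature construction below (one block of state features per action) is one common illustrative scheme, not a prescribed design.

```python
import numpy as np

def features(state, action, num_actions=4, num_features=8):
    """Illustrative feature vector phi(s, a): the state features are
    copied into the block of the vector that corresponds to `action`,
    so each action gets its own slice of the weights."""
    phi = np.zeros(num_actions * num_features)
    s = np.asarray(state, dtype=float)
    block = np.resize(s, num_features)  # tile/pad the state into the block
    phi[action * num_features:(action + 1) * num_features] = block
    return phi

def q_value(w, state, action):
    """Linear approximation: Q_hat(s, a; w) = w . phi(s, a)."""
    return w @ features(state, action)

w = np.zeros(4 * 8)  # one weight per feature, all zero before learning
print(q_value(w, state=[0.5, -0.2], action=2))  # 0.0 until w is trained
```

With zero weights every estimate is zero; learning consists of adjusting `w`, and because nearby states produce similar feature vectors, an update for one state changes the estimates for its neighbors too.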

Prerequisites

This chapter assumes familiarity with:

  • Q-Learning for the core algorithm we’re extending
  • Basic calculus (gradients and optimization)
  • (Recommended) Bellman Equations for the theoretical foundation

Key Questions We’ll Answer

  • Why can’t we just discretize continuous states?
  • How do we update parameters instead of table entries?
  • What is the “deadly triad” and why should we care?
  • How do neural networks unlock deep RL?
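To preview the second question, here is a minimal sketch of one semi-gradient Q-learning step in the linear case (function and variable names are illustrative). Rather than overwriting one table cell, we nudge every weight in the direction that reduces the TD error.

```python
import numpy as np

def semi_gradient_q_update(w, phi_sa, reward, phi_next_best,
                           gamma=0.99, alpha=0.1):
    """One semi-gradient Q-learning step for a linear approximator.

    phi_sa:        features of the (state, action) just taken
    phi_next_best: features of the greedy action in the next state
                   (all zeros for a terminal state)
    """
    td_target = reward + gamma * (w @ phi_next_best)  # bootstrapped target
    td_error = td_target - w @ phi_sa
    # For a linear model, the gradient of w . phi w.r.t. w is just phi:
    return w + alpha * td_error * phi_sa

# Toy step: two features, reward 1, terminal next state (zero features).
w = np.zeros(2)
w = semi_gradient_q_update(w, np.array([1.0, 0.0]), 1.0, np.zeros(2))
print(w)  # only the active feature's weight moves: [0.1, 0.0]
```

Note the "semi" in semi-gradient: the bootstrapped target also depends on `w`, but we treat it as fixed when updating. That shortcut is one ingredient of the deadly triad discussed in this chapter.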

Key Takeaways

  • Generalization is the key benefit: learn from some states, apply to similar ones
  • Linear approximation with features is simple but powerful
  • The deadly triad (function approximation + bootstrapping + off-policy learning) can cause divergence
  • Neural networks can learn their own features, enabling end-to-end learning
Next Chapter: Deep Q-Networks