Function Approximation in Reinforcement Learning
What You'll Learn
- Explain why tabular methods fail in large or continuous state spaces
- Describe the function approximation approach to RL
- Implement linear function approximation for value estimation
- Understand the deadly triad and its implications
- Explain how neural networks enable deep RL
Our Q-learning agent mastered a 4x4 grid. But what about a robot navigating a room? With continuous position (x, y) and orientation, there are infinite states. We can’t have a table entry for every possible configuration.
We need a way to generalize.
Why Function Approximation?
In tabular RL, we stored a value for every state (or state-action pair). This works for small, discrete problems. But real-world problems often have:
- Continuous states: Position, velocity, angles
- High-dimensional observations: Images with millions of pixels
- Combinatorially large spaces: Chess has more possible games than atoms in the observable universe
Function approximation lets us represent value functions compactly and generalize across similar states.
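To see how quickly tables blow up, consider discretizing the robot example above. The bin counts below are illustrative, not from the text, but even this coarse grid produces a table far too large to fill entry by entry:

```python
import math

# Hypothetical robot state: position (x, y), orientation, and velocity,
# each discretized into a modest number of bins.
bins = {"x": 100, "y": 100, "theta": 72, "vx": 50, "vy": 50}

# A tabular method needs one entry per joint configuration.
table_entries = math.prod(bins.values())
print(f"{table_entries:,}")  # 1,800,000,000 entries at this coarse resolution
```

And halving the bin width in every dimension multiplies the table size by $2^5 = 32$; this exponential growth is the curse of dimensionality.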
Chapter Overview
This chapter bridges tabular RL and deep RL, introducing the core ideas that make modern RL algorithms work:
- Why Tables Fail: The curse of dimensionality in RL
- Linear Approximation: Features, weights, and gradient descent
- Neural Networks: Deep learning meets reinforcement learning
The Core Idea
Instead of storing $Q(s, a)$ for every state-action pair, we learn parameters $\theta$ such that $\hat{Q}(s, a; \theta) \approx Q(s, a)$. Similar states automatically get similar values.
The key insight is that we can use any function approximator (linear models, neural networks, decision trees) to represent our value function. The choice of approximator determines:
- What patterns can be captured
- How efficiently we learn
- Whether training is stable
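The simplest such approximator is a linear model over hand-crafted features. A minimal sketch, assuming a common one-hot-by-action feature construction (the sizes and feature layout here are illustrative, not from the text):

```python
import numpy as np

N_ACTIONS = 4         # illustrative sizes, not from the text
N_STATE_FEATURES = 2

def features(state, action):
    """Copy the state features into the slot for the chosen action,
    one simple and common linear-feature construction."""
    x = np.zeros(N_STATE_FEATURES * N_ACTIONS)
    start = action * N_STATE_FEATURES
    x[start:start + N_STATE_FEATURES] = state
    return x

def q_hat(w, state, action):
    # Linear value estimate: q(s, a; w) = w . phi(s, a)
    return w @ features(state, action)

w = np.zeros(N_STATE_FEATURES * N_ACTIONS)
print(q_hat(w, np.array([0.5, -0.2]), action=1))  # 0.0 before any learning
```

Note that the table has been replaced by 8 weights; states with similar features now share value estimates automatically.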
Prerequisites
This chapter assumes familiarity with:
- Q-Learning, the core algorithm we're extending
- Basic calculus (gradients and optimization)
- (Recommended) Bellman Equations for the theoretical foundation
Key Questions We’ll Answer
- Why can’t we just discretize continuous states?
- How do we update parameters instead of table entries?
- What is the “deadly triad” and why should we care?
- How do neural networks unlock deep RL?
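To preview the second question above: instead of overwriting a table entry, we nudge the parameters along the gradient of the value estimate. A sketch of one semi-gradient Q-learning step for a linear approximator (the function name and step sizes are illustrative):

```python
import numpy as np

def semi_gradient_q_step(w, phi_sa, reward, q_next_max,
                         alpha=0.1, gamma=0.99, done=False):
    """One semi-gradient Q-learning step: the TD error multiplies the
    gradient of q(s, a; w), which for a linear model is just phi(s, a)."""
    target = reward if done else reward + gamma * q_next_max
    td_error = target - w @ phi_sa
    return w + alpha * td_error * phi_sa

w = np.zeros(2)
w = semi_gradient_q_step(w, phi_sa=np.array([1.0, 0.0]),
                         reward=1.0, q_next_max=0.0, done=True)
print(w)  # [0.1 0. ]
```

The update has the same shape as tabular Q-learning, but it moves every state that shares features with $s$, not just one cell.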
Key Takeaways
- Generalization is the key benefit: learn from some states, apply to similar ones
- Linear approximation with features is simple but powerful
- The deadly triad (function approximation + bootstrapping + off-policy) can cause divergence
- Neural networks can learn their own features, enabling end-to-end learning
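The last takeaway can be made concrete with a tiny sketch: a two-layer network maps the raw state straight to one Q-value per action, so the hidden layer plays the role of the feature vector we hand-designed before. All sizes here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Minimal two-layer Q-network sketch: the hidden layer learns features
# from the raw state instead of using a hand-designed phi(s, a).
W1 = rng.normal(scale=0.1, size=(16, 2))  # state dim 2 -> 16 hidden units
b1 = np.zeros(16)
W2 = rng.normal(scale=0.1, size=(4, 16))  # 16 hidden units -> 4 action values
b2 = np.zeros(4)

def q_network(state):
    hidden = np.maximum(0.0, W1 @ state + b1)  # ReLU units act as learned features
    return W2 @ hidden + b2                    # one Q-value per action

print(q_network(np.array([0.5, -0.2])).shape)  # (4,)
```

Training both layers end to end, rather than fixing the features by hand, is what the next chapters on deep RL build on.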