Function Approximation in Reinforcement Learning
What You'll Learn
- Explain why tabular methods fail in large or continuous state spaces
- Describe the function approximation approach to RL
- Implement linear function approximation for value estimation
- Understand the deadly triad and its implications
- Explain how neural networks enable deep RL
Our Q-learning agent mastered a 4x4 grid. But what about a robot navigating a room? With continuous position (x, y) and orientation, there are infinite states. We can’t have a table entry for every possible configuration.
We need a way to generalize.
Why Function Approximation?
In tabular RL, we stored a value for every state (or state-action pair). This works for small, discrete problems. But real-world problems often have:
- Continuous states: Position, velocity, angles
- High-dimensional observations: Images with millions of pixels
- Combinatorially large spaces: Chess has more possible games than atoms in the observable universe
Function approximation lets us represent value functions compactly and generalize across similar states.
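To see how quickly tables blow up, consider discretizing the robot example above. The bin counts below are illustrative, not from the text, but even this coarse grid produces a table far too large to fill entry by entry:

```python
import math

# Hypothetical robot state: position (x, y), orientation, and velocity,
# each discretized into a modest number of bins.
bins = {"x": 100, "y": 100, "theta": 72, "vx": 50, "vy": 50}

# A tabular method needs one entry per joint configuration.
table_entries = math.prod(bins.values())
print(f"{table_entries:,}")  # 1,800,000,000 entries at this coarse resolution
```

And halving the bin width in every dimension multiplies the table size by $2^5 = 32$; this exponential growth is the curse of dimensionality.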
Chapter Overview
This chapter bridges tabular RL and deep RL, introducing the core ideas that make modern RL algorithms work:
- Why Tables Fail: The curse of dimensionality in RL
- Linear Approximation: Features, weights, and gradient descent
- Neural Networks: Deep learning meets reinforcement learning
The Core Idea
Instead of storing $Q(s, a)$ for every state-action pair, we learn parameters $\theta$ such that $\hat{Q}(s, a; \theta) \approx Q(s, a)$. Similar states automatically get similar values.
The key insight is that we can use any function approximator (linear models, neural networks, decision trees) to represent our value function. The choice of approximator determines:
- What patterns can be captured
- How efficiently we learn
- Whether training is stable
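The simplest such approximator is a linear model over hand-crafted features. A minimal sketch, assuming a common one-hot-by-action feature construction (the sizes and feature layout here are illustrative, not from the text):

```python
import numpy as np

N_ACTIONS = 4         # illustrative sizes, not from the text
N_STATE_FEATURES = 2

def features(state, action):
    """Copy the state features into the slot for the chosen action,
    one simple and common linear-feature construction."""
    x = np.zeros(N_STATE_FEATURES * N_ACTIONS)
    start = action * N_STATE_FEATURES
    x[start:start + N_STATE_FEATURES] = state
    return x

def q_hat(w, state, action):
    # Linear value estimate: q(s, a; w) = w . phi(s, a)
    return w @ features(state, action)

w = np.zeros(N_STATE_FEATURES * N_ACTIONS)
print(q_hat(w, np.array([0.5, -0.2]), action=1))  # 0.0 before any learning
```

Note that the table has been replaced by 8 weights; states with similar features now share value estimates automatically.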
Prerequisites
This chapter assumes familiarity with:
- Q-Learning, the core algorithm we're extending
- Basic calculus (gradients and optimization)
- (Recommended) Bellman Equations for the theoretical foundation
Key Questions We’ll Answer
- Why can’t we just discretize continuous states?
- How do we update parameters instead of table entries?
- What is the “deadly triad” and why should we care?
- How do neural networks unlock deep RL?
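To preview the second question above: instead of overwriting a table entry, we nudge the parameters along the gradient of the value estimate. A sketch of one semi-gradient Q-learning step for a linear approximator (the function name and step sizes are illustrative):

```python
import numpy as np

def semi_gradient_q_step(w, phi_sa, reward, q_next_max,
                         alpha=0.1, gamma=0.99, done=False):
    """One semi-gradient Q-learning step: the TD error multiplies the
    gradient of q(s, a; w), which for a linear model is just phi(s, a)."""
    target = reward if done else reward + gamma * q_next_max
    td_error = target - w @ phi_sa
    return w + alpha * td_error * phi_sa

w = np.zeros(2)
w = semi_gradient_q_step(w, phi_sa=np.array([1.0, 0.0]),
                         reward=1.0, q_next_max=0.0, done=True)
print(w)  # [0.1 0. ]
```

The update has the same shape as tabular Q-learning, but it moves every state that shares features with $s$, not just one cell.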
Key Takeaways
- Generalization is the key benefit: learn from some states, apply to similar ones
- Linear approximation with features is simple but powerful
- The deadly triad (function approximation + bootstrapping + off-policy) can cause divergence
- Neural networks can learn their own features, enabling end-to-end learning
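The last takeaway can be made concrete with a tiny sketch: a two-layer network maps the raw state straight to one Q-value per action, so the hidden layer plays the role of the feature vector we hand-designed before. All sizes here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Minimal two-layer Q-network sketch: the hidden layer learns features
# from the raw state instead of using a hand-designed phi(s, a).
W1 = rng.normal(scale=0.1, size=(16, 2))  # state dim 2 -> 16 hidden units
b1 = np.zeros(16)
W2 = rng.normal(scale=0.1, size=(4, 16))  # 16 hidden units -> 4 action values
b2 = np.zeros(4)

def q_network(state):
    hidden = np.maximum(0.0, W1 @ state + b1)  # ReLU units act as learned features
    return W2 @ hidden + b2                    # one Q-value per action

print(q_network(np.array([0.5, -0.2])).shape)  # (4,)
```

Training both layers end to end, rather than fixing the features by hand, is what the next chapters on deep RL build on.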