Deep Q-Networks (DQN)
What You'll Learn
- Explain why naive Q-learning with neural networks fails
- Describe how experience replay breaks correlation
- Explain target networks and why they stabilize training
- Implement DQN from scratch
- Understand frame stacking and preprocessing for visual inputs
In 2013, a paper from DeepMind shook the AI world: a single algorithm, with a single set of hyperparameters, learned to play Atari games from raw pixels; the 2015 Nature follow-up extended this to 49 games, some played at superhuman level.
That algorithm was DQN, and it showed that deep learning and reinforcement learning could work together.
The Breakthrough
DQN is a Q-learning agent that approximates the Q-function with a deep neural network, stabilized by two key innovations: experience replay and target networks.
The deadly triad (off-policy learning + function approximation + bootstrapping) seemed fatal. DQN survives by breaking two harmful correlations:
- Experience replay breaks the correlation between consecutive samples
- Target networks break the correlation between Q-values and their targets
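The first of these, uniform sampling from a replay buffer, takes only a few lines. A minimal sketch (the class and method names here are illustrative, not tied to any particular library):

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size buffer of (state, action, reward, next_state, done) transitions."""

    def __init__(self, capacity):
        # Oldest transitions are evicted automatically once capacity is reached
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform random sampling breaks the temporal correlation
        # between consecutive environment steps
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```

Because each minibatch mixes transitions from many different points in time (and even different episodes), gradient updates no longer see the highly correlated stream an online agent would.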
Chapter Overview
This chapter covers the complete DQN algorithm, piece by piece:
The DQN Architecture
CNNs for processing visual observations
Experience Replay
Breaking correlations through random sampling
Target Networks
Stabilizing training with frozen targets
Putting It Together
The complete DQN algorithm
The Core Idea
DQN is fundamentally just Q-learning with a neural network. Recall the tabular update:

$$Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]$$

But instead of a table, we have:
- A neural network $Q(s, a; \theta)$ parameterized by weights $\theta$
- A replay buffer $\mathcal{D}$ storing past experiences
- A target network $Q(s, a; \theta^-)$ with frozen weights $\theta^-$
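The target network is nothing exotic: it is a copy of the online weights, refreshed only every C steps. A minimal sketch, assuming weights are stored as a dict of NumPy arrays (the names `online`, `target`, and `hard_update` are ours, for illustration):

```python
import numpy as np

def hard_update(online_params, target_params):
    """Copy the online weights into the target network (done every C steps)."""
    for key in online_params:
        target_params[key] = online_params[key].copy()

# Hypothetical two-layer network weights
online = {"w1": np.random.randn(4, 8), "w2": np.random.randn(8, 2)}
target = {k: v.copy() for k, v in online.items()}

# ... gradient steps change `online` while `target` stays frozen ...
online["w1"] += 0.1

hard_update(online, target)  # only now do the targets move to match the online net
```

Between updates the target network provides a fixed regression target, rather than one that shifts with every gradient step.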
The DQN loss function:

$$L(\theta) = \mathbb{E}_{(s, a, r, s') \sim \mathcal{D}} \left[ \left( r + \gamma \max_{a'} Q(s', a'; \theta^-) - Q(s, a; \theta) \right)^2 \right]$$

where $\mathcal{D}$ is the replay buffer and $\theta^-$ are the target network parameters.
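This loss can be computed directly from a sampled minibatch. A NumPy sketch, assuming `q_online` and `q_target` are callables that return per-action Q-values (the function names are illustrative):

```python
import numpy as np

def dqn_loss(q_online, q_target, batch, gamma=0.99):
    """Mean squared TD error over a minibatch sampled from the replay buffer.

    q_online(states) and q_target(states) each return an array of shape
    (batch_size, n_actions); q_target uses the frozen parameters theta^-.
    """
    states, actions, rewards, next_states, dones = batch
    # Bootstrapped target r + gamma * max_a' Q(s', a'; theta^-),
    # with no bootstrap term on terminal transitions
    targets = rewards + gamma * (1.0 - dones) * q_target(next_states).max(axis=1)
    # Q-value of the action actually taken, under the online network
    predicted = q_online(states)[np.arange(len(actions)), actions]
    return np.mean((targets - predicted) ** 2)
```

Note that the `max` is taken under the target parameters $\theta^-$, while the gradient flows only through the online network's prediction.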
Prerequisites
This chapter builds on:
- Q-Learning for the core algorithm
- Function Approximation for why we need neural networks and the challenges they introduce
Key Questions We’ll Answer
- Why does naive neural network Q-learning fail?
- How does storing and replaying experiences help?
- Why do we need a separate target network?
- What preprocessing is needed for visual inputs?
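The last question is treated in detail later, but the core trick, stacking the most recent k frames into one observation, can be previewed here. A minimal sketch (our own illustration, not the paper's exact preprocessing pipeline):

```python
from collections import deque
import numpy as np

class FrameStack:
    """Keep the last k preprocessed frames and expose them as one observation,
    so the network can infer motion (e.g., ball velocity) from the stack."""

    def __init__(self, k=4):
        self.k = k
        self.frames = deque(maxlen=k)

    def reset(self, frame):
        # At episode start, repeat the first frame k times
        for _ in range(self.k):
            self.frames.append(frame)
        return np.stack(self.frames)

    def step(self, frame):
        self.frames.append(frame)    # the oldest frame falls off the deque
        return np.stack(self.frames)  # shape: (k, height, width)
```

A single frame cannot distinguish a ball moving left from one moving right; a stack of recent frames can.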
Key Takeaways
- DQN = Q-learning + neural network + two key tricks
- Experience replay decorrelates training samples and improves data efficiency
- Target networks provide stable targets during learning
- Frame stacking provides temporal information from static images
- DQN achieved superhuman performance on many Atari games with a single algorithm