Chapter 122
📝Draft

Deep Q-Networks

The breakthrough that made deep RL work

What You'll Learn

  • Explain why naive Q-learning with neural networks fails
  • Describe how experience replay breaks correlation
  • Explain target networks and why they stabilize training
  • Implement DQN from scratch
  • Understand frame stacking and preprocessing for visual inputs

In 2013, a paper from DeepMind shook the AI world: a single algorithm, with the same architecture and hyperparameters, learned to play Atari games directly from raw pixels. The 2015 Nature follow-up scaled the result to 49 games, several at superhuman level.

That algorithm was DQN, and it showed that deep learning and reinforcement learning could work together.

The Breakthrough

📖Deep Q-Network (DQN)

A Q-learning agent that uses a deep neural network to approximate the Q-function, stabilized by two key innovations: experience replay and target networks.

The deadly triad (off-policy + function approximation + bootstrapping) seemed fatal. DQN survives by breaking two correlations:

  1. Experience replay breaks the correlation between consecutive samples
  2. Target networks break the correlation between Q-values and their targets
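Experience replay, the first of these two fixes, amounts to a fixed-size buffer of transitions sampled uniformly at random. A minimal sketch (the class and method names here are illustrative, not from the chapter):

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size buffer of (s, a, r, s', done) transitions.

    Sampling uniformly at random breaks the temporal correlation
    between consecutive transitions, and lets each experience be
    reused in many updates (better data efficiency).
    """
    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest transitions evicted first

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform random minibatch: decorrelated, unlike consecutive steps
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```

Because the buffer is bounded, very old experience from an outdated policy eventually falls out, which keeps the sampled distribution roughly on pace with the current policy.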

Chapter Overview

This chapter covers the complete DQN algorithm, piece by piece:

The Core Idea

DQN is fundamentally just Q-learning with a neural network:

$$Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]$$

But instead of a table, we have:

  • A neural network $Q(s, a; \theta)$ parameterized by weights $\theta$
  • A replay buffer storing past experiences
  • A target network $Q(s, a; \theta^-)$ with frozen weights
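The relationship between the online network and its frozen target copy can be sketched in plain NumPy with a tiny one-hidden-layer Q-network (all names here are illustrative; a real DQN would use a deep-learning framework and, for Atari, a convolutional network):

```python
import numpy as np

rng = np.random.default_rng(0)

def init_q_net(n_states, n_actions, hidden=32):
    """Tiny one-hidden-layer Q-network: state vector -> one Q-value per action."""
    return {
        "W1": rng.normal(0, 0.1, (n_states, hidden)),
        "b1": np.zeros(hidden),
        "W2": rng.normal(0, 0.1, (hidden, n_actions)),
        "b2": np.zeros(n_actions),
    }

def q_values(params, s):
    h = np.maximum(0.0, s @ params["W1"] + params["b1"])  # ReLU hidden layer
    return h @ params["W2"] + params["b2"]

def hard_update(target, online):
    """Copy the online weights into the frozen target network.

    In DQN this happens only every C steps; between copies the target
    weights theta^- stay fixed, so the TD targets do not chase a
    moving network.
    """
    for k in online:
        target[k] = online[k].copy()

online = init_q_net(n_states=4, n_actions=2)
target = {k: v.copy() for k, v in online.items()}  # start as an exact copy
```

Between synchronizations the online network trains against targets computed with `target`, which stay constant; `hard_update` then snaps the target back to the current online weights.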

Mathematical Details

The DQN loss function:

$$L(\theta) = \mathbb{E}_{(s,a,r,s') \sim \mathcal{D}} \left[ \left( r + \gamma \max_{a'} Q(s', a'; \theta^-) - Q(s, a; \theta) \right)^2 \right]$$

where $\mathcal{D}$ is the replay buffer and $\theta^-$ are the target network parameters.
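The loss can be computed directly from a sampled minibatch. The sketch below (NumPy, illustrative names) makes two details explicit that the formula hides: the bootstrap term is zeroed at terminal states, and the target is treated as a constant, so no gradient flows through $\theta^-$:

```python
import numpy as np

def dqn_targets(rewards, next_q_target, dones, gamma=0.99):
    """TD targets r + gamma * max_a' Q(s', a'; theta^-).

    next_q_target: (batch, n_actions) Q-values from the *target* network.
    dones: 1.0 at terminal transitions, where the bootstrap term is dropped.
    """
    return rewards + gamma * next_q_target.max(axis=1) * (1.0 - dones)

def dqn_loss(q_chosen, targets):
    """Mean squared TD error over the minibatch.

    q_chosen: (batch,) online-network Q-values of the actions actually taken.
    targets are constants here: gradients flow only through q_chosen.
    """
    return np.mean((targets - q_chosen) ** 2)
```

For example, with $\gamma = 0.9$, reward $1$, and a best next-state target-network value of $1.0$, the TD target is $1 + 0.9 \cdot 1.0 = 1.9$; for a terminal transition the target is just the reward.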

Prerequisites

This chapter builds on:

Key Questions We’ll Answer

  • Why does naive neural network Q-learning fail?
  • How does storing and replaying experiences help?
  • Why do we need a separate target network?
  • What preprocessing is needed for visual inputs?

Key Takeaways

  • DQN = Q-learning + neural network + two key tricks
  • Experience replay decorrelates training samples and improves data efficiency
  • Target networks provide stable targets during learning
  • Frame stacking provides temporal information from static images
  • DQN achieved superhuman performance on many Atari games with a single algorithm
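The frame-stacking trick from the takeaways above can be sketched as a small wrapper that keeps the last $k$ preprocessed frames (names illustrative; DQN used $k = 4$ grayscale $84 \times 84$ frames):

```python
import numpy as np
from collections import deque

class FrameStack:
    """Stack the k most recent preprocessed frames into one (k, H, W) observation.

    A single static frame cannot show velocity (e.g. which way the ball
    is moving in Pong); stacking consecutive frames restores that
    temporal information for the network.
    """
    def __init__(self, k=4):
        self.k = k
        self.frames = deque(maxlen=k)

    def reset(self, frame):
        for _ in range(self.k):
            self.frames.append(frame)  # pad the stack with the first frame
        return self.observation()

    def step(self, frame):
        self.frames.append(frame)  # oldest frame falls off the back
        return self.observation()

    def observation(self):
        return np.stack(self.frames, axis=0)  # shape (k, H, W)
```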

Next Chapter: DQN Improvements