Chapter 122
📝Draft

Deep Q-Networks

The breakthrough that made deep RL work

What You'll Learn

  • Explain why naive Q-learning with neural networks fails
  • Describe how experience replay breaks correlation
  • Explain target networks and why they stabilize training
  • Implement DQN from scratch
  • Understand frame stacking and preprocessing for visual inputs

In 2013, a paper from DeepMind shook the AI world: a single algorithm, with the same architecture and hyperparameters, learned to play Atari games directly from raw pixels. The 2015 Nature follow-up scaled the result to 49 games, several at superhuman level.

That algorithm was DQN, and it showed that deep learning and reinforcement learning could work together.

The Breakthrough

📖Deep Q-Network (DQN)

A Q-learning agent that uses a deep neural network to approximate the Q-function, stabilized by two key innovations: experience replay and target networks.

The deadly triad (off-policy + function approximation + bootstrapping) seemed fatal. DQN survives by breaking two correlations:

  1. Experience replay breaks the correlation between consecutive samples
  2. Target networks break the correlation between Q-values and their targets
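Experience replay, the first of these two fixes, amounts to a fixed-size buffer of transitions sampled uniformly at random. A minimal sketch (the class and method names here are illustrative, not from the chapter):

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size buffer of (s, a, r, s', done) transitions.

    Sampling uniformly at random breaks the temporal correlation
    between consecutive transitions, and lets each experience be
    reused in many updates (better data efficiency).
    """
    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest transitions evicted first

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform random minibatch: decorrelated, unlike consecutive steps
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```

Because the buffer is bounded, very old experience from an outdated policy eventually falls out, which keeps the sampled distribution roughly on pace with the current policy.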

Chapter Overview

This chapter covers the complete DQN algorithm, piece by piece:

The Core Idea

DQN is fundamentally just Q-learning with a neural network:

$$Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]$$

But instead of a table, we have:

  • A neural network $Q(s, a; \theta)$ parameterized by weights $\theta$
  • A replay buffer storing past experiences
  • A target network $Q(s, a; \theta^-)$ with frozen weights
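The relationship between the online network and its frozen target copy can be sketched in plain NumPy with a tiny one-hidden-layer Q-network (all names here are illustrative; a real DQN would use a deep-learning framework and, for Atari, a convolutional network):

```python
import numpy as np

rng = np.random.default_rng(0)

def init_q_net(n_states, n_actions, hidden=32):
    """Tiny one-hidden-layer Q-network: state vector -> one Q-value per action."""
    return {
        "W1": rng.normal(0, 0.1, (n_states, hidden)),
        "b1": np.zeros(hidden),
        "W2": rng.normal(0, 0.1, (hidden, n_actions)),
        "b2": np.zeros(n_actions),
    }

def q_values(params, s):
    h = np.maximum(0.0, s @ params["W1"] + params["b1"])  # ReLU hidden layer
    return h @ params["W2"] + params["b2"]

def hard_update(target, online):
    """Copy the online weights into the frozen target network.

    In DQN this happens only every C steps; between copies the target
    weights theta^- stay fixed, so the TD targets do not chase a
    moving network.
    """
    for k in online:
        target[k] = online[k].copy()

online = init_q_net(n_states=4, n_actions=2)
target = {k: v.copy() for k, v in online.items()}  # start as an exact copy
```

Between synchronizations the online network trains against targets computed with `target`, which stay constant; `hard_update` then snaps the target back to the current online weights.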

Mathematical Details

The DQN loss function:

$$L(\theta) = \mathbb{E}_{(s,a,r,s') \sim \mathcal{D}} \left[ \left( r + \gamma \max_{a'} Q(s', a'; \theta^-) - Q(s, a; \theta) \right)^2 \right]$$

where $\mathcal{D}$ is the replay buffer and $\theta^-$ are the target network parameters.
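The loss can be computed directly from a sampled minibatch. The sketch below (NumPy, illustrative names) makes two details explicit that the formula hides: the bootstrap term is zeroed at terminal states, and the target is treated as a constant, so no gradient flows through $\theta^-$:

```python
import numpy as np

def dqn_targets(rewards, next_q_target, dones, gamma=0.99):
    """TD targets r + gamma * max_a' Q(s', a'; theta^-).

    next_q_target: (batch, n_actions) Q-values from the *target* network.
    dones: 1.0 at terminal transitions, where the bootstrap term is dropped.
    """
    return rewards + gamma * next_q_target.max(axis=1) * (1.0 - dones)

def dqn_loss(q_chosen, targets):
    """Mean squared TD error over the minibatch.

    q_chosen: (batch,) online-network Q-values of the actions actually taken.
    targets are constants here: gradients flow only through q_chosen.
    """
    return np.mean((targets - q_chosen) ** 2)
```

For example, with $\gamma = 0.9$, reward $1$, and a best next-state target-network value of $1.0$, the TD target is $1 + 0.9 \cdot 1.0 = 1.9$; for a terminal transition the target is just the reward.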

Prerequisites

This chapter builds on:

Key Questions We’ll Answer

  • Why does naive neural network Q-learning fail?
  • How does storing and replaying experiences help?
  • Why do we need a separate target network?
  • What preprocessing is needed for visual inputs?

Key Takeaways

  • DQN = Q-learning + neural network + two key tricks
  • Experience replay decorrelates training samples and improves data efficiency
  • Target networks provide stable targets during learning
  • Frame stacking provides temporal information from static images
  • DQN achieved superhuman performance on many Atari games with a single algorithm
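The frame-stacking trick from the takeaways above can be sketched as a small wrapper that keeps the last $k$ preprocessed frames (names illustrative; DQN used $k = 4$ grayscale $84 \times 84$ frames):

```python
import numpy as np
from collections import deque

class FrameStack:
    """Stack the k most recent preprocessed frames into one (k, H, W) observation.

    A single static frame cannot show velocity (e.g. which way the ball
    is moving in Pong); stacking consecutive frames restores that
    temporal information for the network.
    """
    def __init__(self, k=4):
        self.k = k
        self.frames = deque(maxlen=k)

    def reset(self, frame):
        for _ in range(self.k):
            self.frames.append(frame)  # pad the stack with the first frame
        return self.observation()

    def step(self, frame):
        self.frames.append(frame)  # oldest frame falls off the back
        return self.observation()

    def observation(self):
        return np.stack(self.frames, axis=0)  # shape (k, H, W)
```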

Next Chapter: DQN Improvements