Multi-Agent Reinforcement Learning
What You'll Learn
- Explain challenges unique to multi-agent settings
- Distinguish cooperative, competitive, and mixed settings
- Describe independent learning and its limitations
- Explain centralized training with decentralized execution (CTDE)
- Implement simple multi-agent algorithms
- Understand game-theoretic concepts like Nash equilibrium and self-play
So far, our agent has been alone in its environment. But most interesting problems involve multiple decision-makers: autonomous vehicles sharing roads, robots cooperating in a warehouse, or AIs competing in games. When multiple agents learn simultaneously, everything changes.
Imagine playing chess. You’re not just optimizing against a static puzzle—you’re facing an opponent who adapts to your strategies. Every time you find a clever tactic, they might find a counter. The “optimal” move depends on what your opponent will do, which depends on what they think you’ll do, which depends on…
This recursive reasoning is the essence of multi-agent RL. The environment includes other thinking, learning entities. And that changes everything.
Chapter Overview
The Big Picture
Multi-agent RL is reinforcement learning with multiple agents interacting in a shared environment. Each agent's optimal behavior depends on the behaviors of the other agents, creating strategic dynamics that go beyond single-agent optimization.
In multi-agent RL, each agent faces a moving target: other agents are learning too, changing the environment dynamics. Simple independent learning often fails because the environment appears non-stationary. The solution: train agents together (centralized) but deploy them independently (decentralized).
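A minimal sketch can make the independent-learning problem concrete. The game and all numbers below are illustrative assumptions: two stateless Q-learners play a repeated coordination game, and each updates a Q-table over only its own actions, treating its partner as part of the environment. From either agent's perspective the reward for an action drifts as the other agent learns, which is exactly the non-stationarity described above. In this easy game the agents happen to coordinate anyway; in harder games, this same structure breaks the convergence guarantees of single-agent Q-learning.

```python
import random

# Hypothetical 2-player coordination game: each agent picks action 0 or 1;
# both receive +1 if the actions match, 0 otherwise.
def payoff(a1, a2):
    return (1.0, 1.0) if a1 == a2 else (0.0, 0.0)

# Independent learners: each keeps a Q-value per OWN action only, ignoring
# the other agent entirely. The "environment" each one sees is really the
# other (changing) agent -- the source of non-stationarity.
q1, q2 = [0.0, 0.0], [0.0, 0.0]
alpha, eps = 0.1, 0.1  # illustrative learning rate and exploration rate

def act(q):
    # epsilon-greedy over the agent's own two actions
    return random.randrange(2) if random.random() < eps else q.index(max(q))

random.seed(0)
for _ in range(5000):
    a1, a2 = act(q1), act(q2)
    r1, r2 = payoff(a1, a2)
    q1[a1] += alpha * (r1 - q1[a1])  # stateless Q-update, blind to a2
    q2[a2] += alpha * (r2 - q2[a2])  # stateless Q-update, blind to a1

print(q1, q2)  # the action the pair settled on ends up with high value
```

CTDE addresses the blind spot in these updates: during training, a centralized critic can condition on both agents' actions, while each deployed policy still acts on its own observations alone.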
Consider three types of multi-agent scenarios:
Cooperative: A team of robots assembling a car. They share a goal and succeed or fail together. Communication and coordination are key.
Competitive: Two players in a zero-sum game like chess. One’s gain is the other’s loss. Strategy and adaptation are key.
Mixed: Traffic at an intersection. Everyone wants to get through quickly, but crashes hurt everyone. Some coordination emerges, but incentives aren’t fully aligned.
Each type brings different challenges and requires different approaches.
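The three settings can be made concrete with 2x2 payoff matrices (the specific numbers below are illustrative, not from the text). The sketch also previews the Nash-equilibrium idea from the learning goals: a pure action pair is a Nash equilibrium when neither agent can gain by deviating unilaterally.

```python
# Entry [i][j] = (reward to agent 1, reward to agent 2) when agent 1
# plays action i and agent 2 plays action j. Numbers are illustrative.
cooperative = [[(2, 2), (0, 0)],
               [(0, 0), (1, 1)]]       # shared reward: interests fully aligned
zero_sum    = [[(1, -1), (-1, 1)],
               [(-1, 1), (1, -1)]]     # matching pennies: pure conflict
mixed       = [[(3, 3), (0, 4)],
               [(4, 0), (1, 1)]]       # prisoner's dilemma: partial conflict

def is_nash(game, i, j):
    """Is the pure action pair (i, j) a Nash equilibrium, i.e. neither
    agent can improve its own reward by deviating alone?"""
    r1, r2 = game[i][j]
    best1 = all(game[k][j][0] <= r1 for k in range(2))  # agent 1's deviations
    best2 = all(game[i][k][1] <= r2 for k in range(2))  # agent 2's deviations
    return best1 and best2

print(is_nash(mixed, 1, 1))  # True: mutual defection is an equilibrium
print(is_nash(mixed, 0, 0))  # False: cooperation pays more but is unstable
```

Note that matching pennies has no pure Nash equilibrium at all, which is why competitive play pushes agents toward randomized (mixed) strategies.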
Why Multi-Agent RL Matters
- Autonomous vehicle fleets
- Warehouse robotics teams
- Smart grid coordination
- Trading agents in markets
- OpenAI Five: defeated the Dota 2 world champions
- AlphaStar: Grandmaster-level StarCraft II play
- Multi-agent hide and seek emergence
- Cooperative manipulation tasks
New to multi-agent RL? Begin with Multi-Agent Settings to understand the different types of multi-agent problems, then continue through the sections in order.
Key Takeaways
- Multi-agent RL involves multiple learning agents interacting in a shared environment
- Settings can be cooperative, competitive, or mixed-motive
- Independent learning faces non-stationarity as other agents change their behavior
- Centralized training with decentralized execution (CTDE) is the dominant paradigm
- Self-play creates a curriculum of increasingly strong opponents
- Emergent behaviors in multi-agent systems can be surprising and powerful
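The self-play takeaway can be sketched with fictitious play in rock-paper-scissors: the agent repeatedly best-responds to the empirical mix of its own past actions, so each "opponent" in the sequence is a stronger, better-balanced version of its earlier selves. This is a classical simplified stand-in for the self-play loops in systems like AlphaStar, not their actual training method.

```python
# Fictitious self-play in rock-paper-scissors (0=rock, 1=paper, 2=scissors).
# payoff[i][j] = reward to the row player.
payoff = [[0, -1, 1],
          [1, 0, -1],
          [-1, 1, 0]]

counts = [1, 0, 0]  # history of our own past actions; seeded with one "rock"
for _ in range(10000):
    total = sum(counts)
    avg = [c / total for c in counts]  # average past self = current opponent
    # Best response to the averaged past self: the curriculum step.
    values = [sum(payoff[i][j] * avg[j] for j in range(3)) for i in range(3)]
    counts[values.index(max(values))] += 1  # frozen into the pool of past selves

mix = [c / sum(counts) for c in counts]
print(mix)  # empirical strategy approaches the mixed Nash: each entry near 1/3
```

Early in the run the best response cycles in long streaks (paper beats the rock-heavy history, then scissors, then rock), but the averaged strategy steadily approaches the uniform equilibrium -- a small-scale picture of a self-play curriculum converging against ever-stronger opponents.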