Multi-Agent Reinforcement Learning
What You'll Learn
- Explain challenges unique to multi-agent settings
- Distinguish cooperative, competitive, and mixed settings
- Describe independent learning and its limitations
- Explain centralized training with decentralized execution (CTDE)
- Implement simple multi-agent algorithms
- Understand game-theoretic concepts like Nash equilibrium and self-play
So far, our agent has been alone in its environment. But most interesting problems involve multiple decision-makers: autonomous vehicles sharing roads, robots cooperating in a warehouse, or AIs competing in games. When multiple agents learn simultaneously, everything changes.
Imagine playing chess. You’re not just optimizing against a static puzzle—you’re facing an opponent who adapts to your strategies. Every time you find a clever tactic, they might find a counter. The “optimal” move depends on what your opponent will do, which depends on what they think you’ll do, which depends on…
This recursive reasoning is the essence of multi-agent RL. The environment includes other thinking, learning entities. And that changes everything.
Chapter Overview
The Big Picture
Multi-agent RL is reinforcement learning with multiple agents interacting in a shared environment. Each agent's optimal behavior depends on the behaviors of the other agents, creating strategic dynamics that go beyond single-agent optimization.
In multi-agent RL, each agent faces a moving target: other agents are learning too, changing the environment dynamics. Simple independent learning often fails because the environment appears non-stationary. The solution: train agents together (centralized) but deploy them independently (decentralized).
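A minimal sketch can make the independent-learning problem concrete. The game and all numbers below are illustrative assumptions: two stateless Q-learners play a repeated coordination game, and each updates a Q-table over only its own actions, treating its partner as part of the environment. From either agent's perspective the reward for an action drifts as the other agent learns, which is exactly the non-stationarity described above. In this easy game the agents happen to coordinate anyway; in harder games, this same structure breaks the convergence guarantees of single-agent Q-learning.

```python
import random

# Hypothetical 2-player coordination game: each agent picks action 0 or 1;
# both receive +1 if the actions match, 0 otherwise.
def payoff(a1, a2):
    return (1.0, 1.0) if a1 == a2 else (0.0, 0.0)

# Independent learners: each keeps a Q-value per OWN action only, ignoring
# the other agent entirely. The "environment" each one sees is really the
# other (changing) agent -- the source of non-stationarity.
q1, q2 = [0.0, 0.0], [0.0, 0.0]
alpha, eps = 0.1, 0.1  # illustrative learning rate and exploration rate

def act(q):
    # epsilon-greedy over the agent's own two actions
    return random.randrange(2) if random.random() < eps else q.index(max(q))

random.seed(0)
for _ in range(5000):
    a1, a2 = act(q1), act(q2)
    r1, r2 = payoff(a1, a2)
    q1[a1] += alpha * (r1 - q1[a1])  # stateless Q-update, blind to a2
    q2[a2] += alpha * (r2 - q2[a2])  # stateless Q-update, blind to a1

print(q1, q2)  # the action the pair settled on ends up with high value
```

CTDE addresses the blind spot in these updates: during training, a centralized critic can condition on both agents' actions, while each deployed policy still acts on its own observations alone.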
Consider three types of multi-agent scenarios:
Cooperative: A team of robots assembling a car. They share a goal and succeed or fail together. Communication and coordination are key.
Competitive: Two players in a zero-sum game like chess. One’s gain is the other’s loss. Strategy and adaptation are key.
Mixed: Traffic at an intersection. Everyone wants to get through quickly, but crashes hurt everyone. Some coordination emerges, but incentives aren’t fully aligned.
Each type brings different challenges and requires different approaches.
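The three settings can be made concrete with 2x2 payoff matrices (the specific numbers below are illustrative, not from the text). The sketch also previews the Nash-equilibrium idea from the learning goals: a pure action pair is a Nash equilibrium when neither agent can gain by deviating unilaterally.

```python
# Entry [i][j] = (reward to agent 1, reward to agent 2) when agent 1
# plays action i and agent 2 plays action j. Numbers are illustrative.
cooperative = [[(2, 2), (0, 0)],
               [(0, 0), (1, 1)]]       # shared reward: interests fully aligned
zero_sum    = [[(1, -1), (-1, 1)],
               [(-1, 1), (1, -1)]]     # matching pennies: pure conflict
mixed       = [[(3, 3), (0, 4)],
               [(4, 0), (1, 1)]]       # prisoner's dilemma: partial conflict

def is_nash(game, i, j):
    """Is the pure action pair (i, j) a Nash equilibrium, i.e. neither
    agent can improve its own reward by deviating alone?"""
    r1, r2 = game[i][j]
    best1 = all(game[k][j][0] <= r1 for k in range(2))  # agent 1's deviations
    best2 = all(game[i][k][1] <= r2 for k in range(2))  # agent 2's deviations
    return best1 and best2

print(is_nash(mixed, 1, 1))  # True: mutual defection is an equilibrium
print(is_nash(mixed, 0, 0))  # False: cooperation pays more but is unstable
```

Note that matching pennies has no pure Nash equilibrium at all, which is why competitive play pushes agents toward randomized (mixed) strategies.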
Why Multi-Agent RL Matters
- Autonomous vehicle fleets
- Warehouse robotics teams
- Smart grid coordination
- Trading agents in markets
- OpenAI Five: defeated the Dota 2 world champions
- AlphaStar: Grandmaster-level StarCraft II play
- Multi-agent hide and seek emergence
- Cooperative manipulation tasks
New to multi-agent RL? Begin with Multi-Agent Settings to understand the different types of multi-agent problems, then continue through the sections in order.
Key Takeaways
- Multi-agent RL involves multiple learning agents interacting in a shared environment
- Settings can be cooperative, competitive, or mixed-motive
- Independent learning faces non-stationarity as other agents change their behavior
- Centralized training with decentralized execution (CTDE) is the dominant paradigm
- Self-play creates a curriculum of increasingly strong opponents
- Emergent behaviors in multi-agent systems can be surprising and powerful
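The self-play takeaway can be sketched with fictitious play in rock-paper-scissors: the agent repeatedly best-responds to the empirical mix of its own past actions, so each "opponent" in the sequence is a stronger, better-balanced version of its earlier selves. This is a classical simplified stand-in for the self-play loops in systems like AlphaStar, not their actual training method.

```python
# Fictitious self-play in rock-paper-scissors (0=rock, 1=paper, 2=scissors).
# payoff[i][j] = reward to the row player.
payoff = [[0, -1, 1],
          [1, 0, -1],
          [-1, 1, 0]]

counts = [1, 0, 0]  # history of our own past actions; seeded with one "rock"
for _ in range(10000):
    total = sum(counts)
    avg = [c / total for c in counts]  # average past self = current opponent
    # Best response to the averaged past self: the curriculum step.
    values = [sum(payoff[i][j] * avg[j] for j in range(3)) for i in range(3)]
    counts[values.index(max(values))] += 1  # frozen into the pool of past selves

mix = [c / sum(counts) for c in counts]
print(mix)  # empirical strategy approaches the mixed Nash: each entry near 1/3
```

Early in the run the best response cycles in long streaks (paper beats the rock-heavy history, then scissors, then rock), but the averaged strategy steadily approaches the uniform equilibrium -- a small-scale picture of a self-play curriculum converging against ever-stronger opponents.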