The RL Framework
What You'll Learn
- Understand the agent-environment interface and interaction loop
- Define states, actions, rewards, and transitions
- Learn about policies: how agents decide what to do
- Understand value functions: how agents evaluate states and actions
- See how these concepts fit together in a complete RL system
Chapter Overview
Now that you know what reinforcement learning is and where it’s used, it’s time to understand the precise framework that makes it work. Every RL problem—from playing Atari games to training language models—can be described using the same fundamental components.
In this chapter, we’ll formalize the building blocks of RL: what agents observe, how they act, what rewards they receive, and how they represent what they’ve learned.
The Complete Picture
These components form a complete framework for describing any RL problem. The agent uses its policy to choose actions based on states, receives rewards from the environment, and uses value functions to guide learning.
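The interaction loop described above can be sketched in a few lines of code. The environment, policy, and reward below are illustrative placeholders, not part of this chapter's formal definitions: a hypothetical five-state corridor where the agent starts in state 0 and earns +1 for reaching state 4, paired with a uniformly random policy.

```python
import random

class CorridorEnv:
    """Toy example environment: states 0..4, actions -1 (left) or +1 (right)."""

    def reset(self):
        # Start every episode in the leftmost state.
        self.state = 0
        return self.state

    def step(self, action):
        # Transition: move left or right, clamped to the corridor.
        self.state = max(0, min(4, self.state + action))
        # Reward: +1 only upon reaching the goal state.
        reward = 1.0 if self.state == 4 else 0.0
        done = self.state == 4
        return self.state, reward, done

def random_policy(state):
    """Placeholder policy: choose left or right uniformly at random."""
    return random.choice([-1, +1])

# The agent-environment loop: observe the state, act via the policy,
# receive a reward and the next state, repeat until the episode ends.
env = CorridorEnv()
state = env.reset()
total_reward, done = 0.0, False
while not done:
    action = random_policy(state)
    state, reward, done = env.step(action)
    total_reward += reward
```

Even with a random policy, this loop shows the structure every RL system shares; the chapters ahead replace the random choice with a learned policy guided by value functions.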
This chapter formalizes the concepts introduced in "What is RL?". If any terms feel unfamiliar, review that chapter first.