The RL Framework
What You'll Learn
- Understand the agent-environment interface and interaction loop
- Define states, actions, rewards, and transitions
- Learn about policies: how agents decide what to do
- Understand value functions: how agents evaluate states and actions
- See how these concepts fit together in a complete RL system
Chapter Overview
Now that you know what reinforcement learning is and where it’s used, it’s time to understand the precise framework that makes it work. Every RL problem—from playing Atari games to training language models—can be described using the same fundamental components.
In this chapter, we’ll formalize the building blocks of RL: what agents observe, how they act, what rewards they receive, and how they represent what they’ve learned.
The Complete Picture
These components form a complete framework for describing any RL problem. The agent uses its policy to choose actions based on states, receives rewards from the environment, and uses value functions to guide learning.
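The interaction loop described above can be sketched in a few lines of code. The environment, policy, and reward below are illustrative placeholders, not part of this chapter's formal definitions: a hypothetical five-state corridor where the agent starts in state 0 and earns +1 for reaching state 4, paired with a uniformly random policy.

```python
import random

class CorridorEnv:
    """Toy example environment: states 0..4, actions -1 (left) or +1 (right)."""

    def reset(self):
        # Start every episode in the leftmost state.
        self.state = 0
        return self.state

    def step(self, action):
        # Transition: move left or right, clamped to the corridor.
        self.state = max(0, min(4, self.state + action))
        # Reward: +1 only upon reaching the goal state.
        reward = 1.0 if self.state == 4 else 0.0
        done = self.state == 4
        return self.state, reward, done

def random_policy(state):
    """Placeholder policy: choose left or right uniformly at random."""
    return random.choice([-1, +1])

# The agent-environment loop: observe the state, act via the policy,
# receive a reward and the next state, repeat until the episode ends.
env = CorridorEnv()
state = env.reset()
total_reward, done = 0.0, False
while not done:
    action = random_policy(state)
    state, reward, done = env.step(action)
    total_reward += reward
```

Even with a random policy, this loop shows the structure every RL system shares; the chapters ahead replace the random choice with a learned policy guided by value functions.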
This chapter formalizes the concepts introduced in "What is RL?". If any terms feel unfamiliar, review that chapter first.