The Building Blocks of RL
Now that we’ve seen RL in action, let’s define our terms precisely. Every RL system has the same core components.
- Agent: the learner and decision-maker, and the thing we're building. It observes the current state, decides on an action, and learns from experience.
- Environment: the world the agent interacts with. It responds to each action with a new state and a reward.
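In code, the interaction between these two components can be sketched as a single loop. This is a minimal sketch under assumed names: `reset`, `step`, `act`, and `learn` follow common RL-library conventions, not necessarily an API defined in this book.

```python
def run_episode(env, agent):
    """One episode of the agent-environment loop (illustrative names)."""
    state = env.reset()                    # agent observes the initial state
    done, total_reward = False, 0.0
    while not done:
        action = agent.act(state)          # agent decides
        next_state, reward, done = env.step(action)  # environment responds
        agent.learn(state, action, reward, next_state)  # agent learns
        state = next_state
        total_reward += reward
    return total_reward
```

Any environment exposing `reset`/`step` and any agent exposing `act`/`learn` can be plugged into this loop unchanged, which is why we define the components separately.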
A Simple Example: GridWorld
Throughout this book, we’ll use GridWorld as our primary example environment. It’s simple enough to understand completely, yet rich enough to illustrate key concepts.
Imagine a 4×4 grid. Your agent starts in one corner. The goal is in the opposite corner. Each step, the agent can move up, down, left, or right (if not blocked by a wall).
- State: Agent’s position (row, column)
- Actions: Up, Down, Left, Right
- Reward: -1 per step, +10 at goal
- Episode: Ends when agent reaches goal
The per-step penalty encourages the agent to find the shortest path: without it, the agent would have no incentive to reach the goal quickly.
This simple setup illustrates the core RL loop: observe position → choose direction → receive reward → repeat. We’ll return to GridWorld throughout the book to test new algorithms.
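The environment described above can be sketched in a few lines of Python. This is an illustrative implementation, not the book's own code: the `reset`/`step` method names follow the common Gym-style convention, and I assume the goal step earns both the -1 step cost and the +10 bonus (a net +9).

```python
import random

class GridWorld:
    """4x4 grid: start at (0, 0), goal at (3, 3).

    Rewards: -1 per step, plus +10 on the step that reaches the goal.
    """
    SIZE = 4
    ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

    def __init__(self):
        self.state = (0, 0)

    def reset(self):
        self.state = (0, 0)
        return self.state

    def step(self, action):
        dr, dc = self.ACTIONS[action]
        # Clamp to the grid: moves into a wall leave the agent in place.
        row = min(max(self.state[0] + dr, 0), self.SIZE - 1)
        col = min(max(self.state[1] + dc, 0), self.SIZE - 1)
        self.state = (row, col)
        done = self.state == (self.SIZE - 1, self.SIZE - 1)
        reward = -1 + (10 if done else 0)
        return self.state, reward, done

# The core RL loop: observe position -> choose direction -> receive reward -> repeat.
env = GridWorld()
state, done, total = env.reset(), False, 0
while not done:
    action = random.choice(list(GridWorld.ACTIONS))  # random policy for now
    state, reward, done = env.step(action)
    total += reward
print("episode return:", total)
```

The random policy eventually stumbles into the goal, but its return is far below the best possible value of 4 (five -1 steps plus a final +9); the algorithms in later chapters will close that gap.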