Foundations • Part 1 of 4

The Agent-Environment Interface

States, actions, and the interaction loop

The Building Blocks of RL

Now that we’ve seen RL in action, let’s define our terms precisely. Every RL system has the same core components.

🤖
Agent
The learner and decision-maker

This is what we’re building. The agent observes, decides, and learns from experience.

🌍
Environment
Everything outside the agent

The world the agent interacts with. Responds to actions with new states and rewards.

State
s — A snapshot of the current situation
In chess: positions of all pieces. In driving: your location, speed, and surroundings.
Action
a — A choice the agent can make
Move a chess piece, press the accelerator, turn left.
Reward
r — Feedback on how good the outcome was
Win the game: +1. Lose: -1. Each step in a maze: -0.01 (encouraging speed).
Policy
π — The agent’s strategy
A mapping from states to actions. “In this state, do that.”
Value
How good a state is in the long run
High value = lots of future reward expected from this point onward.
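These components fit together in a single interaction loop: the agent observes a state, picks an action, and the environment answers with a new state and a reward. The sketch below shows that loop in Python; the `env` and `agent` objects and their method names (`reset`, `step`, `act`, `learn`) are hypothetical stand-ins for illustration, not a specific library's API.

```python
# A minimal sketch of the agent-environment loop.
# `env` and `agent` are hypothetical objects assumed to expose:
#   env.reset() -> state, env.step(action) -> (state, reward, done)
#   agent.act(state) -> action, agent.learn(state, reward)

def run_episode(env, agent):
    """Run one episode and return the total reward collected."""
    state = env.reset()           # environment emits the initial state s
    total_reward = 0.0
    done = False
    while not done:
        action = agent.act(state)               # policy: state -> action
        state, reward, done = env.step(action)  # environment responds
        agent.learn(state, reward)              # update from experience
        total_reward += reward
    return total_reward
```

Every algorithm in this book is, at heart, a different way of filling in `agent.act` and `agent.learn`.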

A Simple Example: GridWorld

Throughout this book, we’ll use GridWorld as our primary example environment. It’s simple enough to understand completely, yet rich enough to illustrate key concepts.

📌GridWorld Setup

Imagine a 4×4 grid. Your agent starts in one corner. The goal is in the opposite corner. Each step, the agent can move up, down, left, or right (if not blocked by a wall).

A  ·  ·  ·
·  ·  ·  ·
·  ·  ·  ·
·  ·  ·  G

A = Agent (start)   G = Goal
In GridWorld terms:
  • State: Agent’s position (row, column)
  • Actions: Up, Down, Left, Right
  • Reward: -1 per step, +10 at goal
  • Episode: Ends when agent reaches goal
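The setup above is small enough to implement in a few lines. Here is one possible sketch of such an environment (the class and method names are our own, and we assume the +10 goal reward replaces the -1 step penalty on the final move):

```python
# A minimal GridWorld matching the setup above: a 4x4 grid, agent
# starting at (0, 0), goal at (3, 3). Illustrative sketch only.

class GridWorld:
    # Each action maps to a (row, column) offset.
    ACTIONS = {"up": (-1, 0), "down": (1, 0),
               "left": (0, -1), "right": (0, 1)}

    def __init__(self, size=4):
        self.size = size
        self.goal = (size - 1, size - 1)
        self.reset()

    def reset(self):
        self.pos = (0, 0)  # state: the agent's (row, column)
        return self.pos

    def step(self, action):
        dr, dc = self.ACTIONS[action]
        # Walls: a move off the grid leaves the agent in place.
        row = min(max(self.pos[0] + dr, 0), self.size - 1)
        col = min(max(self.pos[1] + dc, 0), self.size - 1)
        self.pos = (row, col)
        done = self.pos == self.goal
        # Assumption: +10 on the goal-reaching step, -1 otherwise.
        reward = 10 if done else -1
        return self.pos, reward, done
```

From the start, three moves down followed by three moves right is one shortest path; it collects five -1 rewards and a final +10, for a return of +5.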
Why -1 per step?

This encourages the agent to find the shortest path. Without step penalties, the agent wouldn’t care how long it takes to reach the goal.
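A quick back-of-envelope comparison makes this concrete. Assuming the +10 goal reward replaces the step penalty on the final move, a 6-step shortest path beats a 14-step wander:

```python
# Return = (-1 per non-final step) + (+10 on the goal-reaching step).
shortest = 5 * -1 + 10   # 6 moves: return of +5
meander = 13 * -1 + 10   # 14 moves: return of -3
```

The agent maximizing total reward is therefore pushed toward the shortest route.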

💡GridWorld Is Your Friend

This simple setup illustrates the core RL loop: observe position → choose direction → receive reward → repeat. We’ll return to GridWorld throughout the book to test new algorithms.