Foundations • Part 1 of 3

The Core Idea

Learning from interaction: the essence of RL

Learning Through Interaction

📌RL in Everyday Life

Every time you teach a dog a trick, you’re doing reinforcement learning. Give a treat when it sits; it learns to sit. Every time you figure out which route to work avoids traffic, you’re doing reinforcement learning. Try a new road; notice it’s faster; take it tomorrow.

📖Reinforcement Learning

Reinforcement learning (RL) is the science of learning through trial and error—taking actions, observing consequences, and adjusting behavior to achieve goals. It’s perhaps the most natural form of learning, and it’s also become one of the most powerful approaches in artificial intelligence.

At its core, RL is about an agent interacting with an environment:

  Agent (learns & decides)
      │ action                ▲ state, reward
      ▼                       │
  Environment (responds & rewards)

↻ This loop repeats at each timestep

The agent follows this cycle:

  1. Observe the current state of the world
  2. Choose an action based on what it has learned
  3. Receive a reward and observe the new state
  4. Update its behavior to get more reward in the future
  5. Repeat

This loop—observe, act, learn—is the beating heart of reinforcement learning.
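The loop above can be sketched in a few lines of Python. This is a deliberately tiny example: the "environment" is a two-armed slot machine with no state to observe, and all names here (BanditEnv, Agent, and so on) are illustrative, not taken from any RL library.

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

class BanditEnv:
    """Toy environment: a two-armed bandit. step() responds to an
    action with a reward (no state, to keep the loop minimal)."""
    def __init__(self):
        self.win_prob = [0.3, 0.7]  # arm 1 pays off more often

    def step(self, action):
        return 1.0 if random.random() < self.win_prob[action] else 0.0

class Agent:
    """Toy agent: keeps a running estimate of each action's value."""
    def __init__(self, n_actions):
        self.values = [0.0] * n_actions  # estimated reward per action
        self.counts = [0] * n_actions

    def act(self):
        if random.random() < 0.1:  # explore 10% of the time
            return random.randrange(len(self.values))
        return max(range(len(self.values)), key=lambda a: self.values[a])

    def learn(self, action, reward):
        # nudge the estimate toward the observed reward (running average)
        self.counts[action] += 1
        self.values[action] += (reward - self.values[action]) / self.counts[action]

env, agent = BanditEnv(), Agent(n_actions=2)
for _ in range(5000):            # observe, act, learn, repeat
    action = agent.act()         # choose an action
    reward = env.step(action)    # environment responds with a reward
    agent.learn(action, reward)  # update behavior from the feedback

print(agent.values)  # estimates drift toward the true payoff rates
```

Nobody tells the agent that arm 1 is better; it discovers this purely from the rewards its own actions produce.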

RL vs. Other Types of Learning

You might be familiar with supervised learning and unsupervised learning. Where does RL fit?

🏷️ Supervised: learns from labeled examples.
“This photo is a cat. This one is a dog. Learn the pattern.”

🔍 Unsupervised: learns from unlabeled data.
“Here’s customer data. Find natural groupings yourself.”

🎮 Reinforcement: learns from rewards and actions.
“You took that action. Here’s your reward. Figure out what works.”

The key difference: Supervised learning gets the right answer handed to it. Reinforcement learning doesn’t get answers—it gets feedback.

ℹ️RL Feedback is Tricky

Unlike supervised learning, where you immediately know whether you’re right, RL feedback can be:

  • Delayed — You don’t know if a chess move was good until the game ends
  • Sparse — Most actions get zero reward
  • Noisy — Sometimes good actions lead to bad outcomes by chance
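To make "delayed" and "sparse" concrete, picture an episode where every action earns zero reward until a single win signal at the end. One common remedy, sketched below with arbitrary illustrative numbers, is discounting: credit for the final reward flows backward through the episode, shrinking at each step.

```python
# An episode where the only nonzero reward arrives at the very end:
# sparse (mostly zeros) and delayed (the win signal comes last).
rewards = [0.0] * 49 + [1.0]

# Discounting: walk backward through the episode, letting credit
# from the final reward decay by gamma at each step. Gamma = 0.9
# is an arbitrary illustrative choice.
gamma = 0.9
returns, G = [], 0.0
for r in reversed(rewards):
    G = r + gamma * G
    returns.append(G)
returns.reverse()  # restore chronological order

print(returns[-1])  # 1.0: the final step gets full credit
print(returns[0])   # 0.9**49: a faint echo reaches the first move
```

Early moves still receive some credit for the eventual win, which is exactly the kind of credit assignment the bullet points above say is hard.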

Why Can’t We Just Use Supervised Learning?

A natural question: if we want an agent to play a game, why not just collect expert games and train a classifier to predict the expert’s moves?

This approach (called imitation learning) can work, but it has limitations:

  1. You need expert data. What if no expert exists for your problem?
  2. You can only match the expert. You can never exceed their performance.
  3. Distribution shift. When the agent makes a mistake, it enters states the expert never visited, and it doesn’t know what to do there.

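The third problem, distribution shift, can be seen in a toy sketch: clone an expert who only ever visited a handful of states, then ask the clone about a state outside that set. Everything here (the dict-as-policy, the state numbering) is hypothetical.

```python
# Hypothetical expert demonstrations: the expert only ever visited
# states 0 through 4, taking action (state % 2) in each.
expert_data = {s: s % 2 for s in range(5)}

def cloned_policy(state):
    """Imitation: look up what the expert did in this state."""
    if state in expert_data:
        return expert_data[state]
    return None  # off-distribution: the expert never visited this state

print(cloned_policy(3))  # 1, matches the expert
print(cloned_policy(7))  # None: one mistake put us somewhere the
                         # expert's data says nothing about
```

A real behavioral-cloning model would generalize rather than return None outright, but the underlying issue is the same: its training data says nothing about states the expert avoided.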
💡The RL Advantage

RL solves these problems by learning directly from interaction. The agent doesn’t need a teacher—just a goal.