Learning Through Interaction
Every time you teach a dog a trick, you’re doing reinforcement learning. Give a treat when it sits; it learns to sit. Every time you figure out which route to work avoids traffic, you’re doing reinforcement learning. Try a new road; notice it’s faster; take it tomorrow.
Reinforcement learning (RL) is the science of learning through trial and error—taking actions, observing consequences, and adjusting behavior to achieve goals. It’s perhaps the most natural form of learning, and it’s also become one of the most powerful approaches in artificial intelligence.
At its core, RL is about an agent interacting with an environment. At each timestep, the agent follows this cycle:
- Observe the current state of the world
- Choose an action based on what it has learned
- Receive a reward and observe the new state
- Update its behavior to get more reward in the future
- Repeat
This loop—observe, act, learn—is the beating heart of reinforcement learning.
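The loop above can be sketched in a few lines of code. What follows is a minimal, self-contained illustration, not a canonical implementation: the corridor environment is made up, and tabular Q-learning is just one simple way to do the "update its behavior" step.

```python
import random

# A tiny hypothetical environment: a corridor of 5 cells.
# The agent starts in cell 0; reaching cell 4 gives reward 1 and ends the episode.
class Corridor:
    LENGTH = 5

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):  # action: 0 = left, 1 = right
        move = 1 if action == 1 else -1
        self.state = max(0, min(self.LENGTH - 1, self.state + move))
        done = self.state == self.LENGTH - 1
        return self.state, (1.0 if done else 0.0), done

def train(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    env = Corridor()
    q = {(s, a): 0.0 for s in range(Corridor.LENGTH) for a in (0, 1)}

    def pick(state):
        # Explore occasionally; otherwise act greedily (ties broken at random).
        if rng.random() < epsilon or q[(state, 0)] == q[(state, 1)]:
            return rng.choice((0, 1))
        return 0 if q[(state, 0)] > q[(state, 1)] else 1

    for _ in range(episodes):
        state, done = env.reset(), False
        for _ in range(100):  # step cap, in case the agent wanders
            action = pick(state)                          # choose an action
            next_state, reward, done = env.step(action)   # observe reward + new state
            best_next = max(q[(next_state, 0)], q[(next_state, 1)])
            # Update behavior to get more reward in the future (Q-learning rule).
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
            if done:
                break
    return q

q = train()
# After training, the learned values prefer "right" over "left" in every cell:
# moving toward the goal pays off, and the agent discovered this purely from reward.
```

Notice that nothing here tells the agent the "right answer"; it only ever sees states, its own actions, and rewards, which is exactly the distinction the next section draws.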
RL vs. Other Types of Learning
You might be familiar with supervised learning and unsupervised learning. Where does RL fit?
- Supervised learning: learns from labeled examples. “This photo is a cat. This one is a dog. Learn the pattern.”
- Unsupervised learning: learns from unlabeled data. “Here’s customer data. Find natural groupings yourself.”
- Reinforcement learning: learns from rewards and actions. “You took that action. Here’s your reward. Figure out what works.”
The key difference: Supervised learning gets the right answer handed to it. Reinforcement learning doesn’t get answers—it gets feedback.
Unlike in supervised learning, where you immediately know whether you’re right, feedback in RL can be:
- Delayed — You don’t know if a chess move was good until the game ends
- Sparse — Most actions get zero reward
- Noisy — Sometimes good actions lead to bad outcomes by chance
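The standard answer to delayed and sparse feedback is to credit each action with the discounted sum of all rewards that come after it (the return). A short sketch, where the 6-step episode and the discount factor of 0.9 are made up for illustration:

```python
# Rewards for a hypothetical 6-move episode: nothing until the final win.
rewards = [0.0, 0.0, 0.0, 0.0, 0.0, 1.0]
gamma = 0.9  # discount factor: how much future reward counts today

# Work backwards: the return at step t is r_t + gamma * (return at step t+1).
returns = []
g = 0.0
for r in reversed(rewards):
    g = r + gamma * g
    returns.append(g)
returns.reverse()

print([round(g, 3) for g in returns])
# → [0.59, 0.656, 0.729, 0.81, 0.9, 1.0]
```

Even though only the last move was directly rewarded, every earlier move receives some credit, with moves closer to the win credited more.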
Why Can’t We Just Use Supervised Learning?
A natural question: if we want an agent to play a game, why not just collect expert games and train a classifier to predict the expert’s moves?
This approach (called imitation learning) can work, but it has limitations:
- It requires expert demonstrations, which can be expensive or impossible to collect
- The agent can at best match the expert; it has no way to discover strategies the expert never used
- Small errors compound: one wrong move puts the agent in a state the expert never visited, where its training data says nothing about what to do
RL solves these problems by learning directly from interaction. The agent doesn’t need a teacher—just a goal.
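To make the contrast concrete, here is imitation learning reduced to its simplest form: predict, for each state, the action the expert took most often. The states, actions, and dataset below are all invented for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical expert demonstrations: (state, action) pairs.
expert_data = [
    ("crossroads", "left"),
    ("crossroads", "left"),
    ("bridge", "straight"),
    ("crossroads", "right"),
]

# Behavioral cloning at its simplest: for each state,
# predict the expert's most frequent action.
counts = defaultdict(Counter)
for state, action in expert_data:
    counts[state][action] += 1

def policy(state):
    if state not in counts:
        return None  # the core weakness: no idea what to do off the expert's path
    return counts[state].most_common(1)[0][0]

print(policy("crossroads"))  # → left
print(policy("swamp"))       # → None (the expert never went there)
```

The `None` case is exactly where an RL agent differs: with a reward signal, it can keep learning in states no teacher ever demonstrated.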