
The Clever Way to Calculate Values, Bellman’s “Secret”
Author(s): Rem E
Originally published on Towards AI.
Tutorial 6: This time, we’ll update our values as the agent moves through the maze, using Bellman’s so-called “secret”
I know the Bellman equations aren’t really a secret, but few people truly know how to use them. Do you want to be one of them?
This tutorial builds directly on Tutorial 5, so check that out first if you haven’t already!
And if you’re new to value functions or the Bellman equation, be sure to read Why Is the Bellman Equation So Powerful in RL? before diving in.
🔍 What You’ll Learn
We’re moving on from the old, boring Monte Carlo style and listening to Bellman’s advice on how to update our values as the agent moves.
In this tutorial, we’ll modify the agent code so it updates values step-by-step during the agent’s journey through the maze, just like a real RL agent interacting with its environment.
Since we haven’t reached the learning algorithms yet (again), we’ll still use the manual trajectories we defined earlier, but improve how the value and update functions work to reflect this online updating process.
🛠️Project Setup
The code for this tutorial is available in this GitHub repository.
If you haven’t already, follow the instructions in the README.md file to get started.
If you’ve already cloned the repo, make sure to pull the latest changes to access the new tutorial (tutorial-6).
Once everything is set up, you’ll notice it follows the same folder structure as tutorial-5:

🌊Before You Dive In…
In the theoretical part of this tutorial (check the intro!), we explained the Bellman equation for the general case with stochastic environments.
But you know our maze problem is deterministic in its transitions, actions, and rewards.
So in this tutorial, we’re going to use a simplified Bellman formula that fits deterministic cases perfectly:
V(s) = r + γ · V(s')
-What, you deleted everything we learned?
Yeah, unfortunately, our example is too simple for the fancy full Bellman equation.
Since actions and transitions are deterministic, we don’t need those summations over actions and next states anymore. We just add the immediate reward to the value of the next state multiplied by gamma.
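If you like seeing that reduction written out, here is a quick sketch in standard notation (your intro article may write it slightly differently; π is the policy and p the transition function, and both collapse to single deterministic choices in our maze):

$$
V_\pi(s) \;=\; \sum_{a} \pi(a \mid s) \sum_{s'} p(s' \mid s, a)\,\bigl[\, r(s, a, s') + \gamma\, V_\pi(s') \,\bigr]
\quad\xrightarrow{\ \text{deterministic}\ }\quad
V(s) \;=\; r + \gamma\, V(s')
$$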
Yes, we’re going to use gamma here!
But don’t be sad, we’re still using the heart of Bellman’s equation: recursion!
And I promise you, once our poor robot learns to navigate on its own, we’ll explore a stochastic example to fully capture the beauty of Bellman equations and related concepts in stochastic environments.
That’s a promise!
🤖Agent Class Implementation
class Agent:
    def __init__(self, env: gym.Env):
        self.env = env
        self.gamma = self.env.gamma  # discount factor comes from the environment
        self.i = 0
        self.j = 0
        self.values = np.full((env.size, env.size), np.nan, dtype=float)  # NaN means "not visited yet"
That’s all we need for now! We get the discount factor gamma from the environment, so we can use it here.
    def _V(self, s):
        v = self.values[s[0], s[1]]
        if np.isnan(v):
            return 0.0  # unvisited state: fall back to a value of 0
        return v
The _V() function is a simple way to access the value for a state. If it’s the first time visiting that state (value == NaN), we return 0 so the calculations can work smoothly.
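To see the constructor and _V() together, here’s a tiny sketch, assuming the Agent class above is importable (with numpy and gym already imported, as in the repo). SimpleNamespace is just a hypothetical stand-in for the repo’s maze environment, exposing only the two attributes the constructor reads, and the size 5 is an arbitrary pick:

from types import SimpleNamespace
import numpy as np

# Hypothetical stand-in env: only `size` and `gamma` are needed here.
dummy_env = SimpleNamespace(size=5, gamma=0.9)

agent = Agent(dummy_env)
print(agent.values.shape)        # (5, 5), all NaN at the start
print(agent._V((0, 0)))          # 0.0 -> unvisited states fall back to zero

agent.values[0, 0] = -1.5        # pretend this state has been updated once
print(agent._V((0, 0)))          # -1.5 -> stored values are returned as-is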
    def update(self, s, a, r, nxt_s, over):
        # simplified Bellman update: V(s) = r + gamma * V(next state)
        self.values[s[0], s[1]] = r + self.gamma * self._V(nxt_s)
Here’s where we actually use the Bellman equation!
Notice that the update() function is called after each step. This line simply sets the value of the current state to the immediate reward r plus the discounted value of the next state.
This matches exactly the simplified Bellman formula we introduced earlier (check Before You Dive In).
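If you’re curious what “called after each step” looks like in the driver code, here is a rough sketch of the kind of loop the repo runs. This is an assumption on my part, modeled on a Gymnasium-style step API and the manual trajectories from Tutorial 5, not a copy of the repo’s code (names like manual_trajectory are hypothetical):

# Hypothetical driver loop: the real one lives in the repo (unchanged
# since Tutorial 5); this just illustrates *when* update() is called.
state, _ = env.reset()
for action in manual_trajectory:                      # hand-written action list
    next_state, reward, terminated, truncated, _ = env.step(action)
    over = terminated or truncated
    agent.update(state, action, reward, next_state, over)  # Bellman backup right away
    state = next_state
    if over:
        break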
And that’s it! We’re done. The rest of the code stays the same as in the previous tutorial.
Now, head over to the tutorial directory, run the code, and watch how the values update as the agent moves!
Notice how the reward propagates more slowly here compared to the Monte Carlo method. That’s because this style updates values step-by-step as the agent moves, rather than waiting until the end of an episode.
It’s more realistic since it mimics the way real agents (and even animals!) learn from experience, updating their understanding continuously as they go, not just after the whole journey finishes.
This on-the-fly updating is a key building block for many powerful RL algorithms to come!
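If you like seeing that difference as formulas, here it is in standard notation (just for reference, nothing new is introduced here):

$$
V(s_t) \leftarrow G_t = r_{t+1} + \gamma\, r_{t+2} + \gamma^2 r_{t+3} + \dots \qquad \text{(Monte Carlo: target known only at the end of the episode)}
$$

$$
V(s_t) \leftarrow r_{t+1} + \gamma\, V(s_{t+1}) \qquad \text{(one-step Bellman backup: target available after every single step)}
$$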
✅ What We’ve Learned…
A tiny but powerful tutorial, all thanks to the Bellman equation, making our lives easier!
We learned how to improve our agent’s code so it updates state values step-by-step using the Bellman equation. And yeah, that’s it for today, see you next time!
👉 In the next tutorial, we’ll finally apply our very first method to solve an RL problem: Dynamic Programming!
Watch Our Agent Learn: Tutorial 7, Implementing Dynamic Programming for our maze problem (pub.towardsai.net)
✨ As always… stay curious, stay coding, and stay tuned!