The Clever Way to Calculate Values, Bellman’s “Secret”

Author(s): Rem E

Originally published on Towards AI.

Tutorial 6: This time, we’ll update our values as the agent moves through the maze, using Bellman’s so-called “secret”

I know the Bellman equations aren’t really a secret, but few people truly know how to use them. Do you want to be one of them?
This tutorial builds directly on Tutorial 5, so check that out first if you haven’t already!
And if you’re new to value functions or the Bellman equation, be sure to read Why Is the Bellman Equation So Powerful in RL? before diving in.

🔍 What You’ll Learn

We’re moving on from the old, boring Monte Carlo style and listening to Bellman’s advice on how to update our values as the agent moves.
In this tutorial, we’ll modify the agent code so it updates values step-by-step during the agent’s journey through the maze, just like a real RL agent interacting with its environment.
Since we haven’t reached the learning algorithms yet (again), we’ll still use the manual trajectories we defined earlier, but improve how the value and update functions work to reflect this online updating process.

🛠️ Project Setup

The code for this tutorial is available in this GitHub repository.
If you haven’t already, follow the instructions in the README.md file to get started.
If you’ve already cloned the repo, make sure to pull the latest changes to access the new tutorial (tutorial-6).
Once everything is set up, you’ll notice it follows the same folder structure as tutorial-5:

Tutorial-6 Folder Structure, Source: Image by the author

🌊 Before You Dive In…

In the theoretical part of this tutorial (check the intro!), we explained the Bellman equation for the general case with stochastic environments.
But as you know, our maze problem is deterministic in its transitions, actions, and rewards.
So in this tutorial, we’re going to use a simplified Bellman formula that fits deterministic cases perfectly:

Simplified Bellman Equation, Source: Image by the author
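The formula itself lives in the image above; reconstructed from the description that follows, it reads:

V(s) = r + γ · V(s')

That is, the value of the current state is the immediate reward plus the discounted value of the next state s'.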

-What, you deleted everything we learned?

Yeah, unfortunately, our example is too simple for the fancy full Bellman equation.
Since actions and transitions are deterministic, we don’t need those summations over actions and next states anymore. We just add the immediate reward to the value of the next state multiplied by gamma.
Yes, we’re going to use gamma here!
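As a quick sanity check with made-up numbers: with a step reward r = -1, γ = 0.9, and a next-state value V(s') = 10, the update gives V(s) = -1 + 0.9 × 10 = 8.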

But don’t be sad, we’re still using the heart of Bellman’s equation: recursion!
And I promise you, once our poor robot learns to navigate on its own, we’ll explore a stochastic example to fully capture the beauty of Bellman equations and related concepts in stochastic environments.

That’s a promise!

🤖 Agent Class Implementation

import numpy as np
import gymnasium as gym  # or `import gym`, depending on how the repo imports it

class Agent:
    def __init__(self, env: gym.Env):
        self.env = env
        self.gamma = self.env.gamma  # discount factor comes from the environment
        self.i = 0
        self.j = 0
        # one value per grid cell; NaN marks states we haven't visited yet
        self.values = np.full((env.size, env.size), np.nan, dtype=float)

That’s all we need for now! We get the discount factor gamma from the environment, so we can use it here.
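Here’s a minimal usage sketch, assuming the maze environment class from the previous tutorials is called MazeEnv and exposes size and gamma (the names here are illustrative; check the repo for the actual ones):

# Hypothetical usage sketch — the real environment class and constructor
# arguments live in the tutorial repo; adjust accordingly.
env = MazeEnv()         # assumed maze environment exposing .size and .gamma
agent = Agent(env)
print(agent.gamma)      # discount factor pulled from the environment
print(agent.values)     # a size x size grid full of NaN (nothing visited yet)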

def _V(self, s):
    v = self.values[s[0], s[1]]
    if np.isnan(v):
        # first visit to this state: fall back to 0 so the update stays well-defined
        return 0.0
    return v

The _V() function is a simple way to access the value for a state. If it’s the first time visiting that state (value == NaN), we return 0 so the calculations can work smoothly.
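Continuing the hypothetical sketch above, the fallback looks like this with illustrative values:

# Every entry starts as NaN, so an unvisited state reads as 0.0
print(agent.values[0, 0])   # nan
print(agent._V((0, 0)))     # 0.0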

def update(self, s, a, r, nxt_s, over):
    # simplified Bellman backup: V(s) = r + gamma * V(s')
    self.values[s[0], s[1]] = r + self.gamma * self._V(nxt_s)

Here’s where we actually use the Bellman equation!
Notice that the update() function is called after each step. This line simply sets the value of the current state to the immediate reward r plus the discounted value of the next state.
This matches exactly the simplified Bellman formula we introduced earlier (check Before You Dive In).
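To make “called after each step” concrete, here is a rough sketch of the kind of loop the tutorial runs. The real loop in the repo replays the manual trajectories from Tutorial 5, and the exact env.step() return signature depends on the gym/gymnasium version used; names here are assumptions:

# Illustrative online-update loop (not the repo's exact code).
s, _ = env.reset()
for a in trajectory:                       # manual action sequence from Tutorial 5
    nxt_s, r, over, _, _ = env.step(a)     # gymnasium-style step; adjust if the repo differs
    agent.update(s, a, r, nxt_s, over)     # Bellman backup immediately after the step
    s = nxt_s
    if over:
        break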

And that’s it! We’re done. The rest of the code stays the same as in the previous tutorial.
Now, head over to the tutorial directory, run the code, and watch how the values update as the agent moves!

Notice how the reward propagates more slowly here compared to the Monte Carlo method. That’s because this style updates values step-by-step as the agent moves, rather than waiting until the end of an episode.

It’s more realistic since it mimics the way real agents (and even animals!) learn from experience, updating their understanding continuously as they go, not just after the whole journey finishes.
This on-the-fly updating is a key building block for many powerful RL algorithms to come!

✅ What We’ve Learned…

A tiny but powerful tutorial, all thanks to the Bellman equation, making our lives easier!
We learned how to improve our agent’s code so it updates state values step-by-step using the Bellman equation. And yeah, that’s it for today, see you next time!

👉 In the next tutorial, we’ll finally apply our very first method to solve an RL problem: Dynamic Programming!

Watch Our Agent Learn
Tutorial 7: Implementing Dynamic Programming for our maze problem (pub.towardsai.net)

As always… stay curious, stay coding, and stay tuned!
