
Understanding Reinforcement Learning and Multi-Agent Systems: A Beginner’s Guide to MARL (Part 1)
Author(s): Arthur Kakande
Originally published on Towards AI.
When we learn from labeled data, we call it supervised learning. When we learn by grouping similar items, we call it clustering. When we learn by observing rewards or gains, we call it reinforcement learning.
To put it simply, reinforcement learning is the process of figuring out the best actions or strategies based on observed rewards. This type of learning is especially useful for tasks with a large number of possible action sequences. For example, imagine a simple maze-style game where you can move left, right, up, or down. A specific sequence of moves, like up → left → up → right, might result in winning the game. Reinforcement learning helps an agent (the decision-maker) explore different move combinations and learn which ones consistently lead to victory. In some cases, multiple agents can learn and interact together. A good example is autonomous cars sharing the same road. This is known as Multi-Agent Reinforcement Learning (MARL).
What is Autonomous Control (AC)?
Now that I have introduced autonomous vehicles above, let's look at what autonomous control is. AC refers to systems where decisions are decentralized, meaning individual components such as robots or vehicles can make independent choices within their environment. MARL is particularly useful here. Take logistics as an example: we could attach an intelligent software agent to a container, a vehicle, and a storage facility. This creates a multi-agent system in which the container can independently explore the best storage facility to use as its destination and select a suitable transport provider to move it there, maximizing efficiency overall. In this simple illustration there is just one container; now imagine how efficient it would be if multiple containers could be grouped and transported together in the same manner. Similarly, a fleet of delivery robots tasked with dropping off packages would need to coordinate to stay efficient and avoid delays. This is where MARL becomes crucial, as it enables this kind of strategic decision-making.
Now looking back at autonomous cars, consider a scenario where multiple self-driving cars have to share a road or coordinate their activity at a junction or roundabout. To do this manually, one might create a schedule that ensures only a specific number of cars cross a specific junction at a specific time to avoid collisions. That would be very difficult and not scalable. To tackle this challenge, the autonomous cars must learn to coordinate their movements to avoid accidents and improve traffic flow overall. Predicting and responding to each other's actions creates a smoother driving experience. The same illustration applies to a fleet of delivery robots.
Single-Agent vs. Multi-Agent Reinforcement Learning
Now that we understand what autonomous control is, we can dive deeper into RL and understand how combining the two leads to efficient systems. But first, we should understand how reinforcement learning works for a single agent. There are a few key concepts you must understand as you dive into RL. An "agent" is the decision-maker. The "environment" is the space in which the agent operates. The agent operates by taking "actions", the choices available to it, which can change the condition of the environment. That current condition is the "state". As the agent navigates all this, it receives feedback based on the actions taken in particular states, and this feedback is known as "rewards".
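To make these terms concrete, here is a minimal sketch in Python. The "LineWorld" environment is invented for illustration (it is not from any particular library): the agent starts at cell 0 and is rewarded for reaching cell 4, and the agent here simply acts at random to show how states, actions, and rewards flow between the two.

```python
import random

class LineWorld:
    """Toy environment: the agent starts at cell 0 and tries to reach cell 4."""
    def __init__(self):
        self.state = 0                        # the state: current condition of the environment

    def step(self, action):
        # The action (-1 = move left, +1 = move right) changes the state.
        self.state = max(0, min(4, self.state + action))
        reward = 1 if self.state == 4 else 0  # the reward: feedback for the agent
        done = self.state == 4
        return self.state, reward, done

env = LineWorld()                             # the environment
state, done = env.state, False
while not done:
    action = random.choice([-1, +1])          # the agent picks an action (here, at random)
    state, reward, done = env.step(action)
    print(f"action={action:+d}, new state={state}, reward={reward}")
```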
A popular algorithm for training a single agent is Q-learning. It works by helping the agent estimate the reward it can expect from performing different actions in different states. An action could be moving a step forward, and the state would be the new condition of the environment after the action has been taken. The agent observes this state and may receive a reward. After exploring many actions and states and observing the rewards, the agent updates its knowledge whenever it observes a new reward and estimates which combinations of states and actions yielded reward. These estimates are called Q-values, and over time they can converge, yielding optimal decisions. For example, the moves up → left → up → right that I previously introduced would be the optimal decisions, i.e. the sequence of states and actions with the highest Q-values.
Here’s how Q-learning works, expressed as its update rule:

Q_{t+1}(s, a) = Q_t(s, a) + \alpha \left[ r_{t+1} + \gamma \max_{a'} Q_t(s', a') - Q_t(s, a) \right]

where s is the current state, a is the chosen action, Q_t(s, a) is the current value estimate for the state-action pair at time step t, t + 1 denotes the next time step, r_{t+1} is the payoff the agent receives after taking action a in state s, s' is the resulting state, γ is the discount factor, and α is the learning rate.
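As a rough illustration, this update can be written as a few lines of Python. The one-dimensional world and the hyperparameter values below are made up for the example; the point is only to show the update rule in code.

```python
import random

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2    # learning rate, discount factor, exploration rate
ACTIONS = [-1, +1]                        # step left or step right

def step(state, action):
    """Invented environment dynamics: reach cell 4 to earn a reward of 1."""
    next_state = max(0, min(4, state + action))
    reward = 1 if next_state == 4 else 0
    return next_state, reward, next_state == 4

Q = {(s, a): 0.0 for s in range(5) for a in ACTIONS}   # Q-values start at zero

for episode in range(200):
    state, done = 0, False
    while not done:
        # Epsilon-greedy: mostly pick the best-known action, occasionally explore.
        if random.random() < EPSILON:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: Q[(state, a)])
        next_state, reward, done = step(state, action)
        # The update rule above: nudge Q(s, a) toward the reward plus the
        # discounted value of the best action in the next state.
        best_next = max(Q[(next_state, a)] for a in ACTIONS)
        Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
        state = next_state

# After training, the greedy action in each state should point toward the goal.
print({s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(5)})
```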
Challenges in Multi-Agent RL
When multiple agents share an environment, things get more complex, because the agents influence each other’s decisions. From any single agent’s point of view, the environment is no longer stationary. Say delivery agent 1 picked up an item for delivery in state K and received a reward; what would stop delivery agent 2 from picking up that same item in a different state during a different episode? The environment effectively changes whenever the other agents change their behaviour.
Additionally, there are multiple settings in which the approaches differ. In a competitive setting, an agent may try to outsmart opponents by predicting their moves, whereas in a cooperative setting agents work together to maximize a shared reward. This complexity means multi-agent systems require more advanced strategies than single-agent RL. This brings us to our next question: how do multiple agents learn together?
There are different approaches to multi-agent learning. We can let one agent make decisions for everyone, taking the role of a coordinator that delegates tasks to all the other agents; this is known as centralized learning. Alternatively, we can let each agent learn and act independently, learning from observing the others’ actions; this is known as decentralized learning. A third option is centralized training with decentralized execution, an approach where agents receive global information during training but act independently when deployed.
During this learning, agents can coordinate either explicitly, by directly exchanging messages, or implicitly, by inferring other agents’ actions without direct message exchange.
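As a very rough sketch of the decentralized option only (independent learners, no message exchange), each agent can keep its own Q-table and run its own update using only what it observes. The two-agent "shared line" environment and its collision rule below are invented for illustration; this is not a complete training loop, just the structure of per-agent learning.

```python
import random
from collections import defaultdict

ALPHA, GAMMA = 0.1, 0.9
ACTIONS = [-1, +1]

# Decentralized learning: each agent owns its Q-table and updates it independently,
# using only its own observations and rewards.
agents = {"agent_1": defaultdict(float), "agent_2": defaultdict(float)}
positions = {"agent_1": 0, "agent_2": 1}

def joint_step(positions, moves):
    """Invented shared environment: both agents move on the same line; an agent is
    rewarded for reaching cell 4, but nobody is rewarded if the two collide."""
    new_positions = {n: max(0, min(4, positions[n] + moves[n])) for n in positions}
    collided = len(set(new_positions.values())) < len(new_positions)
    rewards = {n: (1 if new_positions[n] == 4 and not collided else 0) for n in new_positions}
    return new_positions, rewards

for _ in range(500):
    moves = {n: random.choice(ACTIONS) for n in agents}       # each agent explores on its own
    new_positions, rewards = joint_step(positions, moves)
    for name, Q in agents.items():
        s, a, s2 = positions[name], moves[name], new_positions[name]
        best_next = max(Q[(s2, b)] for b in ACTIONS)
        # Same Q-learning update as before, applied per agent with its local view only.
        Q[(s, a)] += ALPHA * (rewards[name] + GAMMA * best_next - Q[(s, a)])
    positions = new_positions
```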
What’s Next?
Now that I have introduced you to the basics of RL and multi-agent systems, we should dive deeper into what MARL algorithms are and look at how they differ. In Part 2 of this blog series, we shall explore elements of independent Q-learning for MARL alongside team-based approaches. Stay tuned!
Published via Towards AI