Inside xLAM: Salesforce’s Models Specialized for Agentic Tasks
Last Updated on September 18, 2024 by Editorial Team
Author(s): Jesus Rodriguez
Originally published on Towards AI.
I recently started an AI-focused educational newsletter that already has over 170,000 subscribers. TheSequence is a no-BS (meaning no hype, no news, etc.) ML-oriented newsletter that takes 5 minutes to read. The goal is to keep you up to date with machine learning projects, research papers, and concepts. Please give it a try by subscribing below:
TheSequence | Jesus Rodriguez | Substack
The best source to stay up-to-date with the developments in the machine learning, artificial intelligence, and data…
thesequence.substack.com
Agentic workflows are one of the most interesting categories in foundation model research. By agentic workflows we mean AI programs that can execute actions in a specific environment. One of the main debates in the agent community is how many capabilities should live inside the model itself versus in peripheral methods like RAG. Recently, Salesforce Research published major work on agentic AI with xLAM, a series of models optimized for agentic tasks.
xLAM is a new series of action models designed specifically for AI agent tasks. It includes five different models, built using either dense or mixture-of-experts architectures. These models range in size from 1 billion to 8×22 billion parameters. A flexible and scalable training pipeline was used to enhance their performance across a variety of environments by combining and augmenting diverse datasets. Initial tests show that xLAM consistently performs well, placing first on the Berkeley Function-Calling Leaderboard and surpassing other prominent models like GPT-4 and Claude-3 in specific tasks, particularly those requiring tool use.
Agentic Models vs. Agentic RAG
As agentic AI evolves, there is a lively debate about which capabilities should be built into the models themselves rather than provided as external components. Techniques such as retrieval-augmented generation (RAG) are the most common candidates for agentic tasks. Recently, however, there has been a growth in the number of models specialized in agentic tasks, specifically in API calling.
Much of the difficulty of function calling stems from the stochastic nature of LLMs. Function calling is by definition a discrete task, so incorporating it into an LLM introduces a number of interesting challenges, to say the least. These are some of the challenges that xLAM tries to tackle.
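To make that discreteness concrete, here is a minimal sketch of the kind of structured output a function-calling model is expected to produce. The tool schema and query are hypothetical, but the pattern is the point: a free-form natural-language request has to be mapped to an exact, machine-executable call, where a single wrong argument name or value breaks the call.

```python
import json

# Hypothetical tool schema the model is allowed to call (illustrative only).
tools = [
    {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "city": {"type": "string", "description": "City name"},
            "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
        },
    }
]

user_query = "What's the weather in Paris in celsius?"

# The model's answer must be an exact, parseable call -- not free-form text.
expected_output = {"name": "get_weather", "arguments": {"city": "Paris", "unit": "celsius"}}

print(json.dumps(expected_output))
```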
xLAM
The xLAM series offers a range of models suited for various needs, from smaller models like the 1B and 7B versions that are optimized for on-device applications, to larger models like the 8x7B and 8x22B versions intended for more complex tasks. Insights from the training of these models emphasize key lessons in data handling, such as the importance of unifying and augmenting data to increase its diversity and reduce overfitting. The use of synthetic data has been especially valuable, enabling xLAM models to secure top positions on competitive leaderboards.
Three main xLAM models are available, designed for different use cases:
· xLAM-7b-r: A 7B-parameter model for quick experimentation in academic settings, particularly when resources are limited.
· xLAM-8x7b-r: An 8x7B mixture-of-experts model designed for industrial applications where balancing latency, resources, and performance is key.
· xLAM-8x22b-r: The largest model, suitable for projects with substantial computational resources and high-performance demands.
These models can handle both single-turn and multi-turn tasks across various benchmarks and environments. Earlier versions, such as xLAM-1b-fc-r and xLAM-7b-fc-r, were developed for single-turn function-calling tasks, with xLAM-7b-fc-r previously achieving second place on the Berkeley Function-Calling Leaderboard, although it now ranks sixteenth in the latest version. Meanwhile, the smaller xLAM-1b-fc-r, known as the “Tiny Giant,” is optimized for mobile use.
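As a quick illustration of how one of these released checkpoints might be used, below is a minimal sketch of loading an xLAM model with the Hugging Face transformers library. The model id Salesforce/xLAM-7b-r comes from the public release, but the exact prompt format and chat template expected for function calling should be checked against the model card; the tool description in the system message is a hypothetical placeholder.

```python
# Minimal sketch: loading an xLAM checkpoint with Hugging Face transformers.
# Assumptions: the "Salesforce/xLAM-7b-r" model id and that its tokenizer ships
# a chat template; consult the model card for the officially supported prompt
# format for function calling.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Salesforce/xLAM-7b-r"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Hypothetical tool description passed in the system prompt.
messages = [
    {"role": "system", "content": "You can call get_weather(city: str). Respond with a JSON function call."},
    {"role": "user", "content": "What's the weather in Paris?"},
]

inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```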
Data Processing and Augmentation
The data pipeline for xLAM models involves several critical steps. First, data is unified into a standard format that works across different environments and tasks. This makes it easier to apply augmentations and identify errors, such as incorrect function calls or hallucinations. The augmentation process itself focuses on improving data diversity by applying various transformations, producing new synthetic samples. The unified format simplifies this process, ensuring consistency.
Error detection is another crucial part of the data pipeline, with rule-based and large language model (LLM) tools used to spot issues like undefined functions and poor reasoning.
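The paper does not publish a single canonical schema, so the snippet below is only a sketch of what a unified sample plus a rule-based error check could look like: every trajectory is normalized into the same fields (query, available tools, generated calls), which makes it straightforward to augment samples and to flag calls to functions that were never defined, one of the hallucination patterns mentioned above. The field names are illustrative assumptions, not the exact schema used by the xLAM pipeline.

```python
# Sketch of a unified data format and a simple rule-based error check.
# Field names ("query", "tools", "calls") are illustrative assumptions.
sample = {
    "query": "Book a table for two at an Italian restaurant tonight.",
    "tools": [{"name": "find_restaurants", "parameters": ["cuisine", "party_size", "time"]}],
    "calls": [{"name": "find_restaurants",
               "arguments": {"cuisine": "Italian", "party_size": 2, "time": "19:00"}}],
}

def check_sample(sample: dict) -> list[str]:
    """Rule-based checks: flag calls to undefined functions or unknown arguments."""
    errors = []
    defined = {t["name"]: set(t["parameters"]) for t in sample["tools"]}
    for call in sample["calls"]:
        if call["name"] not in defined:
            errors.append(f"undefined function: {call['name']}")
            continue
        unknown = set(call["arguments"]) - defined[call["name"]]
        if unknown:
            errors.append(f"unknown arguments for {call['name']}: {sorted(unknown)}")
    return errors

print(check_sample(sample))  # [] -> the sample passes the rule-based checks
```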
Data Synthesis and Mixture
xLAM uses a systematic data synthesis framework known as APIGen. This framework creates verified datasets based on executable APIs, ensuring high-quality data through a multi-step verification process. Data from several sources, including synthetic datasets and general instruction datasets, is combined for supervised fine-tuning of xLAM models.
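APIGen's key idea is that every generated sample is verified before it enters the training mix, roughly: is the call well formed, does it execute against a live API, and does the result actually answer the query. The sketch below mirrors that staged structure with simplified placeholder checks; the function names and the stubbed semantic check are assumptions, not APIGen's actual implementation.

```python
# Sketch of an APIGen-style, multi-stage verification pipeline. The three stages
# (format, execution, semantic) follow the high-level description above; the
# concrete checks are simplified placeholders.
import json

def format_check(raw_output: str) -> dict | None:
    """Stage 1: is the generated call parseable JSON with the expected fields?"""
    try:
        call = json.loads(raw_output)
        return call if {"name", "arguments"} <= call.keys() else None
    except json.JSONDecodeError:
        return None

def execution_check(call: dict, executable_apis: dict) -> object | None:
    """Stage 2: does the call actually run against an executable API?"""
    fn = executable_apis.get(call["name"])
    if fn is None:
        return None
    try:
        return fn(**call["arguments"])
    except Exception:
        return None

def semantic_check(query: str, result: object) -> bool:
    """Stage 3: does the result answer the query? APIGen uses an LLM-based judge
    here; this stub only checks that a result was produced."""
    return result is not None

# Toy executable API registry and a toy generated sample.
apis = {"add": lambda a, b: a + b}
query = "What is 2 plus 3?"
raw = '{"name": "add", "arguments": {"a": 2, "b": 3}}'

call = format_check(raw)
result = execution_check(call, apis) if call else None
verified = call is not None and semantic_check(query, result)
print(verified)  # True -> the sample would be kept for fine-tuning
```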
Training
Model training follows a supervised fine-tuning (SFT) approach, making use of a flexible data pipeline. The training framework is built on HuggingFace libraries and PyTorch, and xLAM models undergo multiple training epochs with shuffled datasets to ensure robust learning. For the largest model, xLAM-8x22b-r, the LoRA method is used to preserve its abilities while preventing it from forgetting previously learned information. LoRA is also employed in aligning all xLAM models with the DPO method, and a cosine learning rate scheduler is used to further optimize training.
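The description above gives the high-level recipe (SFT on the unified data, LoRA for the largest model and for DPO alignment, cosine learning-rate schedule) rather than exact settings, so the snippet below is only a sketch of what that setup looks like in the Hugging Face stack (transformers, peft, trl). The base model id, LoRA rank, and learning rate are illustrative assumptions, not the values used to train the xLAM models; the dataset id refers to the function-calling data Salesforce released alongside APIGen.

```python
# Sketch of an SFT + LoRA setup with a cosine schedule, mirroring the recipe
# described above. Hyperparameters and the base model are illustrative.
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# APIGen-verified function-calling data released by Salesforce.
dataset = load_dataset("Salesforce/xlam-function-calling-60k", split="train")

def to_text(example):
    # Collapse a unified sample into one training string (field names are taken
    # from the public dataset card; the prompt format is illustrative).
    return {"text": f"Tools: {example['tools']}\nQuery: {example['query']}\nCalls: {example['answers']}"}

dataset = dataset.map(to_text)

peft_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM")

training_args = SFTConfig(
    output_dir="xlam-sft-sketch",
    num_train_epochs=3,                 # multiple epochs over shuffled data
    learning_rate=2e-5,
    lr_scheduler_type="cosine",         # cosine schedule, as described above
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    bf16=True,
)

trainer = SFTTrainer(
    model="mistralai/Mistral-7B-v0.1",  # placeholder base model
    args=training_args,
    train_dataset=dataset,
    peft_config=peft_config,
)
trainer.train()
```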
Benchmark Results
To assess xLAM’s performance, a variety of benchmarks are used, including:
· Webshop: An environment simulating online shopping tasks to test how well agents assist in e-commerce. Performance is measured through success and progress rates.
· ToolQuery: A benchmark testing how well agents use tools to retrieve information across multiple domains. This test includes a variety of settings like weather and movies, with success and progress rates used as metrics.
· ToolBench: A real-time evaluation platform for multi-turn reasoning via RapidAPI, with a pass rate metric used to gauge success. This benchmark includes in-domain and out-of-domain tasks.
· Berkeley Function-Calling Leaderboard: This benchmark tests an agent’s ability to handle function calls in various programming languages and application domains. With over 2,200 test cases, it challenges models with complex tasks involving multiple function calls. The evaluation includes accuracy metrics like Abstract Syntax Tree (AST) accuracy and executable accuracy.
These benchmarks confirm that xLAM models are highly capable in environments requiring complex reasoning and tool use.
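To make the AST accuracy metric mentioned in the last benchmark concrete: a predicted function call is judged by whether it matches the reference structurally (function name, argument names, acceptable values) rather than as an exact string. The sketch below uses Python's ast module to illustrate the idea; it is a simplified stand-in, not the leaderboard's actual checker, which also handles multiple acceptable values and parallel calls.

```python
# Simplified illustration of AST-style matching of function calls: a prediction
# counts as correct if it parses to the same function name and argument mapping
# as the reference, regardless of formatting or argument order.
import ast

def call_signature(call_str: str) -> tuple[str, dict[str, object]] | None:
    """Parse a call like 'f(a=1, b="x")' into its function name and keyword arguments."""
    try:
        node = ast.parse(call_str, mode="eval").body
    except SyntaxError:
        return None
    if not isinstance(node, ast.Call) or not isinstance(node.func, ast.Name):
        return None
    kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
    return node.func.id, kwargs

reference = 'get_weather(city="Paris", unit="celsius")'
prediction = 'get_weather(unit="celsius", city="Paris")'  # different order, same call

print(call_signature(prediction) == call_signature(reference))  # True
```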
xLAM is one of the most interesting efforts highlighting the potential of building agentic workflows directly into LLMs. It is going to be interesting to see how Salesforce incorporates xLAM into its own products.
Published via Towards AI