

Inside xLAM: Salesforce’s Models Specialized for Agentic Tasks

Last Updated on September 18, 2024 by Editorial Team

Author(s): Jesus Rodriguez

Originally published on Towards AI.

Created Using DALL-E

I recently started an AI-focused educational newsletter that already has over 170,000 subscribers. TheSequence is a no-BS (meaning no hype, no news, etc.) ML-oriented newsletter that takes 5 minutes to read. The goal is to keep you up to date with machine learning projects, research papers, and concepts. Please give it a try by subscribing below:

TheSequence | Jesus Rodriguez | Substack

The best source to stay up-to-date with the developments in the machine learning, artificial intelligence, and data…

thesequence.substack.com

Agentic workflows form one of the most interesting categories in foundation model research. By agentic workflows, we are referring to AI programs that can execute actions in a specific environment. One of the main debates in the agent community is how many capabilities should go into a model versus peripheral methods like RAG. Recently, Salesforce Research published some major work in agentic AI with xLAM, a series of models optimized for agentic tasks.

xLAM is a new series of action models designed specifically for agentic tasks. It includes five different models, built using either dense or mixture-of-experts architectures, ranging in size from 1 billion to 8x22 billion parameters. A flexible and scalable training pipeline was used to enhance their performance across a variety of environments by combining and augmenting diverse datasets. Initial tests show that xLAM consistently performs well, placing first on the Berkeley Function-Calling Leaderboard and surpassing other prominent models like GPT-4 and Claude-3 in specific tasks, particularly those requiring tool use.

Agentic Models vs. Agentic RAG

As agentic AI evolves, there is a lively debate about which capabilities belong inside the models themselves and which should live in external components. Techniques such as retrieval-augmented generation (RAG) are the most common candidates for agentic tasks. Recently, however, there has been growth in the number of models specialized in agentic tasks, specifically in API calling.

Much of the difficulty in function calling comes from the stochastic nature of LLMs. Function calling is by definition a discrete task, so incorporating it into an LLM introduces a number of interesting challenges, to say the least. These are some of the challenges that xLAM tries to tackle.
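
To make the discreteness point concrete, here is a minimal sketch of what turning free-form model output into a validated function call can look like. The tool schema, function name, and parsing logic are hypothetical stand-ins for illustration, not xLAM's actual interface:

```python
import json

# Hypothetical tool schema in the JSON style commonly used for function calling.
weather_tool = {
    "name": "get_weather",
    "description": "Get the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

def parse_tool_call(model_output: str) -> dict:
    """Turn the model's free-form text into a discrete, validated call."""
    call = json.loads(model_output)  # raises if the output is not valid JSON
    if call.get("name") != weather_tool["name"]:
        raise ValueError(f"Model called an undefined function: {call.get('name')}")
    args = call.get("arguments", {})
    missing = [p for p in weather_tool["parameters"]["required"] if p not in args]
    if missing:
        raise ValueError(f"Missing required arguments: {missing}")
    return call

# A well-formed completion parses cleanly; a hallucinated one raises.
print(parse_tool_call('{"name": "get_weather", "arguments": {"city": "Paris"}}'))
```

The stochastic part is everything before `parse_tool_call`: the model may or may not emit valid JSON, a defined function, or complete arguments, which is exactly the gap that specialized action models aim to close.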

Image Credit: Salesforce Research

xLAM

The xLAM series offers a range of models suited for various needs, from smaller models like the 1B and 7B versions optimized for on-device applications, to larger models like the 8x7B and 8x22B versions intended for more complex tasks. Training these models surfaced key lessons in data handling, such as the importance of unifying and augmenting data to increase its diversity and reduce overfitting. The use of synthetic data has been especially valuable, enabling xLAM models to secure top positions on competitive leaderboards.

Three main xLAM models are available, designed for different use cases:

· xLAM-7b-r: A 7B-parameter model for quick experimentation in academic settings, particularly when resources are limited.

· xLAM-8x7b-r: An 8x7B mixture-of-experts model designed for industrial applications where balancing latency, resources, and performance is key.

· xLAM-8x22b-r: The largest model, suitable for projects with substantial computational resources and high-performance demands.

These models can handle both single-turn and multi-turn tasks across various benchmarks and environments. Earlier versions, such as xLAM-1b-fc-r and xLAM-7b-fc-r, were developed for single-turn function-calling tasks, with xLAM-7b-fc-r previously achieving second place on the Berkeley Function-Calling Leaderboard, although it now ranks sixteenth in the latest version. Meanwhile, the smaller xLAM-1b-fc-r, known as the “Tiny Giant,” is optimized for mobile use.
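
For readers who want to try one of these checkpoints, the sketch below shows one plausible way to load the small function-calling model with the Hugging Face transformers library. The model ID follows the published xLAM naming on the Hugging Face Hub, but verify the exact ID and chat-template details against the model card before relying on them:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Salesforce/xLAM-1b-fc-r"  # the small "Tiny Giant" function-calling model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "What is the weather in Paris?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True))
```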

Image Credit: Salesforce Research

Data Processing and Augmentation

The data pipeline for xLAM models involves several critical steps. First, data is unified into a standard format that works across different environments and tasks. This makes it easier to apply augmentations and identify errors, such as incorrect function calls or hallucinations. The augmentation process itself focuses on improving data diversity by applying various transformations, producing new synthetic samples. The unified format simplifies this process, ensuring consistency.
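
As a rough illustration of what such unification might look like, the sketch below maps a hypothetical source record into a shared schema. The field names and the UnifiedSample structure are assumptions for illustration; Salesforce has not published the pipeline in this form:

```python
from dataclasses import dataclass

# Hypothetical unified record; the actual xLAM schema is not public in this form.
@dataclass
class UnifiedSample:
    instruction: str          # user task description
    tools: list[dict]         # available function schemas
    target_calls: list[dict]  # ground-truth function calls
    source: str = "unknown"   # originating dataset, useful for mixing statistics

def unify_toolbench(raw: dict) -> UnifiedSample:
    """Map one (assumed) ToolBench-style record into the shared format."""
    return UnifiedSample(
        instruction=raw["query"],
        tools=raw["api_list"],
        target_calls=raw["answer"]["calls"],
        source="toolbench",
    )
```

Once every dataset passes through an adapter like `unify_toolbench`, augmentations and error checks only ever need to understand one format.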

Error detection is another crucial part of the data pipeline, with rule-based and large language model (LLM) tools used to spot issues like undefined functions and poor reasoning.
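
A minimal example of the rule-based side of such checks, using an assumed sample layout: flag any ground-truth call that references a function the sample never defined.

```python
def find_undefined_calls(sample: dict) -> list[str]:
    """Return names of target calls that reference functions the sample never defined."""
    defined = {tool["name"] for tool in sample["tools"]}
    return [c["name"] for c in sample["target_calls"] if c["name"] not in defined]

sample = {
    "tools": [{"name": "get_weather"}],
    "target_calls": [{"name": "get_weather"}, {"name": "book_flight"}],
}
print(find_undefined_calls(sample))  # ['book_flight'] -> flag or drop this sample
```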

Data Synthesis and Mixture

xLAM uses a systematic data synthesis framework known as APIGen. This framework creates verified datasets based on executable APIs, ensuring high-quality data through a multi-step verification process. Data from several sources, including synthetic datasets and general instruction datasets, is combined for supervised fine-tuning of xLAM models.
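
The sketch below imitates the execution-based verification idea at toy scale: a synthetic sample survives only if its call parses, names a registered function, and actually executes. The registry and sample format here are illustrative, not APIGen's real interface:

```python
import json

REGISTRY = {"add": lambda a, b: a + b}  # toy stand-in for executable APIs

def verify_sample(raw_call: str) -> bool:
    """Multi-stage check: valid JSON, known function, successful execution."""
    try:
        call = json.loads(raw_call)      # stage 1: format check
        fn = REGISTRY[call["name"]]      # stage 2: function must exist
        fn(**call["arguments"])          # stage 3: must execute without error
        return True
    except (json.JSONDecodeError, KeyError, TypeError):
        return False

print(verify_sample('{"name": "add", "arguments": {"a": 1, "b": 2}}'))  # True
print(verify_sample('{"name": "add", "arguments": {"a": 1}}'))          # False: missing arg
```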

Training

Model training follows a supervised fine-tuning (SFT) approach, making use of a flexible data pipeline. The training framework is built on HuggingFace libraries and PyTorch, and xLAM models undergo multiple training epochs with shuffled datasets to ensure robust learning. For the largest model, xLAM-8x22b-r, the LoRA method is used to preserve its abilities while preventing it from forgetting previously learned information. LoRA is also employed in aligning all xLAM models with the DPO method, and a cosine learning rate scheduler is used to further optimize training.
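
Here is a minimal sketch of an SFT-with-LoRA setup on the Hugging Face stack, in the spirit described above. The base model, target modules, and hyperparameters are illustrative assumptions, not Salesforce's actual configuration:

```python
from transformers import AutoModelForCausalLM, TrainingArguments
from peft import LoraConfig, get_peft_model

# Illustrative base model; the real 8x22B checkpoint needs multi-GPU hardware.
base = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")

lora_cfg = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # typical attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_cfg)    # only the adapter weights train
model.print_trainable_parameters()

args = TrainingArguments(
    output_dir="xlam-sft",
    num_train_epochs=3,             # multiple epochs over shuffled data
    lr_scheduler_type="cosine",     # cosine learning rate schedule, as described
    learning_rate=2e-5,
    bf16=True,
)
# Hand `args`, the adapted model, and the unified dataset to a Trainer to run SFT.
```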

Benchmark Results

To assess xLAM’s performance, a variety of benchmarks are used, including:

· Webshop: An environment simulating online shopping tasks to test how well agents assist in e-commerce. Performance is measured through success and progress rates.

· ToolQuery: A benchmark testing how well agents use tools to retrieve information across multiple domains. This test includes a variety of settings like weather and movies, with success and progress rates used as metrics.

· ToolBench: A real-time evaluation platform for multi-turn reasoning via RapidAPI, with a pass rate metric used to gauge success. This benchmark includes in-domain and out-of-domain tasks.

· Berkeley Function-Calling Leaderboard: This benchmark tests an agent’s ability to handle function calls in various programming languages and application domains. With over 2,200 test cases, it challenges models with complex tasks involving multiple function calls. The evaluation includes accuracy metrics like Abstract Syntax Tree (AST) accuracy and executable accuracy, the former of which is sketched below.
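
To make the AST accuracy idea concrete, here is a toy matcher that compares a predicted call against a reference structurally, ignoring formatting differences. It is a simplification of the leaderboard's real checker:

```python
import ast

def ast_match(predicted: str, reference: str) -> bool:
    """Structurally compare two single-call expressions, ignoring formatting."""
    p = ast.parse(predicted).body[0].value
    r = ast.parse(reference).body[0].value
    return (
        ast.dump(p.func) == ast.dump(r.func)
        and [ast.dump(a) for a in p.args] == [ast.dump(a) for a in r.args]
        and sorted(ast.dump(k) for k in p.keywords)
            == sorted(ast.dump(k) for k in r.keywords)
    )

print(ast_match("get_weather(city='Paris')", "get_weather( city = 'Paris' )"))  # True
print(ast_match("get_weather(city='Rome')", "get_weather(city='Paris')"))       # False
```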

Image Credit: Salesforce Research

These benchmarks confirm that xLAM models are highly capable in environments requiring complex reasoning and tool use.

Image Credit: Salesforce Research

xLAM is one of the most notable releases highlighting the potential of building agentic workflows directly into LLMs. It’s going to be interesting to see how Salesforce incorporates xLAM into its own products.


Published via Towards AI
