Inside DBRX: Databricks’ Impressive Open Source LLM

Last Updated on April 1, 2024 by Editorial Team

Author(s): Jesus Rodriguez

Originally published on Towards AI.

I recently started an AI-focused educational newsletter that already has over 165,000 subscribers. TheSequence is a no-BS (meaning no hype, no news, etc.) ML-oriented newsletter that takes 5 minutes to read. The goal is to keep you up to date with machine learning projects, research papers, and concepts. Please give it a try by subscribing below:

TheSequence | Jesus Rodriguez | Substack

The best source to stay up-to-date with the developments in the machine learning, artificial intelligence, and data…

thesequence.substack.com

The open-source generative AI landscape is experiencing tremendous momentum. Innovation comes not only from startups like HuggingFace, Mistral, or AI21 but also from large AI labs such as Meta. Databricks has been one of the tech incumbents exploring different angles in open-source generative AI, mainly after the acquisition of MosaicML. A few days ago, Databricks open sourced DBRX, a massive general-purpose LLM that shows incredible performance across different benchmarks.

DBRX builds on the mixture-of-experts (MoE) approach used by Mixtral, which increasingly looks like the standard to follow in transformer-based architectures. Databricks released both the baseline model, DBRX Base, and the instruction fine-tuned one, DBRX Instruct. From the initial reports, it seems that Databricks’ edge was the quality of the dataset and training process, although few details about either have been disclosed.

Architecture

DBRX is a large language model that operates on a transformer-based, decoder-only framework, trained to predict the next token in a sequence. It is built upon a fine-grained mixture-of-experts (MoE) structure with a total of 132 billion parameters, though it only engages 36 billion parameters for any given input. The model was pre-trained on a dataset of 12 trillion tokens that includes both text and code. DBRX distinguishes itself by using 16 smaller experts, 4 of which are selected for each input, unlike its contemporaries Mixtral and Grok-1, which have 8 experts and choose 2. This yields a vastly increased number of potential expert combinations (65 times more, to be precise), which is reported to improve the model’s quality. DBRX incorporates techniques such as rotary position encodings, gated linear units, and grouped query attention for improved performance, and it uses the GPT-4 tokenizer.
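The "65 times more" figure follows from counting the unordered subsets of experts a router can choose, and the routing idea itself fits in a few lines. What follows is a minimal sketch and not DBRX's actual router: the helper name `route_top_k`, the softmax-then-renormalize weighting, and the example router scores are all assumptions made for illustration.

```python
import math

# Number of distinct expert subsets a router can pick for one input.
dbrx_combos = math.comb(16, 4)     # 16 experts, choose 4
mixtral_combos = math.comb(8, 2)   # 8 experts, choose 2
print(dbrx_combos, mixtral_combos, dbrx_combos // mixtral_combos)  # 1820 28 65


def route_top_k(router_logits, k):
    """Toy top-k gating (hypothetical helper): pick k experts, normalize their weights."""
    # Softmax over all experts, shifted by the max for numerical stability.
    peak = max(router_logits)
    exps = [math.exp(x - peak) for x in router_logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep the k most probable experts.
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    # Renormalize so the selected experts' weights sum to 1.
    mass = sum(probs[i] for i in chosen)
    return {i: probs[i] / mass for i in chosen}


# Made-up router scores for one token across 16 experts.
logits = [0.1 * ((7 * i) % 16) for i in range(16)]
weights = route_top_k(logits, k=4)
print(sorted(weights))  # indices of the 4 selected experts
```

In a real MoE layer the router scores come from a learned linear projection of each token's hidden state, and the selected experts' outputs are combined using these weights.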

The dataset for DBRX’s training was carefully curated, and Databricks estimates it to be at least twice as effective, token-for-token, as the data used to pre-train its earlier MPT family of models. This new dataset benefited from comprehensive data processing and management tools, facilitating an optimized training regimen that notably enhanced model quality through strategic adjustments in the data mix.

Training

DBRX’s development spanned three months, relying on 3072 NVIDIA H100 GPUs connected via a 3.2Tbps Infiniband network. This period marked the culmination of extensive preparatory work, including dataset research and scaling experiments, all part of the ongoing evolution of language model development. Notably, training the MoE architecture proved significantly more compute-efficient than training traditional dense models.

This efficiency breakthrough was a part of an overarching advancement in the model’s training pipeline, now nearly four times more compute-efficient compared to ten months prior. Such efficiency gains were achieved through a combination of architectural innovations, optimization techniques, and, critically, the use of higher-quality training data.

Throughout the development of DBRX, a suite of proprietary tools was utilized for data management, processing, and model training, ensuring a seamless and integrated workflow. These tools allowed for extensive exploration, cleaning of data, and efficient model training across a vast array of GPUs, culminating in a streamlined process for model refinement and deployment.

Inference

The architecture of DBRX enables a delicate balance between model quality and inference efficiency, outperforming dense models in this regard. For instance, despite its size, DBRX achieves double the inference throughput of comparable models due to its efficient use of active parameters. The model offers enhanced performance metrics across a variety of benchmarks, setting new standards for both quality and efficiency.
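A rough way to see why active parameters matter: a common rule of thumb puts the forward-pass cost of a transformer at about 2 FLOPs per active parameter per token. The sketch below applies that approximation (an assumption, not a figure from Databricks) to DBRX's 36 billion active parameters and to a hypothetical dense model in which all 132 billion parameters are active. It ignores attention cost, memory bandwidth, and batching, so it shows the direction of the effect rather than measured throughput.

```python
TOTAL_PARAMS = 132e9   # total parameters in DBRX
ACTIVE_PARAMS = 36e9   # parameters engaged for any given input


def forward_flops_per_token(active_params):
    """Rule-of-thumb estimate: ~2 FLOPs per active parameter per token."""
    return 2 * active_params


moe_flops = forward_flops_per_token(ACTIVE_PARAMS)
# Hypothetical dense model of the same total size, used only for comparison.
dense_flops = forward_flops_per_token(TOTAL_PARAMS)

print(f"active fraction: {ACTIVE_PARAMS / TOTAL_PARAMS:.1%}")                  # 27.3%
print(f"dense / MoE FLOPs per token: {dense_flops / moe_flops:.2f}x")  # 3.67x
```

The ratio is an upper bound on the compute saving per token; real-world throughput gains are smaller because all 132 billion parameters still have to be held in memory and moved around.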

DBRX Instruct

DBRX also comes in a specialized version designed for instruction-following tasks, known as DBRX Instruct. This variant shares the MoE architecture and is fine-tuned to excel in applications built around few-turn interactions.

Evaluation

DBRX and its instruction-following variant were rigorously evaluated against both open source and commercial models, showcasing superior performance across a range of metrics, including general knowledge, commonsense reasoning, and specialized domains such as programming and mathematics.

The model demonstrates remarkable proficiency in handling long-context inquiries, providing insights into its capabilities and the potential applications in various fields.

Retrieval-augmented generation (RAG) is another area in which DBRX excels.

Using DBRX

DBRX and DBRX Instruct are accessible for implementation via the HuggingFace platform, ensuring a straightforward integration process for users. The models require significant memory for operation but promise a powerful toolset for addressing complex language understanding and generation tasks, as demonstrated through practical examples.

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Replace hf_YOUR_TOKEN with your own Hugging Face access token.
tokenizer = AutoTokenizer.from_pretrained("databricks/dbrx-instruct", trust_remote_code=True, token="hf_YOUR_TOKEN")
# Load the weights in bfloat16 on the CPU; change device_map to place the model on GPUs.
model = AutoModelForCausalLM.from_pretrained("databricks/dbrx-instruct", device_map="cpu", torch_dtype=torch.bfloat16, trust_remote_code=True, token="hf_YOUR_TOKEN")

input_text = "What does it take to build a great LLM?"
messages = [{"role": "user", "content": input_text}]
# With return_dict=True the chat template returns a dict of tensors (input_ids, attention_mask).
inputs = tokenizer.apply_chat_template(messages, return_dict=True, tokenize=True, add_generation_prompt=True, return_tensors="pt")

# Generate up to 200 new tokens, then decode the full sequence (prompt plus completion).
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0]))
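To put "significant memory" in numbers, the size of the weights alone can be estimated from the parameter count and the data type. This back-of-the-envelope sketch assumes 2 bytes per parameter for bfloat16 (matching the `torch_dtype=torch.bfloat16` argument in the example above) and decimal gigabytes. The float32 and int8 rows are included only for comparison, and the helper name `weight_memory_gb` is hypothetical. Activations and the KV cache are ignored, so actual usage will be higher.

```python
TOTAL_PARAMS = 132e9  # total parameter count quoted for DBRX

# Bytes per parameter for a few common dtypes.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "int8": 1}


def weight_memory_gb(num_params, dtype):
    """Approximate memory for the weights only, in decimal gigabytes."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: ~{weight_memory_gb(TOTAL_PARAMS, dtype):.0f} GB")
```

At bfloat16 this comes to roughly 264 GB for the weights, which is why the model does not fit on a single GPU and is typically sharded across several.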

DBRX represents a major upgrade in Databricks’ LLM stack. Together with the company’s enterprise distribution, it could become one of the most important open-source LLMs in the new wave of generative AI.

Published via Towards AI
