Inside Code Llama: Meta AI’s Entrance in the Code LLM Space
Last Updated on August 30, 2023 by Editorial Team
Author(s): Jesus Rodriguez
Originally published on Towards AI.
The new family of models builds on the Llama 2 foundation to match state-of-the-art performance across different code generation tasks.
Coding has rapidly become one of the most active areas for large language models (LLMs). Since OpenAI unveiled Codex (now part of GPT-4) in 2021, the pace of innovation in code language models has been breathtaking. In the last few months, we have seen code LLMs released by companies like Salesforce, Hugging Face, DeepMind, Amazon, and many others. Last week, Meta AI joined the code LLM frenzy with the release of Code Llama, an open-source code LLM based on the recently released Llama 2. The release is significant given the impact Meta is having on the open-source foundation model movement.
Inside Code Llama
The release of Code Llama includes not a single model but three variants, with 7B, 13B, and 34B parameters. Each has been trained on 500B tokens of code and code-related data. Notably, the 7B and 13B base and instruct models have also been trained with a fill-in-the-middle (FIM) capability, which lets them insert code into existing code. This makes them suitable for tasks like code completion out of the box.
The three models address different serving and latency requirements. For instance, the 7B model can be served on a single GPU. The 34B model returns the best results and offers stronger coding assistance, while the smaller 7B and 13B versions are faster and better suited to low-latency tasks such as real-time code completion.
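To make the serving trade-off concrete, here is a rough back-of-the-envelope sketch of the memory needed just to hold each model's weights. It assumes 16-bit weights (2 bytes per parameter), which the article does not state, and it ignores activations, the KV cache, and framework overhead, so real requirements will be higher.

```python
BYTES_PER_PARAM = 2  # assumption: 16-bit (fp16/bf16) weights

def weight_memory_gb(num_params, bytes_per_param=BYTES_PER_PARAM):
    """Approximate memory for the weights alone, in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9

for name, params in [("7B", 7_000_000_000),
                     ("13B", 13_000_000_000),
                     ("34B", 34_000_000_000)]:
    print(f"Code Llama {name}: ~{weight_memory_gb(params):.0f} GB of weights")
```

Under these assumptions the 7B weights fit within the memory of a single high-end GPU, while the 34B weights would typically need several devices or a lower-precision format.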
Meta AI also released two specialized variants of Code Llama: Code Llama - Python and Code Llama - Instruct.
- Code Llama - Python is a language-specialized variant, further fine-tuned on 100B tokens of Python code. Because Python is the most benchmarked language for code generation and plays a central role in the AI community, a specialized model provides additional utility.
- Code Llama - Instruct is an instruction fine-tuned and aligned variant of Code Llama. This training approach feeds the model “natural language instruction” inputs paired with the expected outputs, which makes it better at understanding what humans expect from their prompts. For code generation, the Code Llama - Instruct versions are recommended, as they have been fine-tuned to generate helpful and safe answers in natural language.
Looking more closely at Code Llama’s training and fine-tuning, a few aspects are worth highlighting:
1) Dataset:
Code Llama’s training relies on a carefully curated, near-deduplicated dataset of publicly available code. The models are trained on 500B tokens during the initial phase, starting from the 7B, 13B, and 34B versions of Llama 2. An additional 8% of the sample data comes from natural language datasets related to code.
2) Infilling:
Code infilling is the task of predicting the missing part of a program given the surrounding context. Practical applications include code completion in Integrated Development Environments (IDEs), type inference, and the generation of in-code documentation such as docstrings. Meta AI trains the infilling models following the concept of causal masking (Aghajanyan et al., 2022; Fried et al., 2023): parts of a training sequence are moved to the end, and the reordered sequence is then predicted autoregressively. The 7B and 13B models are trained with this infilling objective, following the recommendations of Bavarian et al. (2022).
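The reordering step can be illustrated with a minimal sketch. It splits a document into prefix, middle, and suffix at two random positions and moves the middle to the end, so that a left-to-right model learns to generate it after seeing both sides. The sentinel strings and function names below are illustrative placeholders, not necessarily what Meta AI's training code uses, and the split is done on characters for simplicity.

```python
import random

# Placeholder sentinel strings (assumption); real tokenizers reserve
# dedicated special tokens for these markers.
PRE, SUF, MID, EOT = "<PRE>", "<SUF>", "<MID>", "<EOT>"

def split_document(text, rng):
    """Cut text at two random positions into (prefix, middle, suffix)."""
    i, j = sorted(rng.randint(0, len(text)) for _ in range(2))
    return text[:i], text[i:j], text[j:]

def to_infilling_example(text, rng):
    """Move the middle span to the end so it is predicted last."""
    prefix, middle, suffix = split_document(text, rng)
    return f"{PRE}{prefix}{SUF}{suffix}{MID}{middle}{EOT}"

rng = random.Random(0)
source = "def add(a, b):\n    return a + b\n"
print(to_infilling_example(source, rng))
```

At inference time, the same layout would let an editor send the code before and after the cursor as prefix and suffix and ask the model to continue after the middle marker.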
3) Long Context Fine-Tuning:
Handling long sequences is a major challenge for transformer-based language models. The main difficulties are extrapolation, that is, operating on sequence lengths beyond those seen during training, and the quadratic complexity of attention, which favors training on short-to-medium inputs. Meta AI addresses this with a dedicated long context fine-tuning (LCFT) stage, in which models are presented with sequences of 16,384 tokens, up from the 4,096 tokens used for Llama 2 and the initial code training stages. Because this change is confined to a fine-tuning phase, the models gain long-range capabilities without a significant increase in training cost.
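A quick calculation shows why the quadratic cost matters here. The sketch below counts the pairwise query-key scores in one full self-attention pass (per head and per layer, ignoring causal masking and any optimized attention kernels) for the two sequence lengths mentioned above. The function name is just for illustration.

```python
def attention_score_entries(seq_len):
    """Number of pairwise query-key scores in one full self-attention pass."""
    return seq_len * seq_len

short_ctx = 4_096
long_ctx = 16_384
ratio = attention_score_entries(long_ctx) / attention_score_entries(short_ctx)
print(f"{long_ctx // short_ctx}x longer sequence -> {ratio:.0f}x more attention scores")
```

A sequence four times longer produces sixteen times as many attention scores, which is why restricting long sequences to a short fine-tuning stage keeps the overall training cost manageable.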
4) Instruction Fine-Tuning:
Code Llama - Instruct models are instruction fine-tuned versions of Code Llama, trained to answer questions appropriately. The training combines Supervised Fine-Tuning examples with a large pool of Rejection Sampling examples.
For data, Meta AI uses a proprietary dataset of examples tied to code-related tasks. Because collecting data from human annotators or through human feedback is expensive, especially for coding tasks that require the expertise of professional developers, particular emphasis is placed on self-instruct data.
To evaluate Code Llama, Meta AI used two popular coding benchmarks: HumanEval and Mostly Basic Python Programming (MBPP). HumanEval tests the model’s ability to complete code based on docstrings, while MBPP tests its ability to write code based on a description.
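To give a feel for how this style of benchmark judges a model, here is a minimal sketch of a functional-correctness check. The task, completion, and tests are toy examples written for this illustration and are not taken from either benchmark. Real evaluation harnesses also run generated code in a sandbox with timeouts, which this sketch omits.

```python
# A toy task in the style described above; not taken from either benchmark.
PROMPT = '''def is_even(n):
    """Return True if n is an even integer, otherwise False."""
'''

# Hypothetical model output that continues the prompt.
COMPLETION = "    return n % 2 == 0\n"

TESTS = '''
assert is_even(4) is True
assert is_even(7) is False
assert is_even(0) is True
'''

def passes(prompt, completion, tests):
    """Run prompt + completion, then the tests; any exception is a failure."""
    namespace = {}
    try:
        exec(prompt + completion, namespace)
        exec(tests, namespace)
    except Exception:
        return False
    return True

print(passes(PROMPT, COMPLETION, TESTS))
```

A benchmark score is then the share of problems for which the generated code passes all of its tests.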
In these benchmarks, Code Llama outperformed open-source, code-specific LLMs as well as its predecessor, Llama 2. Code Llama 34B, for instance, scored 53.7% on HumanEval and 56.2% on MBPP. These were the highest scores among comparable state-of-the-art solutions, putting Code Llama 34B on par with ChatGPT.
Code Llama promises to be one of the most important code LLMs in the near future. It certainly helps reaffirm the value of open-source foundation models across different domains.