Inside Orca 2: Microsoft New Method to Teach Reasoning to Small Language Models
Last Updated on December 11, 2023 by Editorial Team
Author(s): Jesus Rodriguez
Originally published on Towards AI.
I recently started an AI-focused educational newsletter, that already has over 160,000 subscribers. TheSequence is a no-BS (meaning no hype, no news, etc) ML-oriented newsletter that takes 5 minutes to read. The goal is to keep you up to date with machine learning projects, research papers, and concepts. Please give it a try by subscribing below:
TheSequence U+007C Jesus Rodriguez U+007C Substack
The best source to stay up-to-date with the developments in the machine learning, artificial intelligence, and dataβ¦
thesequence.substack.com
Earlier this year, Microsoft Research unveiled Orca, a 13-billion parameter model that can emulate the intricate reasoning processes exhibited by other LLMs. Specifically , Orca learns from GPT-4 signals including explanatory traces, meticulous step-by-step thinking, and a myriad of complex instructions. Just a few days ago, Microsoft expanded on that line of work with the release of Orca 2, an extension of the groundbreaking work that delves even deeper into the domain of Small Language Models (SLMs). This new release challenges the conventional approaches to reasoning, pushing the boundaries of whatβs possible in the field.
Traditionally, the training of SLMs has leaned heavily on imitation learning, striving to replicate the output of their more illustrious counterparts. However, Microsoft Research posits that this unrelenting emphasis on imitation may inadvertently constrain the potential of these smaller models. The goal here is to empower small LMs to employ diverse solution strategies for various tasks, ones that may diverge from the routes taken by their larger counterparts.
At the heart of Orca 2 lie two pivotal techniques:
i. Instruction Tuning: This is a recent concept that has gained prominence in the LLM space. This technique involves learning from input-output pairs, where the input comprises natural language task descriptions, and the output demonstrates the desired behavior. The efficacy of instruction tuning has been demonstrated in enhancing a modelβs ability to follow instructions across both familiar and unfamiliar tasks, elevating the overall quality of generated content, and furnishing models with enhanced zero-shot capabilities and advanced reasoning skills.
ii. Explanation Tuning: While instruction tuning is very efficient, it has its limitations. Notably, it can lead to models generating outputs that are stylistically sound but factually erroneous. For instance, instruction-tuning towards overly concise targets may deprive the student model of a deep understanding of complex reasoning processes, thereby limiting its ability to generalize across diverse tasks. To address this concern, Orca 1 introduced Explanation Tuning, a novel approach aimed at training student models using richer and more expressive reasoning signals. This involves crafting system instructions that prompt the teacher model to provide detailed explanations while navigating a task. These system instructions serve as high-level guidelines that LLMs must adhere to as they interact with individual user prompts, and they are distinct from user-initiated dialogues thanks to a βsystemβ role flag in the ChatML interface.
Microsoft combines these two techniques in Orca 2 to achieve a type of reasoning that seems to be highly efficient in SLMS.
Orca 2 and Cautious Reasoning
Cautious Reasoning refers to the process of determining the most suitable solution strategy for a given task. This selection process encompasses a spectrum of options, ranging from straightforward, direct answer generation to the utilization of more contemplative βSlow Thinkingβ strategies such as step-by-step reasoning, guess and check, or explain-then-answer, among others. The following elucidates the methodology behind training a Cautious Reasoning Language Model (LLM):
1) Commence with a diverse collection of tasks, representing a cross-section of challenges.
2) Informed by the insights garnered from Orcaβs performance, make informed decisions about which tasks necessitate specific solution strategies, be it direct-answer, step-by-step, explain-then-answer, or others.
3) Craft task-specific system instructions tailored to the selected strategy, enabling the acquisition of teacher responses for each task.
4) During the training phase, employ a process known as βPrompt Erasing,β where the studentβs system instruction is substituted with a generic one devoid of task-specific details, emphasizing the modelβs autonomous learning.
The cautious reasoning process is clearly illlustrated in the following dialog which shows how the student model learns the strategy without starting off with specific instructions.
To train Orca 2, Microsoft built a brand-new dataset, boasting approximately 817,000 training instances/ Building upon the foundation laid by Orca 1, Orca 2 underwent progressive learning, drawing data subsets from a fusion of the original FLAN annotations, the Orca 1 dataset, and the newly minted Orca 2 dataset. The bedrock of this training dataset remains FLAN, enriched with mathematical challenges and a collection of few-shot examples.
The core of the Orca 2 training relies on a technique known as progress learning which hinges on initiating training with either the LLaMA-2β7B or LLaMA-2β13B checkpoint, followed by fine-tuning on the train split of the FLAN-v2 dataset for a single epoch. Itβs noteworthy that the FLAN-v2 dataset encompasses both zero-shot and few-shot problems. Subsequently, the model underwent training on 5 million ChatGPT data instances from Orca 1, spanning three epochs. The final leg of training encompassed a four-epoch session on a composite dataset, consisting of 1 million GPT-4 data instances from both Orca 1 and Orca 2βs 817,000 data samples.
Evaluation
The litmus test for Orca 2βs prowess came in the form of a comprehensive evaluation conducted by Microsoft. This evaluation spanned a wide array of benchmarks, ranging from advanced capabilities like reasoning to fundamental tasks such as text completion, as well as grounding, truthfulness, and safety.
The work on Orca 2 highlights the possibilities of enhancing the reasoning capabilities of SLMs. Through specialized training on synthetic data, Orca 2 models have demonstrated not only the feasibility but also the attainment of improved performance levels. By leveraging an array of reasoning techniques and astutely identifying the most effective solution strategy for each task, these models have showcased prowess that often matches or surpasses much larger counterparts, particularly in the realm of zero-shot reasoning tasks. While acknowledging the existence of inherent limitations and constraints linked to their foundational models, Orca 2 models present a hopeful prospect for future enhancements, particularly in terms of bolstered reasoning capabilities, control, and safety, all thanks to the strategic application of synthetic data in post-training refinement.
Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming aΒ sponsor.
Published via Towards AI