Master LLMs with our FREE course in collaboration with Activeloop & Intel Disruptor Initiative. Join now!


Few shots at a Math assistant with Orca-2-7B
Artificial Intelligence   Latest   Machine Learning

Few shots at a Math assistant with Orca-2-7B

Author(s): Kundan Joshi

Originally published on Towards AI.

In the world of text generation with LLMs, where content with largely subjective underpinnings is churned out with imaginative opulence, topics with an analytical quotient, like logical reasoning and maths problems, do not score particularly well.
And yet, this domain remains a topic of universal interest and application due to an ever-existing need from educational practitioners.

With the release of models like Orca 2, and the latter showcased as a Small Language Model (SLM), there is another step forward in this specific direction. SLMs like Orca mimic the reasoning process of larger AI models despite exhibiting a lesser number of parameters than conventional models. They are typically trained by using a corpus of synthetic datasets that are derived from the output of large models like Llama2. This teaches the smaller one a step-by-step thought process, enabling it to better understand the context of logical problems and reasoning through it while responding coherently and explanatorily.
Moreover, Orca 2 has been specially trained with content focused on objective reasoning, like math problems, instead of a multi-turn conversational focus like chat, so it speaks more appropriately to these types of use cases.

To illustrate some actual code in a Colab/GPU-accessible environment:

Let’s begin by loading transformers and associated libraries to enable quantization. Bits andbytes allow easy integration into HuggingFace.

!pip install git+
!pip install accelerate -qq
!pip install SentencePiece -qq
!pip install protobuf -qq
!pip install bitsandbytes -qq

followed by

import torch
import transformers

if torch.cuda.is_available():

Load Orca2 with 7B parameter using the ever resourceful AutoModel-for-pre trained of Huggingface.

from transformers import BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(

model = transformers.AutoModelForCausalLM.from_pretrained(

tokenizer = transformers.AutoTokenizer.from_pretrained(

Now, let's ease into the conversation with prompt templating — prompting specific to Orca.

system_message = """You are Orca, an AI language model created by Microsoft. You are a cautious assistant.

Analyse the maths or logical question given to you and solve it in a step by step manner

Let’s begin with the easy stuff¹…

user_message = "how many ways can I arrange 10 men in a row?"
prompt = f"<U+007Cim_startU+007C>system\n{system_message}<U+007Cim_endU+007C>\n<U+007Cim_startU+007C>user\n{user_message}<U+007Cim_endU+007C>\n<U+007Cim_startU+007C>assistant"
inputs = tokenizer(prompt, return_tensors='pt')

The basics of prompt structuring have been skimped over for now, as reasoning ability is at play here….so let’s check this out with the inference.

from transformers import GenerationConfig

generation_config = GenerationConfig.from_pretrained("microsoft/Orca-2-7b")
generation_config.temperature = 0.1
generation_config.do_sample = True
generation_config.top_p = 0.9

output_ids = model.generate(inputs["input_ids"],generation_config)
answer = tokenizer.batch_decode(output_ids)[0]


Keeping the temperature parameter low, since we have objective content and do not want to let the model meander down a course of creativity.
The output is:

To find the number of ways to arrange 10 men in a row, we can use the concept of permutations. A permutation is an arrangement of objects in a specific order. The formula for finding the number of permutations of n distinct objects is n! (n factorial), which means multiplying all positive integers from 1 to n together.

In this case, we have 10 men, so n = 10. Therefore, the number of ways to arrange them in a row is:

10! = 10 × 9 × 8 × 7 × 6 × 5 × 4 × 3 × 2 × 1 = 3,628,800

So there are 3,628,800 different ways to arrange the 10 men in a row.</s>

Bingo! the answer, as any primary grade maths student would validate.

More of that where it came from –

user_message = "34 men can complete a piece of work in 12 days. In how many days can 51 men complete the same piece of work?"

And the answer we get is an elaborate:

- 34 men can complete a piece of work in 12 days
- 51 men need to complete the same piece of work

Goal: find the number of days needed by 51 men to complete the work

- Assume that the work done by one man in one day is constant
- Let x be the number of days needed by 51 men to complete the work
- Then we can write the following equation:

34 * 12 = 51 * x

- This means that the total work done by 34 men in 12 days is equal to the total work done by 51 men in x days
- To solve for x, we can divide both sides of the equation by 51:

(34 * 12) / 51 = x

- Simplify the fraction by canceling out the common factors:

(34 / 51) * 12 = x

- Evaluate the expression:

(34 / 51) * 12 = (34 * 12) / 51 = 408 / 51 = 8

- Therefore, x = 8

### Final answer: 51 men can complete the same piece of work in 8 days</s>

Pretty nifty, eh? Trying with something else, like probability, also gives out a quick response.

But let’s take something more nuanced and practically useful.

user_message = "Find the compound interest on Rs. 10,000 in 2 years at 4% per annum, the interest being compounded half-yearly."

The answers here, as a few attempts showed, had widely varying reasoning (if you could call it that) flows, not to mention basic errors in maths calculations that stumbled at several points – from converting tenure to deciding operator precedence. It eventually dished out answers that were grossly off the mark.

Evidently, multiple-step maths calculations were doing numbers on the model!
Let’s retry this with a bit of “prompt engineering”.
Returning to our system message, we will redraft our message to give a little nudge to the model.

As will be evident, we employ something along the lines of (COT) Chain of Thought Prompting here, with a more elaborate articulation of the typical “let's think step by step”.

It should be noted that typically COT is believed to yield benefits in much larger models. Still, our case here appeals to some machination to obtain any reasonable interpretation of this particular math problem.

system_message = """You are required to answer Arithmetic Reasoning questions. Follow the steps below:

Step 1: State the formula for solving the question.

Step 2: Variable values like tenure or time "t" may need to be converted as given in below example:
# Example Input #
Time (t) = 1 year and 3 months
# Example converted Output #
Time (t) = 1 year and 3 months = 1 year + (3 /12 ) years = 1 year + 0.25 years = 1.25 years

Step 3: Determine the variable values to substitute into the formula from Step 1 by
- extracting these values from the problem statement.
- get value of variable time (t) from previous Step 2.
State all these variables with names and correct values here.

Step 4. Substitute the corresponding values in the formula carefully and save this as <formula_with_values_substituted> for output at the end.

Step 5: Diligently perform all intermediate calculations. Make sure that arithmetic operator precedence rules
are applied correctly and take particular care that multiplication, division and exponential calculations are accurately done.

Step 6: Compute and state the final answer based on what is asked in the question. Your answer should be in the format:
Desired format:
Answer: <final_answer>
Solved_Formula: <formula_with_values_substituted>\n


Let's think step by step.

And the answer we finally got was:

Step 1: State the formula for solving the question.

The formula for compound interest is:
A = P(1 + r/n)^(nt)

A = the future value of the investment/loan, including interest
P = the principal investment amount (the initial deposit or loan amount)
r = the annual interest rate (decimal)
n = the number of times that interest is compounded per year
t = the number of years the money is invested or borrowed for

Step 2: Convert the given values into the desired format.

Principal (P) = Rs. 10,000
Annual interest rate (r) = 4% = 0.04
Number of times interest is compounded per year (n) = 2 (since it's compounded half-yearly)
Number of years (t) = 2

Step 3: Substitute the given values into the formula.

A = 10000(1 + 0.04/2)^(2*2)

Step 4: Perform the calculations.

A = 10000(1 + 0.02)^4
A = 10000(1.02)^4
A = 10000 * 1.08243216
A = 10824.3216

Step 5: Calculate the compound interest.

Compound Interest (CI) = A - P
CI = 10824.3216 - 10000
CI = Rs. 824.3216

Step 6: State the final answer.

Desired format:
Answer: Rs. 824.3216
Solved_Formula: 10000(1 + 0.04/2)^(2*2)</s>

And that brought a semblance of sanity to the output. The elaborate prompting was arrived at, to address a range of errors that the model made: errors that either got tenure-based fractional conversions wrong or did gaffes around basic multiplication.

Some more questions like “Find the compound interest on Rs 48,000 for one year at 8% per annum when compounded half-yearly” and “Find the compound interest on Rs. 8000 at 15% per annum for 2 years 4 months, compounded annually.” tested the model and the efficacy of the prompt a bit further.

If we have to reflect on the performance, then it would be fair to say that while the explanatory steps that the model dished out are logically coherent for most of the limited choice of examples that were evaluated, the final numerical figures are quite hit and miss.

This is what called for a clean separation of the final answer from the <value-filled-formula> as was presented at the end of the output. We use that to do a quick validation of the answer (since the probability of a correct formula churned out by the model is higher than the actual answer).

# another code cell in our Colab notebook to test a crude formula evaluation
import re
mat ="Solved_Formula:[\S ]*<\/s>", answer)
formula_to_solve = re.sub(r"(Solved_Formula:)U+007C(<\/s>)", "",
formula_to_solve = re.sub(r"\^", "**",formula_to_solve)
formula_to_solve = re.sub(r"(?<=\d)\(", "*(",formula_to_solve)
solved = eval(formula_to_solve)

If I were a teacher, I would take the numerical outputs from the quantized model with some circumspection but would still like to use the explanatory text that accompanied the solutions to the questions.

CPU Deployment

If we need to deploy the model on a laptop, we can use the GGUF (quantized) version of Orca-2. GGUF is the evolution of the llama.cpp initiative from the original GGML and enables model inference on CPU.

Quantized GGUF model files can be downloaded from

Now patch the code together with LangChain to enable future chaining flows to plug in answer validations etc.

from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from langchain_community.llms import CTransformers

llm = CTransformers(model="models/orca-2-7b.Q4_K_M.gguf", config={ 'temperature' : 0.1, 'max_new_tokens': 2048, 'context_length': 2200 })
template = """

You are Orca, an AI language model created by Microsoft. You are a cautious assistant.

Analyse the maths or logical question given to you and solve it in a step by step manner


prompt = PromptTemplate(template=template, input_variables=["question"])
llm_chain = LLMChain(prompt=prompt, llm=llm)
response = llm_chain.invoke("how many groups of 4 can I make from 16 men?")

And that was that.

FYI, Orca-2 is licensed under the Microsoft Research License, unlike other commercially friendly open-source LLMs.

Photo by Iewek Gnos on Unsplash

Many Thanks to the Orca (in the image above) for making prompt-stoking less painstaking! 🙂

[1]: All maths questions were taken from various publicly available sample question sets.

Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor.

Published via Towards AI

Feedback ↓