Towards AI Can Help your Team Adopt AI: Corporate Training, Consulting, and Talent Solutions.


Gorilla: Everything You Need to Know
Latest   Machine Learning

Gorilla: Everything You Need to Know

Last Updated on July 25, 2023 by Editorial Team

Author(s): Muhammad Arham

Originally published on Towards AI.

This article introduces Gorilla; UC Berkeley, and Microsoft’s API support for Large Language Models.

Image by Author


LLMs suffer from outdated information and they require re-training to keep up-to-date with recent changes. With limited context and weights, LLMs cannot store data for accurate responses. Therefore, LLMs are augmented by the use of numerous tools and plugins that use external APIs for better answers.

Gorilla introduces self-instruct fine-tuning and retrieval training on a large corpus of APIs that provides better results than leading LLMs, including but not limited to ChatGPT4 and Claude.


The authors created APIBench, an exhaustive set of API corpus, scraped from the three major model hubs.

  • TorchHub: 94 API calls
  • TensorHub:696 API calls
  • HuggingFace: Chose only the 20 most downloaded models totaling 925 API calls.

Then using Self-Instruct, the authors used GPT 4 to generate realistic prompts that use the APIs and constructed 10 different instruction-API pairs to finetune LLMs.


Gorilla is a LLaMA-7B fine-tuned model on the APIBench dataset. The authors use a chat-style training where each API call is used, similar to a real-world conversation between a user and a chatbot. The LLaMA model is then finetuned with and without a retriever.

API Call with Constraints

Consider the following prompt:

Invoke an image classification model that uses less than 10M parameters, but maintains an ImageNet accuracy of at least 70%

LLMs face issues when generating responses for such prompts, as the user adds constraints on an API call and its associated parameters.

Retreiver-Aware Training

In certain scenarios, the model is constrained by a specific API call or documentation that can degrade responses provided by the LLMs. Consider the following prompt:

“Use this API documentation for reference: <retrieved_API_doc_JSON>”

During finetuning, Gorilla aims to parse the second half to fetch the required API specification to answer the first part of the question.


Gorilla can be used in two different modes, similar to the two methods mentioned above.

A user provides a prompt in natural language, and it is parsed by Gorilla in two ways. Zero-shot passes the same prompt with no additions to finetuned Gorilla LLM, which returns an API response. In Retrieval-mode, Gorilla uses GPT-Index or BM25 to first retrieve the most appropriate API documentation, which is added to the user prompt. The message Use this API documentation for reference: is concatenated with the user prompt before sending it to Gorilla LLM.

The methodology is summarized by the following image:

Image from Paper

The authors used the large API data corpus to generate 16450 prompts, and response pairs generated using self-instruct. These were used to finetune the Gorilla LLM. At inference time, the image highlights the two inference methods, Zero-shot, and Information retrieval methods. When using zero-shot, the prompt is passed as is to the Gorilla LLM without any preprocessing. When using the information retrieval method, the natural language prompt is sent to a retriever that compares the user prompt with an existing API database. Relevant API documentation is returned and concatenated with the user prompt before sending it to Gorilla LLM.


The image from the paper highlights the improvements of Gorilla over other LLM models.

Image from Paper

The paper summarizes the results of evaluating Gorilla on the collected dataset with different retrieval methods. The model performs better than other LLMs when queried with API-specific prompts. The graphical illustration highlights such results:

Image from Paper

Benefits of Gorilla LLM:

Reduced Hallucinations

Hallucinations refer to code generated with the wrong API or an API that does not exist. This can lead to runtime errors and non-functional code. Gorilla is specifically finetuned on a large corpus of API calls reducing the chances of hallucination errors. An example is provided in the results section of this article.

Improved Code Generation

With more structured responses that use the most up-to-date API documentation fetched at runtime, the responses are better suited for code generation. This can save a lot of cost and time in debugging generated code, as the responses are dependable.


The code is open-source and can be accessed on GitHub. For inference, the Gorilla can be accessed using the provided Colab notebook or accessed with a Command Line Interface through code.

First, create a new environment and install all dependencies.

conda create -n gorilla python=3.10
conda activate gorilla
pip install -r requirements.txt

Finetuned delta weights for Gorilla are available on HuggingFace. Original LLaMA weights are available here. After downloading weights, apply the weights to the original LLaMA model using the following code:

- base-model-path path/to/hf_llama/
- target-model-path path/to/gorilla-7b-hf-v0
- delta-path path/to/models - gorilla-llm - gorilla-7b-hf-delta-v0

The model can then be run with the given command:

python3 serve/ - model-path path/to/gorilla-7b-{hf,th,tf}-v0


The article introduced Gorilla LLM, the LLaMA model specifically finetuned for API calls. It provides better responses than GPT-4 on all 3 datasets collected and is better suited to adapt to run-time API usage changes.

The correct usage of Gorilla can improve the performance of LLMs using a wide variety of tools and plugins.

Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor.

Published via Towards AI

Feedback ↓