
Are You Using the OpenAI API Correctly?

Author(s): Tom or

Originally published on Towards AI.

Background

The OpenAI API has become the de facto standard for communicating with Large Language Models (LLMs). Many open-source projects (Ollama, llama.cpp) and even enterprise SDKs (Google Vertex AI, Anthropic) provide OpenAI-compatible APIs. This unification lets you switch between LLM providers by simply replacing a line.

Source: Image by the author. Using the chat completion endpoint in Python

Most language frameworks (LangChain, LangGraph) build on this fact to provide a "provider-agnostic switch". They offer simple wrappers over the OpenAI client for cleaner usage.
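To make the "replace a line" point concrete, here is a minimal sketch of why provider switching is so easy: the request shape is identical, and only the base URL (and model name) change. The helper function and model names below are hypothetical, not part of any SDK; the URL path and payload follow the OpenAI chat completion spec.

```python
import json

def build_chat_request(base_url: str, model: str, messages: list[dict]) -> tuple[str, str]:
    """Return the (url, json_body) for an OpenAI-compatible chat call."""
    url = base_url.rstrip("/") + "/chat/completions"
    body = json.dumps({"model": model, "messages": messages})
    return url, body

messages = [{"role": "user", "content": "Hello!"}]

# Swapping providers is just a different base URL and model name:
openai_req = build_chat_request("https://api.openai.com/v1", "gpt-4o-mini", messages)
ollama_req = build_chat_request("http://localhost:11434/v1", "llama3", messages)
```

In practice you would pass the same `base_url` switch to the official OpenAI client; the point is that the wire format stays identical across providers.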

API Spec

The API exposes two main communication channels:

  • Chat completion
  • Completion

Chat completion takes in a structured list of objects, each specifying a role and content. The "regular" completion, on the other hand, takes in a string and returns one. So what is the difference?

Source: Image by the author. OpenAI-compatible client with LangChain. Usage of chat completion and completion.
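The two request shapes can be sketched side by side. These are plain request bodies following the OpenAI API spec; the model name is a hypothetical placeholder.

```python
# Chat completion: a structured list of role/content messages.
chat_body = {
    "model": "llama3",  # hypothetical model name
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who are you?"},
    ],
}

# Completion: a single raw string in, a string out.
completion_body = {
    "model": "llama3",
    "prompt": "Who are you?",
}
```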

Quick refresher on LLM inner workings

To better understand the differences between the endpoints, we need to understand how LLMs understand and generate text.

Given a string S, the LLM splits S into units called "tokens". This process, called tokenization, is performed by a tokenizer. The LLM processes the token sequence from the tokenizer and generates the next probable token. The newly generated token is appended to the previous ones, and the process repeats until a special token indicating "end of generation" is produced.
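The loop described above can be sketched in a few lines. This is a toy illustration only: the "model" is a stub lookup table, whereas a real LLM returns a probability distribution over its vocabulary at each step.

```python
EOS = "<eos>"  # special "end of generation" token

def next_token(tokens: tuple[str, ...]) -> str:
    # Hypothetical next-token predictor standing in for a real LLM.
    table = {
        ("Who", "are", "you", "?"): "I",
        ("Who", "are", "you", "?", "I"): "am",
        ("Who", "are", "you", "?", "I", "am"): "Llama",
        ("Who", "are", "you", "?", "I", "am", "Llama"): EOS,
    }
    return table.get(tokens, EOS)

def generate(prompt_tokens: list[str]) -> list[str]:
    tokens = list(prompt_tokens)
    while True:
        tok = next_token(tuple(tokens))
        if tok == EOS:          # stop on the end-of-generation token
            break
        tokens.append(tok)      # append and repeat
    return tokens
```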

If the LLM is just a next-probable-token machine, how does it know to chat with us? How does it know what the system prompt is? How can we instruct it?

LLMs are trained in multiple stages. The first stage is usually referred to as "pre-training": training the model on large, diverse datasets (code repositories, Wikipedia, private datasets, …). The resulting model is referred to as the "base model".

The second step, referred to as "post-training", covers the various fine-tuning processes applied to the base model. Popular examples are "chat" fine-tuning and "instruct" fine-tuning. This step often introduces new special tokens.

Example: LLaMA fine-tuning

Until about two years ago (ages in AI time…), Meta released state-of-the-art open-source models. We will focus on two (old) models: LLaMA-2-70B and LLaMA-2-70B-chat.

The non-chat version is a pure token-completion machine. If we ask, "Who are you?", the completion is… well… weird:

Source: Image by the author. Llama2–70b completion from together.ai playground

On the other hand, the chat version “knows” to differentiate between requests (or instructions) and “understands” that you ask a question.

How does the transformation work (base model -> chat model)?

The base model is further trained on a curated, relatively small dataset of "chat" or "instruct" data. The architecture of the model stays the same; the model still produces token by token. However, the input is not the raw string you send to it. The string, or the messages you send, are formatted into a special template defined by the fine-tuning process and the tokenizer.

For example, the query "Who are you?" for LLaMA-2-70B-chat will be transformed to something like this:

Source: Image by the author.
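The transformation can be sketched as a simple formatting function. This is a simplified version of Meta's published Llama-2 chat format (the `[INST]` / `<<SYS>>` markers); real inference stacks also add special BOS/EOS tokens via the tokenizer.

```python
def format_llama2(system: str, user: str) -> str:
    """Simplified Llama-2 chat template (BOS/EOS tokens omitted)."""
    return f"[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{user} [/INST]"

prompt = format_llama2("You are a helpful assistant.", "Who are you?")
```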

[INST], <<SYS>>, and additional tokens were added, and the model sees that string as its input. But who does the formatting? The chat model will not "crash" if we don't use the template.

Chat completion vs Completion

When using chat completion, the hosting server does (or should do) the formatting for you automatically. (Pro tip: make sure the server does it properly.)

Source: Image by the author. Usage of chat client and the resulting input to the model

When using the completion endpoint, the model receives the input as-is.

Usage of regular completion and the resulting input to the model

So in the second example, we get subpar or even gibberish results because the chat template is not followed.
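The contrast boils down to what string actually reaches the model. A minimal sketch, using a simplified template for illustration:

```python
def apply_template(user: str) -> str:
    # Simplified stand-in for the server-side chat template.
    return f"[INST] {user} [/INST]"

user_msg = "Who are you?"

# Chat completion endpoint: the server wraps your message in the template.
model_input_chat = apply_template(user_msg)

# Completion endpoint: the raw string goes straight to the model.
model_input_completion = user_msg
```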

The caveat

The completion API predates chat completion, since chat models are an incremental step over base models. As a result, many applications, tutorials (and data that LLMs were trained on) are based on the completion API. Nowadays, released models are usually chat- or instruct-tuned, so they have a chat template to interact with. Many (popular) libraries use the completion endpoint by default, so watch out.

So if you are using a chat/instruct model, make sure you either use the chat completion endpoint, or do the chat formatting on your end and then use the completion endpoint.
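The second option, doing the formatting yourself, can be sketched as below. The template function is a simplified Llama-2 example for illustration; with Hugging Face tokenizers you would call `tokenizer.apply_chat_template(...)` instead, and then send the resulting string to the completion endpoint.

```python
def to_llama2_prompt(messages: list[dict]) -> str:
    """Simplified manual chat formatting (Llama-2 style, single turn)."""
    system = next((m["content"] for m in messages if m["role"] == "system"), "")
    user = next(m["content"] for m in messages if m["role"] == "user")
    return f"[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{user} [/INST]"

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Who are you?"},
]
prompt = to_llama2_prompt(messages)
# completion = client.completions.create(model=..., prompt=prompt)  # now safe
```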

Why “Completion” lingers

The completion endpoint could mess up your GenAI application without you even knowing it. So why is it still here?

The main reason is backward compatibility: many applications rely on it and would break without it.

However, the completion endpoint provides flexibility that the chat endpoint does not. Consider this example:

You are developing a chatbot for a French company, and it should only answer in French. The go-to solution is to state this in the system prompt. That will probably work in most cases; however, when a user asks something in English, the system may break and answer in English (especially with weaker models).

Source: Image by the author.

To enforce a more robust solution, one can leverage the nature of the LLM as a probabilistic token machine. If the first token of the generated answer is a French token, it is very likely that the rest of the answer will also be in French. We can "trick" the model by letting it "think" that it has already generated a token in French. This can only be done with the completion endpoint:

Source: Image by the author. Appended "La réponse est :" ("The answer is:" in French) to the prompt.
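A minimal sketch of this prefix trick, assuming a simplified Llama-2 style template: format the chat turn yourself, then append the start of the assistant's answer so the model continues in French. Only the completion endpoint accepts such a pre-seeded prompt.

```python
def french_seeded_prompt(user: str) -> str:
    """Build a completion prompt pre-seeded with a French answer prefix."""
    system = "Tu es un assistant. Réponds uniquement en français."
    chat = f"[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{user} [/INST]"
    return chat + " La réponse est :"   # the model continues from here

prompt = french_seeded_prompt("What is the capital of France?")
# client.completions.create(model=..., prompt=prompt)  # continues in French
```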

Takeaways

  • The chat endpoint is a high-level convenience wrapper; underneath it’s still tokens all the way down.
  • Special tokens ([INST], <<SYS>>, etc.) govern role boundaries, stops, and tooling hooks.
  • Older code using the completion endpoint can silently lose those tokens unless you add them yourself.

Happy prompting!


Published via Towards AI

