Threads in OpenAI Assistants API — In-Depth Hands on
Last Updated on January 6, 2025 by Editorial Team
Author(s): Talib
Originally published on Towards AI.
In this blog, we will explore what chat completion models can and cannot do, and then see how Assistance API addresses those limitations. We will also focus on threads and messages — how to create them, list them, retrieve them, modify them, and delete them. Additionally, we will add some Python code snippets and show possible outputs based on the script language.
Limitations of Chat Completion Models
1.1 No Memory
Chat completion models do not have a memory concept. For example, if you ask: “What’s the capital of Japan?”
The model might say: “The capital of Japan is Tokyo.”
But when you ask again: “Tell me something about that city.”
It often responds with: “I’m sorry but you didn’t specify which city you are referring to.”
It does not understand what was discussed previously. That’s a main issue: there is no memory concept in chat completions.
1.2 Poor at Computational Tasks
Chat completion models are really bad at direct computational tasks. For instance, if you want to reverse the string “openaichatgpt”, it may generate the wrong output, like inserting extra characters or missing some letters.
1.3 No Direct File Handling
In chat completions, there is no way to process text files or Word documents directly. You have to convert those files to text, do chunking (divide documents into smaller chunks), create embeddings, and do vector searches yourself. Only then do you pass some relevant text chunks to the model as context.
1.4 Synchronous Only
Chat completion models are not asynchronous. You must ask a question and wait for it to finish. You cannot do something else while it’s processing without extra workarounds.
2. Capabilities of the Assistance API
2.1 Context Support with Threads
In Assistance API, you can create a thread for each user. A thread is like a chat container where you can add many messages. It persists the conversation, so when the user logs in again, you can pass the same thread ID to retrieve what was discussed previously. This is very helpful.
2.2 Code Interpreter
There is also a code interpreter. Whenever you ask some computational task, it runs Python code. It then uses that answer to expand or explain. This makes it very helpful for reversing strings, finding dates, or any Python-based operations.
2.3 Retrieval with Files
The Assistance API has retrieval support, letting you upload files and ask questions based on those files. The system handles the vector search process, then uses relevant chunks as context. You can upload up to 20 files in Assistance as context. This is very helpful for referencing company documents, reports, or data sets.
2.4 Function Calling
Function calling allows the model to tell you what function to call and what arguments to pass, so that you can get external data (like weather or sales from your own database). It does not call the function automatically; it indicates which function to call and with what parameters, then you handle that externally.
2.5 Asynchronous Workflows
Assistance API is asynchronous. You can run a request, and you don’t have to wait for it immediately. You can keep checking if it’s done after a few seconds. This is very helpful if you have multiple tasks or want to do other things in parallel.
3. Focusing on Threads and Messages
A thread is essentially a container that holds all messages in a conversation. OpenAI recommends creating one thread per user as soon as they start using your product. This thread can store any number of messages, so you do not have to manually manage the context window.
- Unlimited Messages: You can add as many user queries and assistant responses as you want.
- Automatic Context Handling: The system uses truncation if the conversation grows beyond token limits.
- Metadata Storage: You can store additional data in the thread’s metadata (for example, user feedback or premium status).
Below are code snippets to demonstrate how to create, retrieve, modify, and delete threads.
3.1 Creating an Assistant
First, you can create an assistant with instructions and tools. For example
from openai import OpenAI
client = OpenAI()
file_input = client.files.create(file=open("Location/to/the/path", "rb"), purpose = "assistants")
file_input.model_dump()
assistant = client.beta.assistants.create(
name="data_science_tutor",
instructions="This assistant is a data science tutor.",
tools=[{"type":"code_interpreter", {"type":"retrieval"}}],
model="gpt-4-1106-preview",
file_ids=[file_input.id]
)
print(assistant.model_dump())
3.2 Creating Threads
A thread is like a container that holds the conversation. We can create one thread per user.
thread = client.beta.threads.create()
print(thread.model_dump())
- id: A unique identifier that starts with
thr-
. - object: Always
"thread"
. - metadata: An empty dictionary by default.
Why Create Separate Threads? OpenAI recommends creating one thread per user as soon as they start using your product. This structure ensures that the conversation context remains isolated for each user.
3.3 Retrieve a Thread
retrieved_thread = client.beta.threads.retrieve(thread_id=thread.id)
print(retrieved_thread.model_dump())
This returns a JSON object similar to what you get when you create a thread, including the id, object, and metadata fields.
Modify a Thread
You can update the thread’s metadata to store important flags or notes relevant to your application. For instance, you might track if the user is premium or if the conversation has been reviewed by a manager.
updated_thread = client.beta.threads.update(
thread_id=thread.id,
metadata={"modified_today": True, "user_is_premium": True}
)
print(updated_thread.model_dump())
- modified_today: A custom Boolean to note whether you changed the thread today.
- user_is_premium: A Boolean flag for user account tier.
- conversation_topic: A string that labels this thread’s main subject.
Further Metadata Examples
{"language_preference": "English"}
– If the user prefers answers in English or another language.{"escalated": true}
– If the thread needs special attention from a support team.{"feedback_rating": 4.5}
– If you collect a rating for the conversation.
Delete a Thread
When you no longer need a thread, or if a user deletes their account, you can remove the entire conversation container:
delete_response = client.beta.threads.delete(thread_id=thread.id)
print(delete_response.model_dump())
Once deleted, you can no longer retrieve this thread or any messages it contained.
4. Working with Messages
Previously, we focused on threads — the containers that hold conversations in the Assistance API. Now, let’s explore messages, which are the individual pieces of content (questions, responses, or system notes) you add to a thread. We’ll walk through creating messages, attaching files, listing and retrieving messages, and updating message metadata. We’ll also show Python code snippets illustrating these steps.
Messages and Their Role in Threads
- What Are Messages? Messages are mostly text (like user queries or assistant answers), but they can also include file references. Each thread can have many messages, and every message is stored with an ID, a role (for example,
"user"
or"assistant"
), optional file attachments, and other metadata. - Opposite Index Order: Unlike chat completions where the first message in the list is the earliest, here the first message you see in the array is actually the most recent. So, index 0 corresponds to the newest message in the thread.
- Annotations and File Attachments: Messages can include annotations — for instance, if a retrieval step references certain files. When using a code interpreter, any new files generated may also appear as part of the message annotations.
Create a Message in a Thread
Messages are added to a thread. Each message can be a user message or an assistant message. Messages can also contain file references.
Before adding messages, we need a thread. If you do not already have one:
# Create a new thread
new_thread = client.beta.threads.create()
print(thread.model_dump()) # Shows the thread's detailspython
# Create a new message in the thread
message = client.beta.threads.messages.create(
thread_id=thread.id,
role="user",
content="ELI5: What is a neural network?",
file_ids=[file_input.id] # Passing one or more file IDs
)
print(message.model_dump())
Here, you can see:
- Message ID: Unique identifier starting with
msg
. - Role:
user
, indicating this is a user input. - File Attachments: The
file_ids
list includes any referenced files. - Annotations: Empty at creation, but can include details like file citations if retrieval is involved.
- Metadata: A placeholder for storing additional key-value pairs.
List Messages in a Thread
To list messages in a thread, use the list
method. The limit parameter determines how many recent messages to retrieve.
messages_list = client.beta.threads.messages.list(
thread_id=thread.id,
limit=5
)
for msg in messages_list.data:
print(msg.id, msg.content)
Now let’s try to list all the messages:
You will see only the most recent messages. For instance, if we’ve added just one message, the output will look like:
If there are multiple messages, the system works like a linked list:
- The first ID points to the newest message.
- The last ID points to the earliest message.
Retrieve a Specific Message
retrieved_msg = client.beta.threads.messages.retrieve(
thread_id=new_thread.id,
message_id=message.id
)
print(retrieved_msg.model_dump())
Retrieve Message Files
Now let’s retrieve message file:
This provides the file’s metadata, including its creation timestamp.
files_in_msg = client.beta.threads.messages.files.list(
thread_id=new_thread.id,
message_id=message.id
)
print(files_in_msg.model_dump())
Modify a Message
updated_msg = client.beta.threads.messages.update(
thread_id=new_thread.id,
message_id=message.id,
metadata={"added_note": "Revised content"}
)
print(updated_msg.model_dump())
Delete a Message
deleted_msg = client.beta.threads.messages.delete(
thread_id=new_thread.id,
message_id=message.id
)
print(deleted_msg.model_dump())
We have seen that chat completion models have no memory concept, are bad at computational tasks, cannot process files directly, and are not asynchronous. Meanwhile, Assistance API has context support with threads, code interpreter for computational tasks, retrieval for files, function calling for external data, and it also supports asynchronous usage.
In this blog, we focused on how to create, list, retrieve, modify, and delete threads and messages. We also saw how to handle file references within messages. In the next session, we will learn more about runs, which connect threads and assistants to get actual outputs from the model.
I hope this is helpful.
Thank you for reading!
You might be interested in Reading!
- Where did multi-agent systems come from?
- Summarising Large Documents with GPT-4o
- How does LlamaIndex compare to LangChain in terms of ease of use for beginners
- Pre-training vs. Fine-tuning [With code implementation]
- Costs of Hosting Open Source LLMs vs Closed Sourced (OpenAI)
- Embeddings: The Back Bone of LLMs
- How to Use a Fine-Tuned Language Model for Summarization
Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor.
Published via Towards AI