
Threads in OpenAI Assistants API — In-Depth Hands-On

Last Updated on January 6, 2025 by Editorial Team

Author(s): Talib

Originally published on Towards AI.


In this blog, we will explore what chat completion models can and cannot do, and then see how the Assistants API addresses those limitations. We will focus in particular on threads and messages: how to create, list, retrieve, modify, and delete them. Along the way, we will include Python code snippets and describe the outputs you can expect.

1. Limitations of Chat Completion Models

1.1 No Memory

Chat completion models do not have a memory concept. For example, if you ask: “What’s the capital of Japan?”

The model might say: “The capital of Japan is Tokyo.”

But when you ask again: “Tell me something about that city.”

It often responds with: “I’m sorry, but you didn’t specify which city you are referring to.”

It does not remember what was discussed previously. That is the core issue: chat completions have no concept of memory.
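With plain chat completions, the only way to get this kind of continuity is to resend the whole conversation yourself on every request. A minimal sketch of that bookkeeping (the `add_turn` helper and variable names are ours, not part of the API):

```python
# With chat completions, "memory" exists only if you resend the entire
# history as the `messages` parameter on every call.
history = []

def add_turn(role, content):
    """Append one conversation turn to the history we manage ourselves."""
    history.append({"role": role, "content": content})
    return history

add_turn("user", "What's the capital of Japan?")
add_turn("assistant", "The capital of Japan is Tokyo.")
add_turn("user", "Tell me something about that city.")

# The full `history` list would be passed as `messages=history` on the
# next chat-completion call; omit the earlier turns and the model has
# no idea which city "that city" refers to.
```

As the conversation grows, you also become responsible for trimming it to fit the model's context window, which is exactly the burden threads remove.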

1.2 Poor at Computational Tasks

Chat completion models are bad at direct computational tasks. For instance, if you ask one to reverse the string “openaichatgpt”, it may generate the wrong output, inserting extra characters or dropping letters.
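For comparison, the reversal itself is deterministic one-line Python, which is why running actual code (as the code interpreter does) succeeds where token-by-token generation often fails:

```python
s = "openaichatgpt"
reversed_s = s[::-1]  # a slice with step -1 walks the string backwards
print(reversed_s)  # tpgtahcianepo
```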

1.3 No Direct File Handling

In chat completions, there is no way to process text files or Word documents directly. You have to convert those files to text, do chunking (divide documents into smaller chunks), create embeddings, and do vector searches yourself. Only then do you pass some relevant text chunks to the model as context.
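The chunking step alone can be sketched in a few lines. The function name and sizes below are illustrative, not from any library:

```python
def chunk_text(text, chunk_size=200, overlap=20):
    """Split text into overlapping character chunks (sizes are arbitrary)."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

doc = "some long document text " * 50
pieces = chunk_text(doc)
# Each chunk repeats the last `overlap` characters of the previous one,
# so a vector search can still match text that straddles a boundary.
```

You would then embed each chunk, store the vectors, and at query time retrieve the nearest chunks to pass as context. The Assistants API does all of this for you.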

1.4 Synchronous Only

Chat completion models are synchronous: you ask a question and must wait for the response to finish. You cannot do something else while it’s processing without extra workarounds.

2. Capabilities of the Assistants API

2.1 Context Support with Threads

In the Assistants API, you can create a thread for each user. A thread is like a chat container that can hold many messages. It persists the conversation, so when the user logs in again, you can pass the same thread ID to pick up what was discussed previously. This is very helpful.

2.2 Code Interpreter

There is also a code interpreter. Whenever you ask for a computational task, the assistant can run Python code and use the result to expand or explain its answer. This makes it reliable for reversing strings, date calculations, or any Python-based operation.

2.3 Retrieval with Files

The Assistants API has retrieval support, letting you upload files and ask questions based on them. The system handles chunking and vector search, then uses the relevant chunks as context. You can attach up to 20 files to an assistant. This is very helpful for referencing company documents, reports, or data sets.

2.4 Function Calling

Function calling allows the model to tell you what function to call and what arguments to pass, so that you can get external data (like weather or sales from your own database). It does not call the function automatically; it indicates which function to call and with what parameters, then you handle that externally.
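A rough sketch of that division of labor. The tool schema follows the JSON-schema function format; `get_weather` and the dispatch registry are hypothetical stand-ins for your own code:

```python
import json

# Hypothetical tool schema you would register with the assistant. The model
# never executes this; it only returns the function name plus JSON arguments.
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

def dispatch(name, arguments_json):
    """Run the function the model asked for (hypothetical local registry)."""
    registry = {"get_weather": lambda city: f"Sunny in {city}"}
    args = json.loads(arguments_json)
    return registry[name](**args)

# Pretend the model returned name="get_weather", arguments='{"city": "Tokyo"}'
result = dispatch("get_weather", '{"city": "Tokyo"}')
```

You would then send `result` back to the model so it can phrase the final answer; the model itself never touches your database or external services.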

2.5 Asynchronous Workflows

The Assistants API is asynchronous. You can start a run and don’t have to wait for it to finish; you can check back every few seconds to see whether it’s done. This is very helpful if you have multiple tasks or want to do other things in parallel.
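The check-until-done pattern looks roughly like this. `poll_until_done` and the stub are ours; with the real API, the status would come from retrieving the run object:

```python
import time

def poll_until_done(get_status, interval=0.01, timeout=2.0):
    """Poll a status callable until it reports a terminal state.

    Generic sketch: `get_status` stands in for fetching the run's status
    from the API on each iteration.
    """
    waited = 0.0
    while waited < timeout:
        status = get_status()
        if status in ("completed", "failed"):
            return status
        time.sleep(interval)
        waited += interval
    return "timed_out"

# Stub standing in for a run that finishes after a few checks
states = iter(["queued", "in_progress", "completed"])
final = poll_until_done(lambda: next(states))
```

Between polls, your application is free to serve other users or kick off additional runs.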

3. Focusing on Threads and Messages

A thread is essentially a container that holds all messages in a conversation. OpenAI recommends creating one thread per user as soon as they start using your product. This thread can store any number of messages, so you do not have to manually manage the context window.

  • Unlimited Messages: You can add as many user queries and assistant responses as you want.
  • Automatic Context Handling: The system uses truncation if the conversation grows beyond token limits.
  • Metadata Storage: You can store additional data in the thread’s metadata (for example, user feedback or premium status).

Below are code snippets to demonstrate how to create, retrieve, modify, and delete threads.

3.1 Creating an Assistant

First, create an assistant with instructions and tools. For example:

from openai import OpenAI

client = OpenAI()

# Upload a file for the assistant to use
file_input = client.files.create(
    file=open("Location/to/the/path", "rb"),
    purpose="assistants"
)
print(file_input.model_dump())

assistant = client.beta.assistants.create(
    name="data_science_tutor",
    instructions="This assistant is a data science tutor.",
    tools=[{"type": "code_interpreter"}, {"type": "retrieval"}],
    model="gpt-4-1106-preview",
    file_ids=[file_input.id]
)
print(assistant.model_dump())

3.2 Creating Threads

A thread is like a container that holds the conversation. We can create one thread per user.

thread = client.beta.threads.create()
print(thread.model_dump())
  • id: A unique identifier that starts with thread_.
  • object: Always "thread".
  • metadata: An empty dictionary by default.

Why Create Separate Threads? OpenAI recommends creating one thread per user as soon as they start using your product. This structure ensures that the conversation context remains isolated for each user.
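A sketch of that per-user bookkeeping, with an in-memory dict standing in for your user database and a stub in place of `client.beta.threads.create()`:

```python
# Hypothetical mapping from your application's user IDs to OpenAI thread
# IDs; in production this would live in your database.
user_threads = {}

def thread_for_user(user_id, create_thread):
    """Reuse the user's existing thread, creating one only on first use."""
    if user_id not in user_threads:
        user_threads[user_id] = create_thread()
    return user_threads[user_id]

# Stub standing in for client.beta.threads.create().id
tid = thread_for_user("user_42", lambda: "thread_abc123")
same = thread_for_user("user_42", lambda: "thread_should_not_be_created")
```

Because the second lookup returns the stored ID, the user's context stays in one thread, and no other user's messages can leak into it.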

3.3 Retrieve a Thread

retrieved_thread = client.beta.threads.retrieve(thread_id=thread.id)
print(retrieved_thread.model_dump())

This returns a JSON object similar to what you get when you create a thread, including the id, object, and metadata fields.

3.4 Modify a Thread

You can update the thread’s metadata to store important flags or notes relevant to your application. For instance, you might track if the user is premium or if the conversation has been reviewed by a manager.

updated_thread = client.beta.threads.update(
    thread_id=thread.id,
    metadata={"modified_today": True, "user_is_premium": True}
)
print(updated_thread.model_dump())
  • modified_today: A custom Boolean to note whether you changed the thread today.
  • user_is_premium: A Boolean flag for the user’s account tier.

Further Metadata Examples

  • {"language_preference": "English"} – If the user prefers answers in English or another language.
  • {"escalated": true} – If the thread needs special attention from a support team.
  • {"feedback_rating": 4.5} – If you collect a rating for the conversation.

3.5 Delete a Thread

When you no longer need a thread, or if a user deletes their account, you can remove the entire conversation container:

delete_response = client.beta.threads.delete(thread_id=thread.id)
print(delete_response.model_dump())

Once deleted, you can no longer retrieve this thread or any messages it contained.

4. Working with Messages

Previously, we focused on threads — the containers that hold conversations in the Assistants API. Now, let’s explore messages, which are the individual pieces of content (questions, responses, or system notes) you add to a thread. We’ll walk through creating messages, attaching files, listing and retrieving messages, and updating message metadata, with Python code snippets illustrating each step.

Messages and Their Role in Threads

  • What Are Messages? Messages are mostly text (like user queries or assistant answers), but they can also include file references. Each thread can have many messages, and every message is stored with an ID, a role (for example, "user" or "assistant"), optional file attachments, and other metadata.
  • Opposite Index Order: Unlike chat completions where the first message in the list is the earliest, here the first message you see in the array is actually the most recent. So, index 0 corresponds to the newest message in the thread.
  • Annotations and File Attachments: Messages can include annotations — for instance, if a retrieval step references certain files. When using a code interpreter, any new files generated may also appear as part of the message annotations.

Create a Message in a Thread

Messages are added to a thread. Each message can be a user message or an assistant message. Messages can also contain file references.

Before adding messages, we need a thread. If you do not already have one:

# Create a new thread
new_thread = client.beta.threads.create()
print(new_thread.model_dump())  # Shows the thread's details

# Create a new message in the thread
message = client.beta.threads.messages.create(
    thread_id=new_thread.id,
    role="user",
    content="ELI5: What is a neural network?",
    file_ids=[file_input.id]  # Passing one or more file IDs
)
print(message.model_dump())

Here, you can see:

  • Message ID: A unique identifier starting with msg_.
  • Role: user, indicating this is a user input.
  • File Attachments: The file_ids list includes any referenced files.
  • Annotations: Empty at creation, but can include details like file citations if retrieval is involved.
  • Metadata: A placeholder for storing additional key-value pairs.

List Messages in a Thread

To list messages in a thread, use the list method. The limit parameter determines how many recent messages to retrieve.

messages_list = client.beta.threads.messages.list(
    thread_id=new_thread.id,
    limit=5
)
for msg in messages_list.data:
    print(msg.id, msg.content)

Listing returns only the most recent messages, newest first. For instance, if we have added just one message, only that message comes back.

If there are multiple messages, the system works like a linked list:

  • The first ID points to the newest message.
  • The last ID points to the earliest message.
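Walking that list page by page means using the last ID of each page as the cursor for the next call. A generic sketch with stub data, where `fake_fetch` stands in for the real list call and its `after` parameter:

```python
def list_all(fetch_page, page_size=5):
    """Collect every item from a cursor-paginated, newest-first listing."""
    items, cursor = [], None
    while True:
        page = fetch_page(limit=page_size, after=cursor)
        items.extend(page)
        if len(page) < page_size:  # short page means we reached the end
            return items
        cursor = page[-1]["id"]    # last ID of this page starts the next

# Stub data standing in for twelve messages, newest (msg_12) first
data = [{"id": f"msg_{i}"} for i in range(12, 0, -1)]

def fake_fetch(limit, after):
    start = 0 if after is None else [d["id"] for d in data].index(after) + 1
    return data[start:start + limit]

all_msgs = list_all(fake_fetch)
```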

Retrieve a Specific Message

retrieved_msg = client.beta.threads.messages.retrieve(
    thread_id=new_thread.id,
    message_id=message.id
)
print(retrieved_msg.model_dump())

Retrieve Message Files

Now let’s retrieve the files attached to a message:

files_in_msg = client.beta.threads.messages.files.list(
    thread_id=new_thread.id,
    message_id=message.id
)
print(files_in_msg.model_dump())

This returns each file’s metadata, including its creation timestamp.

Modify a Message

updated_msg = client.beta.threads.messages.update(
    thread_id=new_thread.id,
    message_id=message.id,
    metadata={"added_note": "Revised content"}
)
print(updated_msg.model_dump())

Delete a Message

deleted_msg = client.beta.threads.messages.delete(
    thread_id=new_thread.id,
    message_id=message.id
)
print(deleted_msg.model_dump())

We have seen that chat completion models have no memory, are bad at computational tasks, cannot process files directly, and are not asynchronous. The Assistants API, by contrast, offers context support through threads, a code interpreter for computational tasks, retrieval for files, function calling for external data, and asynchronous execution.

In this blog, we focused on how to create, list, retrieve, modify, and delete threads and messages, and on handling file references within messages. In the next post, we will look at runs, which connect threads and assistants to get actual outputs from the model.

I hope this is helpful.

Thank you for reading!

Let’s connect on LinkedIn!


Published via Towards AI
