A Guide to Integrating an LLM Agent into POS Systems
Last Updated on April 22, 2024 by Editorial Team
Author(s): Daniel Khoa Le
Originally published on Towards AI.
🫖 TLDR
This article presents an LLM-powered solution for improving the efficiency of ordering and payment processes in food establishments. It involves integrating an LLM agent (built with LangChain) that can interact with API endpoints (e.g., the POS system). The article walks through the code to build such an agent with practical examples. The code is published here. You can also interact with the agent here.
Demo Video: Notice the signposting cues "agent … please" (read more below) in this conversation, and how the agent picked only the relevant information to make the respective API requests.
🥖 The Story
🧀 The Inspiration
I am a fan of the tasty Käsedings at Zeit für Brot (Berlin, Germany), and I've spent quite a bit of time hanging out at their busy Eberswalder spot. It's where my interest in tasty breads collides with the reality of waiting in line.
I noticed and hypothesized that, to maintain hygiene, cashiers don't directly handle the food.
One cashier and one other staff member work as a pair to serve customers: the cashier only manages orders and payments, leaving food handling to their peer.
➡️ This observation sparked an idea: What if we could make the ordering and payment process more efficient, enhancing customer satisfaction, streamlining staff workflow, and ultimately boosting Zeit für Brot's revenue (sorry for the buzzwords)?
💡 The Solution: Integrating an LLM Agent
This solution proposes integrating a custom LLM agent into the Point of Sale (POS) system. The agent will translate the staff's spoken orders into API requests, registering items in the POS system without manual entry.
💡 Workflow Simplified:
👩‍💼 Staff-Customer Interaction: A staff member receives an order from a customer and communicates with the LLM agent through a headset. The staff member can keep talking with the customer without needing to deactivate the headset, allowing for a seamless interaction.
The LLM agent listens to the entire conversation, filtering out irrelevant chatter and focusing on the important details.
This is facilitated by the use of specific "agent" and "please" cues (signposting), which enable the LLM agent to distinguish between casual conversation and a direct request for assistance.
Example:
– Customer A: Do you want to get an espresso and two cinnamon rolls?
– Customer B: I don't know.
– Customer A: OK, I'll decide then. I'll get a baguette, a cappuccino, and two cheesecakes.
– Staff: Agent, add a baguette, a cappuccino, and two cheesecakes, please.
– Customer A: Sorry, I changed my mind, I won't get the baguette.
– Staff: Agent, remove the baguette, please.
– Staff: Agent, proceed to payment by card, please.
Under the hood:
- The agent processes this speech, generating API request(s) to register items to the order.
- If there's a change in the order, the staff can say: "Agent, remove the baguette, please." The agent will adjust the order accordingly.
- Upon finalizing the order, "Agent, proceed to payment with QR code / card, please" prompts the agent to initiate the payment process, creating a smooth end-to-end transaction.
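To make the signposting concrete: the cue logic itself, independent of any LLM, is just span extraction between the two signpost words. Below is a minimal stdlib sketch; in the actual solution, the system prompt (not a regex) instructs the LLM to apply this rule:

```python
import re

def extract_commands(conversation: str) -> list[str]:
    """Return the spans between the cue words 'agent' and 'please'.

    Illustrative only: the article's agent enforces this rule via its
    system prompt, but the cue mechanism itself is span extraction.
    """
    # Case-insensitive, non-greedy match between the two signposts
    pattern = re.compile(r"agent[,:]?\s*(.+?),?\s*please", re.IGNORECASE | re.DOTALL)
    return [m.strip() for m in pattern.findall(conversation)]

conversation = (
    "Customer: I'll get a baguette, a cappuccino, and two cheesecakes. "
    "Staff: Agent, add a baguette, a cappuccino, and two cheesecakes, please. "
    "Staff: Agent, remove the baguette, please."
)
print(extract_commands(conversation))
# → ['add a baguette, a cappuccino, and two cheesecakes', 'remove the baguette']
```

Everything outside the cue spans (the customers' back-and-forth) is dropped, which is exactly the behavior the demo video shows.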
🔌 The Code
In this section, I will focus on how to build a custom LLM agent with LangChain. I assume you are somewhat familiar with LangChain's terms and usage; if not, you can check out their documentation here.
I suggest you also refer to my repository while reading, as I will not display the entire scripts in this article. Instead, I will highlight the key takeaways for you.
What do you need to create a custom LLM agent that can interact with external systems?
- The tools (that interact with external systems)
- The prompt (that guides the agent)
- The agent (and its executor)
Now, let's go through them one by one.
🔧 The tools
I created three simple tools to:
- add item(s) to an order
- remove item(s) from an order
- proceed to payment with a preferred method
These tools are simple functions created with the LangChain @tool decorator.
You might wonder how the agent knows which tool to use. The answer is that it bases its decisions on the names and descriptions of the tools.
In the following code:
- The function's name, add_orders, is used as the tool's name, and the function's docstring is treated as the tool's description. The description plays an important role because it helps the LLM reason about which tools are relevant in a given context.
- The add_orders function takes an argument order_data, a list of dictionaries (see the code below). The LLM will make sure that order_data is in the format that we want. Notice that I wrote the docstring as if I were giving instructions to the LLM.
- The function makes a POST request to an API endpoint. Ideally, this is the system we want the agent to control; in my case, it is the POS system that the staff uses at the store.
import requests
from langchain.agents import tool

@tool
def add_orders(order_data: list) -> dict:
    """Add items to an order by making a POST request with the order_data argument strictly following this format:
    order_data = [
        {'quantity': 2, 'item': 'apple'},
        {'quantity': 1, 'item': 'banana'},
        {'quantity': 3, 'item': 'cherry'}
    ]
    """
    # Make a POST request to the FastAPI endpoint
    url = "http://localhost:8889/add"
    # Adhere to the expected format of the FastAPI endpoint
    request_json = {"list_items": order_data}
    response = requests.post(
        url,
        headers={"Content-Type": "application/json"},
        json=request_json,
    )
    return response.json()
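The article does not show the server side of this call (the repository uses FastAPI for it), but the logic behind an /add or /remove endpoint can be sketched with a toy in-memory store. Everything below, including the OrderStore name, is an illustrative assumption rather than the repo's actual implementation:

```python
class OrderStore:
    """Toy in-memory stand-in for the POS backend behind the /add and
    /remove endpoints. Illustrative only; the real backend is whatever
    POS API you integrate with."""

    def __init__(self):
        self.items = {}  # item name -> quantity

    def add(self, list_items):
        # list_items follows the same format as order_data in add_orders
        for entry in list_items:
            self.items[entry["item"]] = self.items.get(entry["item"], 0) + entry["quantity"]
        return dict(self.items)

    def remove(self, list_items):
        for entry in list_items:
            remaining = self.items.get(entry["item"], 0) - entry["quantity"]
            if remaining > 0:
                self.items[entry["item"]] = remaining
            else:
                # Removing the last (or more than the) remaining quantity
                # drops the item entirely
                self.items.pop(entry["item"], None)
        return dict(self.items)

store = OrderStore()
store.add([{"quantity": 2, "item": "cinnamon roll"}, {"quantity": 1, "item": "baguette"}])
store.remove([{"quantity": 1, "item": "cinnamon roll"}])
print(store.items)  # → {'cinnamon roll': 1, 'baguette': 1}
```

In the real setup, a FastAPI route would simply parse the request body and delegate to logic like this.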
As mentioned previously, besides this add_orders, I have remove_orders and pay_orders functions as well. Together they form a list of tools at the disposal of the agent. Here is how you bind them to the LLM:
from langchain_openai import ChatOpenAI

# Create an LLM model
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)
# Create a list of tools for the LLM to pick from
tools = [add_orders, remove_orders, pay_orders]
# Bind the tools to the LLM; the result is an LLM with tools
llm_with_tools = llm.bind_tools(tools)
💬 The prompt
I created a simple prompt with ChatPromptTemplate.
- It starts with a system role describing the tasks the agent needs to perform. This governs the behavior of the agent; I think most of "the art" belongs to this part.
- Then the user role comes in, with a placeholder {input} for us to pass the user's "prompt" in. For example: "Agent please add a baguette to my order please."
- Finally, the MessagesPlaceholder is required so that the agent can store intermediate information about how it reasons and what steps it has taken. You do not need to worry about this one.
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

prompt = ChatPromptTemplate.from_messages(
    [
        ("system",
         "You are a helpful assistant. You extract the user's food orders from a conversation "
         "and add or remove them in the POS system by making POST requests with the extracted order data. "
         "The user can also request to pay with a preferred payment method. "
         "The real order information starts from the signpost word 'agent' and ends with the signpost word 'please'. "
         "The rest of the conversation is not important; you must strictly ignore it. "
         "The extracted order should contain the items and their quantities, "
         "and it is in the format of a Python list of dictionaries."),
        ("user", "{input}"),
        MessagesPlaceholder(variable_name="agent_scratchpad"),
    ]
)
🤖 The agent
We form the agent by chaining all the langchain components together, including the aforementioned prompt and the LLM model with its toolset.
You might wonder why we need to construct an input dictionary first. It is because the agent will be called many times by its agent_executor after our invocation, and the agent_executor needs to pass inputs in the correct format to the agent. You will see my point when you remove this step: langchain will tell you that it does not expect intermediate_steps in the input (see above, where we constructed the prompt with agent_scratchpad, not intermediate_steps).
from langchain.agents import AgentExecutor
from langchain.agents.format_scratchpad.openai_tools import format_to_openai_tool_messages
from langchain.agents.output_parsers.openai_tools import OpenAIToolsAgentOutputParser

agent = (
    {
        "input": lambda x: x["input"],
        "agent_scratchpad": lambda x: format_to_openai_tool_messages(x["intermediate_steps"]),
    }
    | prompt
    | llm_with_tools
    | OpenAIToolsAgentOutputParser()
)

# We need an agent executor; we cannot just call the agent directly
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
agent_output = agent_executor.invoke({"input": conversation})
Let's discuss the flow of the AgentExecutor a little. The AgentExecutor carries out the following steps in sequence:
- Format the input
- Pass the output of the last step to the prompt
- Pass the output of the last step to the LLM with its tools (a tool might be called here)
- Parse the output of the last step and use it for further reasoning (e.g., the AgentExecutor may call another tool here)
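The steps above can be sketched as a plain-Python loop, with a stub standing in for the LLM's decisions. All names here are illustrative; LangChain's real AgentExecutor is considerably more involved:

```python
def run_agent_loop(agent_step, tools, user_input, max_steps=5):
    """Minimal sketch of an AgentExecutor-style loop (illustrative only).

    agent_step(user_input, intermediate_steps) returns either
    ("final", answer) or ("tool", tool_name, tool_args).
    """
    intermediate_steps = []
    for _ in range(max_steps):
        decision = agent_step(user_input, intermediate_steps)
        if decision[0] == "final":
            return decision[1]
        _, tool_name, tool_args = decision
        result = tools[tool_name](tool_args)  # the tool call happens here
        # Feed the tool result back so the next step can reason over it
        intermediate_steps.append((tool_name, result))
    raise RuntimeError("agent did not finish within max_steps")

# Stub "LLM": first add the order, then pay, then finish.
def fake_llm(user_input, steps):
    taken = [name for name, _ in steps]
    if "add_orders" not in taken:
        return ("tool", "add_orders", [{"quantity": 1, "item": "baguette"}])
    if "pay_orders" not in taken:
        return ("tool", "pay_orders", {"method": "card"})
    return ("final", "Order placed and paid by card.")

tools = {
    "add_orders": lambda args: f"added {args}",
    "pay_orders": lambda args: f"paid with {args['method']}",
}
print(run_agent_loop(fake_llm, tools, "Agent please add a baguette please. Pay by card."))
# → Order placed and paid by card.
```

This mirrors why the agent_scratchpad placeholder exists: each tool result is appended to the intermediate steps and re-presented to the model on the next iteration.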
Let's try to invoke the AgentExecutor we just built. You can see (in the LangSmith log below) that after calling add_orders, the AgentExecutor can then call pay_orders as well.
agent_executor.invoke(
    {"input": "Agent please add a baguette please. "
              "Agent please proceed to payment by card"})
✍️ Conclusion
Congratulations on reaching the end of this tutorial!
So far, we have created a custom LLM agent with LangChain from three components: the tools, the prompt, and the agent (with its executor).
This LLM agent can reason and interact with external systems (APIs).
Now you can tailor these interactions to your own use cases by changing the tools and the prompt.
As I tried to keep the article succinct, I suggest you clone my repository and execute the complete script on your own.
➡️ You can find the repository here.
➡️ The live demo is available here.
If you have any questions, feel free to reach out to me on LinkedIn.
About me
I am Daniel Le, a Data Engineer based in Berlin, Germany, with a great passion for Machine Learning. I am interested in new technologies and how they can be implemented to solve real-world problems.
Should you have any inquiries or wish to discuss these interests further, please do not hesitate to connect with me on LinkedIn.
References
- https://python.langchain.com/docs/modules/agents/how_to/custom_agent
- https://python.langchain.com/docs/modules/agents/tools/custom_tools
- https://python.langchain.com/docs/modules/agents/tools/tools_as_openai_functions