
A Guide to Integrating an LLM Agent into POS Systems

Last Updated on April 22, 2024 by Editorial Team

Author(s): Daniel Khoa Le

Originally published on Towards AI.

Custom LLM agent interacting with external systems (Image by Author)


🫖 TLDR

This article presents an LLM-powered solution to improve the efficiency of ordering and payment processes in food establishments. It involves the integration of an LLM agent (built with LangChain) that is capable of interacting with API endpoints (e.g., the POS system). The article walks through the code to build such an agent with practical examples. The code is published here. You can also interact with the agent here.

Demo Video: Notice the “agent … please” signposting cues (read more below) in this conversation and how the agent picked only the relevant information to make the respective API requests.

🥖 The Story

🧀 The Inspiration

I am a fan of the tasty Käsedings at Zeit für Brot (Berlin, Germany) and I’ve spent quite a bit of time hanging out at their busy Eberswalder spot. It’s where my interest in good bread collides with the reality of waiting in line.

I noticed (and hypothesized) that, to maintain hygiene, cashiers don’t directly handle the food.

Staff work in pairs to serve customers: the cashier only manages orders and payments, leaving food handling to their colleague.

➡️ This observation sparked an idea: What if we could make the ordering and payment process more efficient, enhancing customer satisfaction, streamlining staff workflow, and ultimately boosting Zeit für Brot’s revenue (sorry for the buzzwords)?

Illustration generated by Author via ChatGPT

💡 The Solution: The Integration of an LLM-agent

This solution proposes integrating a custom LLM agent into the Point of Sale (POS) system. The agent will translate the staff’s orders into API requests, registering items in the POS system without manual entry.

💡 Workflow Simplified:

👩‍💼 Staff-Customer Interaction: A staff member receives an order from a customer and communicates with the LLM agent through a headset. The staff member can keep talking with the customer without needing to deactivate the headset, allowing for a seamless interaction.

The LLM agent listens to the entire conversation, filtering out irrelevant chatter and focusing on the important details.

This is facilitated by the use of specific “agent” and “please” cues (signposting), which enables the LLM agent to distinguish between casual conversation and a direct request for assistance.

Example:

– Customer A: Do you want to get an espresso and two cinnamon rolls?
– Customer B: I don’t know.
– Customer A: OK, I’ll decide then. I’ll get a baguette, a cappuccino, and two cheesecakes.
– Staff: Agent, add a baguette, a cappuccino, and two cheesecakes, please.
– Customer A: Sorry I changed my mind, I won’t get the baguette.
– Staff: Agent, remove the baguette, please.
– Staff: Agent, proceed to payment by card, please.

You can check out the demo here (Image by Author)
Request made to the mentioned API endpoints (Image by Author)

Under the hood:

  • The agent processes this speech, generating API request(s) to register the items in the order (see the sketch of the resulting requests after this list).
  • If there’s a change in the order, the staff can proceed with: “Agent, remove one cinnamon roll.” The agent will adjust the order accordingly.
  • Upon finalizing the order, “Agent, proceed to payment with QR codes / cards please” prompts the agent to initiate the payment process, creating a smooth end-to-end transaction.
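
To make this concrete, here is a sketch of the requests the agent would derive from the example conversation above. Only the /add endpoint appears in the code later in this article; the /remove and /pay endpoints and their payload fields are my assumptions following the same pattern.

import requests

BASE = "http://localhost:8889"  # the local FastAPI server used later in this article

# "Agent, add a baguette, a cappuccino, and two cheesecakes, please."
requests.post(f"{BASE}/add", json={"list_items": [
    {"quantity": 1, "item": "baguette"},
    {"quantity": 1, "item": "cappuccino"},
    {"quantity": 2, "item": "cheesecake"},
]})

# "Agent, remove the baguette, please."  (assumed /remove endpoint)
requests.post(f"{BASE}/remove", json={"list_items": [
    {"quantity": 1, "item": "baguette"},
]})

# "Agent, proceed to payment by card, please."  (assumed /pay endpoint and payload)
requests.post(f"{BASE}/pay", json={"payment_method": "card"})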

🔌 The Code

In this section, I will focus on how to build a custom LLM agent with LangChain. I assume you are somewhat familiar with LangChain’s terminology and usage. If not, you can check out their documentation here.

I suggest you also refer to my repository when reading the following, as I will not display the full scripts in this article. Instead, I will highlight the key takeaways for you.

What do you need to create a custom LLM agent that can interact with external systems?

  1. The tools (that interact with external systems)
  2. The prompt (that guides the agent)
  3. The agent (and its executor)

Now, let’s go through it one by one.

🔧 The tools

I created 3 simple tools to

  • add item(s) to an order
  • remove item(s) from an order
  • proceed to payment with a preferred method

These tools are simple functions created with LangChain’s @tool decorator.

You might wonder how the agent knows which tool to use. The answer is that it bases its decisions on the names and descriptions of the tools.

In the following code:

  • The function’s name add_orders will be considered the tool’s name. The function’s docstring is treated as the tool’s description. The description plays an important role because it helps the LLM reason about and decide which tools are relevant in a given context.
  • The add_orders function takes an argument order_data in the format of a list of dictionaries (see the code below). The LLM will make sure that order_data is in the format we want. You can see that I wrote the docstring as if I were giving instructions to the LLM.
  • The function makes a POST request to an API endpoint. This, ideally, is the system that we want the agent to control. In my case, it is the POS system that the staff uses at the store.
import requests
from langchain_core.tools import tool


@tool
def add_orders(order_data):
    """Add items to an order by making a POST request with the order_data argument strictly following this format:
    order_data = [
        {'quantity': 2, 'item': 'apple'},
        {'quantity': 1, 'item': 'banana'},
        {'quantity': 3, 'item': 'cherry'}
    ]
    """
    # Make a POST request to the FastAPI endpoint
    url = "http://localhost:8889/add"
    # Adhere to the expected format of the FastAPI endpoint
    request_json = {"list_items": order_data}
    response = requests.post(
        url,
        headers={"Content-Type": "application/json"},
        json=request_json,
    )
    # Return the API response so the agent can observe the result
    return response.json()

As mentioned previously, besides this add_orders, I have remove_orders and pay_orders functions as well. They form the list of tools at the agent’s disposal. Here is how you bind them to the LLM:

from langchain_openai import ChatOpenAI

# Create an LLM model
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)
# Create the list of tools for the LLM to pick from
tools = [add_orders, remove_orders, pay_orders]
# Bind the tools to the LLM; the result is an LLM with tools
llm_with_tools = llm.bind_tools(tools)
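
For reference, remove_orders and pay_orders follow the same pattern as add_orders. Below is a minimal sketch of what they could look like; the /remove and /pay endpoints and their payload fields are my assumptions modeled on /add, so check the repository for the exact implementation.

@tool
def remove_orders(order_data):
    """Remove items from an order by making a POST request with the order_data
    argument, using the same list-of-dictionaries format as add_orders."""
    # Assumed endpoint, mirroring /add
    response = requests.post("http://localhost:8889/remove",
                             json={"list_items": order_data})
    return response.json()


@tool
def pay_orders(payment_method):
    """Proceed to payment for the current order with the preferred payment method,
    e.g. 'card' or 'QR code'."""
    # Assumed endpoint and payload field
    response = requests.post("http://localhost:8889/pay",
                             json={"payment_method": payment_method})
    return response.json()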

💬 The prompt

I created a simple prompt with ChatPromptTemplate.

  • It starts with a system role, depicting the tasks that the agent needs to perform. This will govern the behavior of the agent. I think most of “the art” belongs to this part.
  • Then the user role comes in, with a placeholder {input} for us to pass the user’s “prompt” in. For example: “Agent please add a baguette to my order please.”
  • Finally, the MessagesPlaceholder is required so that the agent can store intermediate information about how it reasons and what steps have been taken. You do not need to worry about this one.
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

prompt = ChatPromptTemplate.from_messages(
    [
        ("system",
         "You are a helpful assistant, you extract user's food orders from a conversation "
         "and add or remove them to POS system by making POST request with the extracted order data. "
         "User can also request to pay with a preferred payment method. "
         "The real order information starts from a signpost word 'agent' and ends with a signpost word 'please'. "
         "The rest of the conversation is not important, you must strictly ignore it. "
         "The extracted order should contain the items and their quantity, "
         "and it is in the format of a python list of multiple dictionaries."),
        ("user", "{input}"),
        MessagesPlaceholder(variable_name="agent_scratchpad"),
    ]
)
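
As a quick sanity check, you can render the prompt with a sample input and an empty scratchpad to see exactly which messages will be sent to the model (the sample input below is just illustrative):

messages = prompt.format_messages(
    input="Agent, add a baguette and two cheesecakes, please.",
    agent_scratchpad=[],  # empty at the start of a run
)
for message in messages:
    print(message.type, ":", message.content)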

🤖 The agent

We form the agent by chaining all the LangChain components together, including the aforementioned prompt and the LLM model with its toolset.

You might wonder why we need to construct an input dictionary first. It is because the agent will be called multiple times by its agent_executor after our invocation, and the agent_executor needs to pass inputs to the agent in the correct format. You will see my point if you remove this step: LangChain will tell you that it does not expect intermediate_steps in the input (see above, where we constructed the prompt with agent_scratchpad, not intermediate_steps).

from langchain.agents import AgentExecutor
from langchain.agents.format_scratchpad.openai_tools import format_to_openai_tool_messages
from langchain.agents.output_parsers.openai_tools import OpenAIToolsAgentOutputParser

agent = (
    {
        "input": lambda x: x["input"],
        "agent_scratchpad": lambda x: format_to_openai_tool_messages(
            x["intermediate_steps"]
        ),
    }
    | prompt
    | llm_with_tools
    | OpenAIToolsAgentOutputParser()
)

# We also need to create an agent executor; we cannot just call the agent directly
agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)
agent_output = agent_executor.invoke(
    {"input": conversation}
)
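
Note that conversation here is simply the whole transcript passed in as a single string. For the example dialogue from earlier, it could look like this:

conversation = (
    "Customer A: Do you want to get an espresso and two cinnamon rolls? "
    "Customer B: I don't know. "
    "Customer A: OK, I'll decide then. I'll get a baguette, a cappuccino, and two cheesecakes. "
    "Staff: Agent, add a baguette, a cappuccino, and two cheesecakes, please. "
    "Customer A: Sorry I changed my mind, I won't get the baguette. "
    "Staff: Agent, remove the baguette, please. "
    "Staff: Agent, proceed to payment by card, please."
)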

Let’s briefly discuss the flow of the AgentExecutor. The AgentExecutor will carry out the following steps in sequence (a simplified sketch follows the list):

  • Format the input
  • Pass the output of the last step to the prompt
  • Pass the output of the last step to the LLM agent with its tools (a tool might be called here)
  • Parse the output of the last step and use it for further reasoning (e.g., the AgentExecutor may call another tool here)
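
Conceptually, that loop looks roughly like the sketch below. This is a simplification for intuition only, not LangChain’s actual implementation:

from langchain_core.agents import AgentFinish

# Simplified sketch of what AgentExecutor does under the hood
intermediate_steps = []
tool_map = {t.name: t for t in tools}
while True:
    # Format the input, fill the prompt, and call the LLM with its tools
    output = agent.invoke({"input": conversation,
                           "intermediate_steps": intermediate_steps})
    # If the LLM produced a final answer, stop
    if isinstance(output, AgentFinish):
        result = output.return_values
        break
    # Otherwise, run each requested tool and record the observation
    for action in output:  # one or more tool calls
        observation = tool_map[action.tool].invoke(action.tool_input)
        intermediate_steps.append((action, observation))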

Let’s try to invoke the AgentExecutor we just built. You can see (in the LangSmith log below) that after calling add_orders, the AgentExecutor can then call pay_orders as well.

agent_executor.invoke(
    {"input": "Agent please add a baguette please. "
              "Agent please proceed to payment by card"}
)
Under the hood of the AgentExecutor via LangSmith (Image by Author)

✍️ Conclusion

Congratulations on reaching the conclusion of this tutorial!

Up to now, we have created a custom LLM agent with LangChain from three components: the tools, the prompt, and the agent.

This LLM agent can reason and interact with external systems (APIs).

Now it’s up to you to tailor these interactions to your use case by changing the tools and the prompt.

As I tried to keep the article succinct, I suggest you clone my repository and execute the complete script on your own.

➡️ You can find the repository here.

➡️ The live demo is available here.

If you have any questions, feel free to reach out to me on LinkedIn.

About me

I am Daniel Le, a Data Engineer based in Berlin, Germany, with a great passion for Machine Learning.

I am interested in new technologies and how they can be applied to solve real-world problems.

Should you have any inquiries or wish to discuss these interests further, please do not hesitate to connect with me on LinkedIn.


Published via Towards AI
