Creating a Smart Home AI Assistant
Last Updated on May 12, 2024 by Editorial Team
Author(s): Michael K
Originally published on Towards AI.
The hardware AI assistants released recently have been making a splash in the news, which gave me a lot of inspiration around the concept of an "action model" and how powerful one could be. It also made me curious about how hard it would be to give a large language model access to my smart home API, because coding an entire assistant is totally easier than just opening a tab to my dashboard.
In this article, using Python and a few open-source tools, we'll create an assistant that can perform almost any action we desire. We'll also explore how this works under the hood and how we can use some extra tools to make debugging these agents a cakewalk.
Wrestling LLM Responses
I've previously written an article about prompt engineering, which remains our most powerful technique as end users of these models. Tool use is a supercharged version of prompt engineering that lets the models do more than just generate text.
For example, we could give the model the ability to search Wikipedia, look up customer information for a support request, or send an email; the sky is truly the limit, other than your programming ability, of course. Combined with tool use, we can also get LLMs to generate structured output, giving us a reliably formatted response.
Without these tools, the model's response can vary wildly or be heavily influenced by the context provided. This often distracts the model from the requested format or, depending on the context, can produce erroneous results. The random seed the model uses, as well as its temperature (its willingness to generate more varied responses), can be controlled; however, this is far from perfect.
Creating the Solution
To manage the dependencies for the project, I'll be using Poetry, which we can initialize like so:
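A minimal setup might look like this (the project name here is just an example):

```bash
# Create a new Poetry project with standard boilerplate
poetry new smart-home-assistant
cd smart-home-assistant
```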
Poetry will create all of the boilerplate we need to get started, so the next step is to define any additional dependencies we have. Let's go ahead and add those now:
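Assuming we only need Phidata and the Ollama client at this stage, that would be:

```bash
poetry add phidata ollama
```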
Ollama
I'll be using Ollama to handle communicating with the model; however, Phidata supports numerous LLM integrations, so you could swap out Ollama for whichever works best for you. To get Ollama set up, it only takes a few steps:
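On Linux or macOS, the steps look roughly like this:

```bash
# Install Ollama using the official installer script
curl -fsSL https://ollama.com/install.sh | sh

# Download the model weights (Llama 3 in this case)
ollama pull llama3

# Start the server; it listens on localhost:11434 by default
ollama serve
```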
Other than Meta's Llama 3, I've had great success with Mistral's 7B model and Microsoft's WizardLM 2 when using tools. As newer models are released, tool use will likely become better supported.
Creating the Assistant
Phidata lets us structure and format the LLM's response using Pydantic objects, giving us a reliable method to extract information from the response in a programmatic fashion. For example, if we wanted to create an assistant that only answered math questions:
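Here is a minimal sketch of what that could look like; the class and field names are my own examples, and I'm assuming Phidata's `Assistant` with its `output_model` parameter:

```python
from typing import Optional

from phi.assistant import Assistant
from phi.llm.ollama import Ollama
from pydantic import BaseModel, Field

class MathResult(BaseModel):
    """The structure every response must follow."""
    explanation: str = Field(..., description="How the answer was derived")
    answer: Optional[float] = Field(
        None, description="The numeric answer, or None if there is no answer"
    )

math_assistant = Assistant(
    llm=Ollama(model="llama3"),
    description="You are a math tutor. Only answer math questions.",
    output_model=MathResult,
)

result = math_assistant.run("What is 6 * 7?")
print(result)  # MathResult(explanation='...', answer=42.0)
```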
This is incredibly useful for instances where you have complex responses from the model. If we take a look at the prompt Phidata generates, we can see how it gets the model to play nice: it injects the Pydantic model's JSON schema into the prompt and instructs the model to respond with matching JSON.
Through prompt engineering, Phidata massages the response into exactly what we need, whether or not every field can be filled in. For example, if we asked a question without an apparent answer:
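With the `Optional` annotation in place, the assistant can decline gracefully (the output shown below is illustrative):

```python
# No numeric answer exists, so the model can return answer=None
# instead of being forced to invent a value.
result = math_assistant.run("Why is the sky blue?")
print(result)  # e.g. MathResult(explanation='That is not a math question.', answer=None)
```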
Based on my previous experience with Phidata in a few projects, it's vital to give the model every possible option, as a missing one can trigger an error. In the math example above, if you did not tell Pydantic that the `answer` key could be `None` as well, the model would stuff a verbose answer into the field in addition to the context, rather than just returning `None`.
Assistant Tool Use
Much like us, an LLM given tools to perform actions becomes more efficient, accurate, and useful in the long run. Phidata comes with a bunch of awesome tools built in, but we can also create our own, giving the assistant access to databases, APIs, or even local binaries if we desire.
Let's give the assistant access to the internal API for my house, so it can tell us the temperature in a few locations around the home:
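A sketch of what this could look like; the endpoint URL and response format are placeholders for my home's API, and the mock flag lets you try it without any real hardware:

```python
import os

import requests
from phi.assistant import Assistant
from phi.llm.ollama import Ollama

# Placeholder endpoint for the house's internal API
SMART_HOME_API = os.getenv("SMART_HOME_API", "http://homeassistant.local:8123")
MOCK = os.getenv("MOCK", "1") == "1"

def get_temperature(room: str) -> str:
    """Get the current temperature reading for a room in the house."""
    if MOCK:
        return f"The temperature in the {room} is 21.5 C"
    response = requests.get(f"{SMART_HOME_API}/api/temperature/{room}", timeout=5)
    response.raise_for_status()
    return response.text

home_assistant = Assistant(
    llm=Ollama(model="llama3"),
    description="You are a smart home assistant. Use your tools to answer questions about the house.",
    tools=[get_temperature],
    show_tool_calls=True,
)

home_assistant.print_response("How warm is the bedroom right now?")
```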
Phidata does all of the heavy lifting for us by parsing the response from the model, calling the correct function, and finally returning the response. I've included a mock mode so you can test it out without having an API of your own.
API Creation
To interact with our assistant, we'll use FastAPI to create a light REST API that handles incoming requests and runs the assistant code for us. Another option would be a queue system; however, for our low-traffic use case, this should work fine.
First, let's install the dependencies we'll need for the API:
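Assuming we want the FastAPI CLI and Logfire as well:

```bash
poetry add "fastapi[standard]" logfire
```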
Then, we can define our base application:
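A minimal sketch, assuming the assistant from the previous section lives in `assistant.py` and we save this as `main.py`:

```python
import logfire
from fastapi import FastAPI
from pydantic import BaseModel

from assistant import home_assistant  # the Phidata Assistant defined earlier

app = FastAPI(title="Smart Home Assistant")

# Optional: trace every request so each step is easy to inspect later
logfire.configure()
logfire.instrument_fastapi(app)

class PromptRequest(BaseModel):
    prompt: str

@app.post("/prompt")
def run_prompt(request: PromptRequest) -> dict:
    # Run the assistant synchronously and return its final answer
    response = home_assistant.run(request.prompt, stream=False)
    return {"response": response}
```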
I'm setting up Logfire here, which is optional, but it greatly increases our visibility, and we don't have to spelunk through a mountain of logs. Most of the libraries used in this project already have Logfire integrations, letting us extract as much information as possible in the fewest lines of code.
Testing
To run the server, we can use the `fastapi` utility that gets linked after we install the library:
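Assuming the application above was saved as `main.py`:

```bash
fastapi dev main.py
```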
By default, FastAPI uses port 8000, so we'll use that to send a test prompt:
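Using the `/prompt` endpoint sketched above:

```bash
curl -X POST http://localhost:8000/prompt \
  -H "Content-Type: application/json" \
  -d '{"prompt": "What is the temperature in the living room?"}'
```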
Logfire
If you enabled Logfire, you can follow the chain of actions and see the arguments and values for each step.
The timing chart in the trace view is also great for understanding where a request might be getting stuck so further investigation can be done. And since I plan to eventually try this with a physical device, being able to go back and investigate a weird response is a lifesaver.
Next Steps
The only part missing now is the actual hardware, so my next project is to take an extra ESP32 I have lying around and see how much work it'll be to do speech-to-text conversion, as well as give our helpful assistant a voice.
If you would like the finished code, check out the repository linked below for the full example.
Resources