How to make Generative AI reliable

Last Updated on September 18, 2024 by Editorial Team

Author(s): Mark O’Neill

Originally published on Towards AI.

They say your greatest strength creates your greatest weakness. Generative AI models that can produce an infinite number of answers to an infinite number of prompts will naturally struggle to behave predictably and consistently when asked.

Conversely, if you tell a human you’re going to use LLMs in production, they will predictably and consistently tell you that “LLMs are too unreliable for that!” If we can’t trust their output, then we can’t use them for important functionality. And if we can’t use them properly, then they will never live up to their hype.

Fortunately, there are tools and techniques that let us use GenAI in a safe, scalable, and reliable way.

So before you dismount your AI hype cycle, let’s take one more pedal up the Slope of Enlightenment and learn how to use GenAI reliably.

Generative AI Systems

“GenAI system” is the term I’ll use for something that goes beyond chat interfaces and content generation. You could also call it an application, an API, an interface, or whatever makes you happier.

These systems combine generative models with traditional software to build, classify, decide on, interact with, or manage complex objects or tasks.

Examples of use cases in the customer experience domain:

  • Generative UIs: Performing actions and rendering highly personalized components in a user interface.
  • Automated Customer Service: Systems that don’t just process support tickets but investigate issues, make account changes, and provide assistance.
  • Hyper-Personalization (Marketing or Branding): Systems that curate content, styles, and messaging to perfectly suit each person.

There is a huge variety of GenAI systems in existence today, likely including many we don’t even notice.

I hope this guide helps you consider and build your own.

Why use Generative AI beyond just generating?

Even if GenAI has reached its limits, these models already offer valuable skills beyond content generation that are yet to be fully utilized. The abilities worth taking advantage of are:

  • Understanding Human Language: They can interpret and infer from natural language, letting us handle a far wider variety of requests and contexts.
  • Building Complex Artifacts: They can create code, components, designs, music, images, applications, personalities, and more.
  • Processing Complex Data: They can analyze and interpret large amounts of unstructured data in a fraction of the time a human would need.
  • In-built World Knowledge: Trained on more data than a human could read in a lifetime, they are aware of best practices and techniques from a huge range of areas, far beyond what we could ever hardcode into an API or UI.

Now it’s time to learn how to use these feisty little models as part of a cooperative and reliable system.

What is a Reliable System?

For a system to be reliable, we need to be able to depend on it to do what we ask, trust it not to do anything wrong, and feel safe that it won’t be compromised.

Breaking this down into three categories, we aim to:

  1. Get Consistent Outputs: We need to define and validate the structure, format, and contents of its inputs and outputs.
  2. Design for Scale and Accuracy: We need to break work down into simple, achievable tasks so the system stays accurate as it grows and changes.
  3. Ensure Safety and Security: We need to do everything possible to reduce the risk of data leaks and intrusions through GenAI vulnerabilities.

AI Assistants vs. GenAI Systems

How is this different from the chat services we see today?

In the olden days, companies were tempted to just slap an ‘AI assistant’ into their application and claim it’s powered by AI. They’d give it a human-like name and tell you to ‘chat’ or ‘ask us anything’.

Job done, right?

The AI Assistant Approach

The example below shows the standard chat-style AI assistant model. Ignoring the obvious fact that nobody has ever wanted to chat with a PDF, database, or system, this conversational approach is not ideal for complex tasks. It has:

  • No clear structure to support complex tasks.
  • Prioritized chat instead of action.
  • Opened itself to vulnerabilities like prompt injection.
A monolithic chat-style system (source: author)

A GenAI System Approach: Controlled Creativity

The second example is an AI engine for a super-intelligent GenUI system. It powers the personalized, proactive, and semi-autonomous UI for a world-leading bank. It does a lot, but the functionality we care about today is displaying a personalized dashboard for each user.

A scalable and flexible GenAI System (source: author)

This approach brings a lot of benefits, such as:

  • It can creatively choose components and styles to best meet its goal but is prevented from producing invalid or out-of-scope outputs.
  • Structured process, inputs, and outputs for fine-grained control.
  • Potential for richer and more complex interfaces and interactions.
  • Easily extendable functionality.
  • Better testing, performance monitoring and measuring.
  • Improved security and data privacy.

Techniques for Getting Consistent Outputs

Getting consistency from LLMs is essential to success. We need a rigid structure for our inputs and outputs, and a way to handle things when they don’t work.

History’s best art has always been created within constraints. Limiting a GenAI system doesn’t hurt its creativity; it just focuses that creativity where we need it.

Structure & Validate Inputs and Outputs

Requests to and responses from models should use confined structures and finite options. These schemas should be as strict as possible to constrain the scope and improve the quality of what is returned.

Structured outputs make handling and validating easier (source: author)

Many LLM providers offer a setting that forces JSON structured outputs, requiring no additional wrangling or prompt instructions. A system should use a capability like this to ensure responses are valid before they are used or returned by the system. Defining schemas for each request and response is tedious but necessary work for a reliable system.

  • A Bad Approach: We ask the system to generate the HTML/CSS/JS for the component from scratch in one big prompt and render it on the front end. No component ever looks the same, and it causes accessibility, consistency, and UX problems.
  • A Good Approach: The system is given a list of components, specifications, and style options to choose from, then decides the best combination to meet the goal (see the sketch after this list). This still supports personalized designs, guarantees the component matches the data, and ensures the UI conforms to any brand and accessibility rules.
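
To make the good approach concrete, here is a minimal sketch of schema-validated component selection using Pydantic. The component names, style options, and the raw_output string are hypothetical stand-ins for whatever a real design system and LLM call would provide.

```python
# A minimal sketch of the "good approach" above: validate a model's JSON
# output against a closed set of options so invalid or out-of-scope outputs
# never reach the UI. All names here are illustrative.
from enum import Enum
from pydantic import BaseModel, ValidationError


class ComponentType(str, Enum):
    SPENDING_CHART = "spending_chart"
    BALANCE_CARD = "balance_card"
    OFFER_BANNER = "offer_banner"


class Style(str, Enum):
    COMPACT = "compact"
    DETAILED = "detailed"


class ComponentChoice(BaseModel):
    component: ComponentType   # must be one of the allowed components
    style: Style               # must be one of the approved styles
    title: str                 # free text, but bounded by the schema


# Pretend this string came back from an LLM call with JSON output enforced.
raw_output = '{"component": "balance_card", "style": "compact", "title": "Your balance"}'

try:
    choice = ComponentChoice.model_validate_json(raw_output)
    print(f"Render {choice.component.value} in {choice.style.value} style")
except ValidationError as err:
    # Invalid or out-of-scope output is rejected before rendering.
    print(f"Rejected model output: {err}")
```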

Know How to Accept and Handle Defeat

Each function should allow for a scenario where its task cannot be completed. Schemas should be designed to allow for cases of invalid or insufficient detail, and error and default behaviours should be defined.

Eventually, even the best GenAI setup will encounter a situation it cannot handle, such as a misleading prompt. Prepare for this by giving the system the option to declare when it cannot complete the task or when data is insufficient, and have rules in place to respond when responses are invalid or unattainable.

Example: Our system is tasked with building a personalized dashboard for each user. When the customer is new, we may have no useful data to personalize with. Instead of going off track and inventing its own display, or returning a blank view, the system is encouraged to state that there is not enough data, and the default dashboard can be shown instead.
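
A hedged sketch of what this could look like in practice: the response schema carries an explicit insufficient-data status, and the caller falls back to a default dashboard rather than rendering whatever the model improvised. The DashboardPlan fields and the default components are invented for illustration.

```python
# Designing for defeat: the schema lets the model declare it cannot
# personalize, and the caller handles that case with a safe default.
from typing import Literal, Optional
from pydantic import BaseModel


class DashboardPlan(BaseModel):
    status: Literal["ok", "insufficient_data"]
    components: Optional[list[str]] = None   # only meaningful when status == "ok"


DEFAULT_DASHBOARD = ["balance_card", "getting_started_tips"]


def build_dashboard(plan_json: str) -> list[str]:
    plan = DashboardPlan.model_validate_json(plan_json)
    if plan.status == "insufficient_data" or not plan.components:
        # New customer with no usable history: show the safe default.
        return DEFAULT_DASHBOARD
    return plan.components


print(build_dashboard('{"status": "insufficient_data"}'))  # -> default dashboard
```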

Techniques for Designing for Scale

GenAI systems are just like any other software system. As such, we should apply best practice techniques and design principles to help them scale and perform like one.

Design to Extend, Not Modify

Functions within the system should be implemented in a way that allows them to be extended without large modifications to the other parts.

New functionality can be added easily (source: author)

A GenAI system that produces complex outputs will require complex inputs. This means fetching data from various sources, interacting with external systems, and iterating over complex outputs. As new capabilities arrive, the system should accommodate them without requiring refactoring or modification of the other parts.

Example: In our system above, we can easily slot in new functionality, such as usability tweaks and device optimization tactics, to be incorporated while creating personalised interfaces, all with minimal changes to the rest of the process.
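
One possible way to structure this is sketched below with a simple step registry: new capabilities register themselves, and the core pipeline loop never changes. The step names and context contents are illustrative, not taken from the system in the article.

```python
# Extension without modification: pipeline steps are registered, not wired
# into existing code, so adding one never touches the others.
from typing import Callable

Context = dict          # e.g. {"user": ..., "device": ..., "components": [...]}
ENRICHMENT_STEPS: list[Callable[[Context], Context]] = []


def enrichment_step(fn: Callable[[Context], Context]) -> Callable[[Context], Context]:
    """Register a step; the core pipeline never needs editing."""
    ENRICHMENT_STEPS.append(fn)
    return fn


@enrichment_step
def personalize(ctx: Context) -> Context:
    ctx["components"] = ["balance_card"]
    return ctx


# Added later, with no change to personalize() or the pipeline loop below.
@enrichment_step
def optimize_for_device(ctx: Context) -> Context:
    if ctx.get("device") == "mobile":
        ctx["layout"] = "single_column"
    return ctx


def run_pipeline(ctx: Context) -> Context:
    for step in ENRICHMENT_STEPS:
        ctx = step(ctx)
    return ctx


print(run_pipeline({"device": "mobile"}))
```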

Follow the Single Responsibility Rule

Prompts and functions should each have one singular purpose or responsibility. This includes a defined input and output schema.

Doing this is a very effective way to maintain reliability across a larger task. Rather than asking a model to build an entire view or component, we can ask it to create a single part, in a controlled and defined way.

It’s tempting to create larger system prompts that represent multiple abilities and actions. You may get away with this in a small system, but as more functionality, edge cases, and risks are added, it becomes unmanageable and prone to wild responses.
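
As a rough illustration, each prompt can live in its own narrowly scoped function with a defined input and output, composed by ordinary code. The call_llm placeholder below stands in for a real model client; the prompts and component names are invented.

```python
# Single-responsibility prompt functions: each one does exactly one thing,
# with its own inputs and output, so it can be tested and swapped in isolation.
def call_llm(prompt: str) -> str:
    # Placeholder: a real implementation would call your model here.
    # Returns a canned answer so the sketch runs standalone.
    return "balance_card"


def pick_component(goal: str, user_summary: str) -> str:
    """One job only: choose a component name that serves the goal."""
    prompt = (
        "Choose ONE component from [spending_chart, balance_card, offer_banner] "
        f"that best serves this goal: {goal}\n"
        f"User summary: {user_summary}\n"
        "Answer with the component name only."
    )
    return call_llm(prompt)


def write_title(component: str, user_summary: str) -> str:
    """Also one job: write a short title for an already-chosen component."""
    prompt = (
        f"Write a title of at most 6 words for a '{component}' component, "
        f"personalized for: {user_summary}"
    )
    return call_llm(prompt)


# Ordinary code composes the small, testable pieces.
component = pick_component("show account status", "new customer on mobile")
title = write_title(component, "new customer on mobile")
```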

Prioritize Doing, Not Chatting

The system should prioritize performing actions over ‘chatting’. If an action cannot be completed (e.g., due to insufficient or invalid data), it should aim to progress, pre-fill, or assist in completing the action.

These systems are built to do things for us. They should aim to do as much as possible with as little conversation and friction as possible. Ideally, they should only resort to ‘chatting’ when they need more information or are asked to explain something. A perfect GenAI system would do everything we needed without us having to ask!
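
Here is a small sketch of this action-first posture, with invented field names: the handler attempts the action, pre-fills what it can, and only asks a question when a required detail is genuinely missing.

```python
# Doing before chatting: attempt the action first, and only fall back to a
# question when something required is truly missing.
def handle_transfer(request: dict) -> str:
    required = ["from_account", "to_account", "amount"]
    missing = [f for f in required if f not in request]
    if not missing:
        return f"Done: moved {request['amount']} to {request['to_account']}."
    if len(missing) < len(required):
        # Partial info: pre-fill what we have and ask only for the gaps.
        return f"Almost there. I just need: {', '.join(missing)}."
    return "What transfer would you like to make?"


print(handle_transfer({"from_account": "chk-1", "amount": 50}))
# -> asks only for to_account instead of starting a chat from scratch
```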

An AI Assistant is like that coworker who loves to tell everyone what to do and how to do it, but is very reluctant to do anything themselves.

(AI Assistant: ‘Ask us anything… except to do something useful for you’)

Techniques for Ensuring Safety and Security

It goes without saying that any software system needs to be safe and secure. GenAI models introduce additional risk from malicious techniques like prompt injection.

β€˜Air-Gap’ Important Functionality

All functionality that retrieves or updates important data should be kept separate from components that handle incoming requests (the ‘air gap’). Tasks that involve this data should only be exposed and accessible to explicit, necessary components.

An 'air gap' prevents prompt injection attacks and data leaks (source: author)

Prompt injection, hallucinations, and weird behaviour are always a risk with GenAI. The system must be designed to prevent harmful requests from being carried out by a function with access to sensitive data or restricted APIs. This technique dictates that any function receiving external messages should have no access to restricted systems or information.
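
Here is a simplified sketch of that separation: the component that sees untrusted input can only emit an action name from a fixed allow-list, and the privileged executor, the only part with data access, refuses anything outside that set. The action names and functions are hypothetical.

```python
# Air-gapping in miniature: untrusted input can only produce an action
# *name*; the privileged side enforces an allow-list before acting.
ALLOWED_ACTIONS = {"show_default_dashboard", "show_spending_summary"}


def untrusted_planner(user_message: str) -> str:
    # This is where an LLM would interpret the (possibly malicious) message.
    # Whatever it outputs, it is only a name, never a query or code.
    return "show_spending_summary"


def privileged_executor(action: str) -> str:
    # Only this component touches sensitive systems, and only for
    # allow-listed actions. Injected instructions can't reach it directly.
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"Refusing unknown action: {action}")
    return f"executed {action}"


print(privileged_executor(untrusted_planner("ignore previous instructions...")))
```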

Techniques for Prompt Engineering for Reliability

There are countless prompt engineering techniques that claim to improve accuracy, decision-making, formatting, and overall quality.

There’s not much more to say about prompt engineering techniques that hasn’t been written elsewhere. So instead I’ve compiled a list of useful techniques and links that you can use for any prompt-related project, not just GenAI systems. Enjoy!

Conclusion

I hope this article has been helpful in convincing you to consider this approach for making Generative AI more reliable.

There are likely many more techniques and technologies you could use to make a great system; I think these are the most powerful and practical for implementations of any size.

Please don’t hesitate to reach out, as I’d love any feedback, questions, or suggestions for more techniques!
