How to make Generative AI reliable

Last Updated on September 18, 2024 by Editorial Team

Author(s): Mark O’Neill

Originally published on Towards AI.

They say your greatest strength creates your greatest weakness. Generative AI models that can produce an infinite number of answers to an infinite number of prompts will naturally struggle to behave predictably and consistently when asked.

Conversely, if you tell a human you’re going to use LLMs in production, they will predictably and consistently tell you that “LLMs are too unreliable for that!” If we can’t trust their output, then we can’t use them for important functionality. And if we can’t use them properly, then they will never live up to their hype.

Fortunately, there are tools and techniques that let us use GenAI in a safe, scalable, and reliable way.

So before you dismount your AI hype cycle, let’s take one more pedal up the Slope of Enlightenment and learn how to use GenAI reliably.

Generative AI Systems

“GenAI system” is the term I’ll use for something that goes beyond chat interfaces and content generation. You could also call it an application, an API, an interface, or whatever makes you happier.

These systems combine generative models with traditional software to build, classify, decide on, interact with, or manage complex objects or tasks.

Examples of use cases in the customer experience domain:

  • Generative UIs: Performing actions and rendering highly personalized components in a user interface.
  • Automated Customer Service: Systems that don’t just process support tickets but investigate issues, make account changes, and provide assistance.
  • Hyper-Personalization (Marketing or Branding): Systems that curate content, styles, and messaging to perfectly suit each person.

There is a huge variety of GenAI systems in existence today, likely including many we don’t even notice.

I hope this guide helps you consider and build your own.

Why use Generative AI beyond just generating?

Even if GenAI has reached its limits, these models already offer valuable skills beyond content generation that are yet to be fully utilized. The abilities worth taking advantage of are:

  • Understanding Human Language: They can interpret and infer from natural language, letting us handle a far wider variety of requests and contexts.
  • Building Complex Artifacts: They can create code, components, designs, music, images, applications, personalities, and more.
  • Processing Complex Data: They can analyze and interpret large amounts of unstructured data in a fraction of the time a human would need.
  • In-built World Knowledge: Trained on more data than a human could read in a lifetime, they are aware of best practices and techniques from a huge range of areas, far beyond what we could ever hardcode into an API or UI.

Now it’s time to learn how to use these feisty little models as part of a cooperative and reliable system.

What is a Reliable System?

For a system to be reliable, we need to be able to depend on it to do what we ask, trust it not to do anything wrong, and feel safe that it won’t be compromised.

Breaking this down into three categories, we aim to:

  1. Get Consistent Outputs: We need to define and validate the structure, format, and contents of its inputs and outputs.
  2. Design for Scale and Accuracy: We need to break work down into simple, achievable tasks so the system stays accurate as it grows and changes.
  3. Ensure Safety and Security: We need to do everything possible to reduce the risk of data leaks and intrusions through GenAI vulnerabilities.

AI Assistants vs. GenAI Systems

How is this different from the chat services we see today?

In the olden days, companies were tempted to just slap an ‘AI assistant’ into their application and claim it’s powered by AI. They’d give it a human-like name and tell you to ‘chat’ or ‘ask us anything’.

Job done, right?

The AI Assistant Approach

The example below shows the standard chat-style AI assistant model. Ignoring the obvious fact that nobody has ever wanted to chat with a PDF, database, or system, this conversational approach is not ideal for complex tasks. It has:

  • No clear structure to support complex tasks.
  • Prioritized chat instead of action.
  • Opened itself to vulnerabilities like prompt injection.
A monolithic chat-style system (source: author)

A GenAI System Approach: Controlled Creativity

The second example is an AI engine for a super-intelligent GenUI system. It powers the personalized, proactive, and semi-autonomous UI for a world-leading bank. It does a lot, but the functionality we care about today is displaying a personalized dashboard for each user.

A scalable and flexible GenAI System (source: author)

This approach brings a lot of benefits, such as:

  • It can creatively choose components and styles to best meet its goal but is prevented from producing invalid or out-of-scope outputs.
  • Structured process, inputs, and outputs for fine-grained control.
  • Potential for richer and more complex interfaces and interactions.
  • Easily extendable functionality.
  • Better testing, performance monitoring and measuring.
  • Improved security and data privacy.

Techniques for Getting Consistent Outputs

Getting consistency from LLMs is essential to success. We need a rigid structure for our inputs and outputs, and a way to handle things when they don’t work.

History’s best art has always been created within constraints. Limiting a GenAI system doesn’t hurt its creativity; it just focuses that creativity where we need it.

Structure & Validate Inputs and Outputs

Requests to and responses from models should use confined structures and finite options. These schemas should be as strict as possible to constrain the scope and improve the quality of what is returned.

Structured outputs make handling and validating easier (source: author)

Many LLM providers offer a setting that forces JSON structured outputs, requiring no additional wrangling or prompt instructions. A system should use a capability like this to ensure responses are valid before they are used or returned by the system. Defining schemas for each request and response is tedious but necessary work for a reliable system.

  • A Bad Approach: We ask the system to generate the HTML/CSS/JS for the component from scratch in one big prompt and render it on the front end. No component ever looks the same, and it causes accessibility, consistency, and UX problems.
  • A Good Approach: The system is given a list of components, specifications, and style options to choose from, then decides the best combination to meet the goal (see the sketch after this list). This still supports personalized designs, guarantees the component matches the data, and ensures the UI conforms to any brand and accessibility rules.
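
To make the good approach concrete, here is a minimal sketch of schema-validated component selection using Pydantic. The component names, style options, and the raw_output string are hypothetical stand-ins for whatever a real design system and LLM call would provide.

```python
# A minimal sketch of the "good approach" above: validate a model's JSON
# output against a closed set of options so invalid or out-of-scope outputs
# never reach the UI. All names here are illustrative.
from enum import Enum
from pydantic import BaseModel, ValidationError


class ComponentType(str, Enum):
    SPENDING_CHART = "spending_chart"
    BALANCE_CARD = "balance_card"
    OFFER_BANNER = "offer_banner"


class Style(str, Enum):
    COMPACT = "compact"
    DETAILED = "detailed"


class ComponentChoice(BaseModel):
    component: ComponentType   # must be one of the allowed components
    style: Style               # must be one of the approved styles
    title: str                 # free text, but bounded by the schema


# Pretend this string came back from an LLM call with JSON output enforced.
raw_output = '{"component": "balance_card", "style": "compact", "title": "Your balance"}'

try:
    choice = ComponentChoice.model_validate_json(raw_output)
    print(f"Render {choice.component.value} in {choice.style.value} style")
except ValidationError as err:
    # Invalid or out-of-scope output is rejected before rendering.
    print(f"Rejected model output: {err}")
```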

Know How to Accept and Handle Defeat

Each function should allow for a scenario where its task cannot be completed. Schemas should be designed to allow for cases of invalid or insufficient detail, and error and default behaviours should be defined.

Eventually, even the best GenAI setup will encounter a situation it cannot handle, such as a misleading prompt. Prepare for this by giving the system the option to declare when it cannot complete the task or when data is insufficient, and have rules in place to respond when responses are invalid or unattainable.

Example: Our system is tasked with building a personalized dashboard for each user. When the customer is new, we may have no useful data to personalize with. Instead of going off track and inventing its own display, or returning a blank view, the system is encouraged to state that there is not enough data, and the default dashboard can be shown instead.
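
A hedged sketch of what this could look like in practice: the response schema carries an explicit insufficient-data status, and the caller falls back to a default dashboard rather than rendering whatever the model improvised. The DashboardPlan fields and the default components are invented for illustration.

```python
# Designing for defeat: the schema lets the model declare it cannot
# personalize, and the caller handles that case with a safe default.
from typing import Literal, Optional
from pydantic import BaseModel


class DashboardPlan(BaseModel):
    status: Literal["ok", "insufficient_data"]
    components: Optional[list[str]] = None   # only meaningful when status == "ok"


DEFAULT_DASHBOARD = ["balance_card", "getting_started_tips"]


def build_dashboard(plan_json: str) -> list[str]:
    plan = DashboardPlan.model_validate_json(plan_json)
    if plan.status == "insufficient_data" or not plan.components:
        # New customer with no usable history: show the safe default.
        return DEFAULT_DASHBOARD
    return plan.components


print(build_dashboard('{"status": "insufficient_data"}'))  # -> default dashboard
```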

Techniques for Designing for Scale

GenAI systems are just like any other software system. As such, we should apply best practice techniques and design principles to help them scale and perform like one.

Design to Extend, Not Modify

Functions within the system should be implemented in a way that allows them to be extended without large modifications to the other parts.

New functionality can be added easily (source: author)

A GenAI system that produces complex outputs will require complex inputs. This means fetching data from various sources, interacting with external systems, and iterating over complex outputs. As new capabilities arrive, the system should accommodate them without requiring refactoring or modification of the other parts.

Example: In our system above, we can easily slot in new functionality, such as usability tweaks and device optimization tactics, to be incorporated while creating personalised interfaces, all with minimal changes to the rest of the process.
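
One possible way to structure this is sketched below with a simple step registry: new capabilities register themselves, and the core pipeline loop never changes. The step names and context contents are illustrative, not taken from the system in the article.

```python
# Extension without modification: pipeline steps are registered, not wired
# into existing code, so adding one never touches the others.
from typing import Callable

Context = dict          # e.g. {"user": ..., "device": ..., "components": [...]}
ENRICHMENT_STEPS: list[Callable[[Context], Context]] = []


def enrichment_step(fn: Callable[[Context], Context]) -> Callable[[Context], Context]:
    """Register a step; the core pipeline never needs editing."""
    ENRICHMENT_STEPS.append(fn)
    return fn


@enrichment_step
def personalize(ctx: Context) -> Context:
    ctx["components"] = ["balance_card"]
    return ctx


# Added later, with no change to personalize() or the pipeline loop below.
@enrichment_step
def optimize_for_device(ctx: Context) -> Context:
    if ctx.get("device") == "mobile":
        ctx["layout"] = "single_column"
    return ctx


def run_pipeline(ctx: Context) -> Context:
    for step in ENRICHMENT_STEPS:
        ctx = step(ctx)
    return ctx


print(run_pipeline({"device": "mobile"}))
```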

Follow the Single Responsibility Rule

Prompts and functions should each have one singular purpose or responsibility. This includes a defined input and output schema.

Doing this is a very effective way to maintain reliability across a larger task. Rather than asking a model to build an entire view or component, we can ask it to create a single part, in a controlled and defined way.

It’s tempting to create larger system prompts that represent multiple abilities and actions. You may get away with this in a small system, but as more functionality, edge cases, and risks are added, it becomes unmanageable and prone to wild responses.
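
As a rough illustration, each prompt can live in its own narrowly scoped function with a defined input and output, composed by ordinary code. The call_llm placeholder below stands in for a real model client; the prompts and component names are invented.

```python
# Single-responsibility prompt functions: each one does exactly one thing,
# with its own inputs and output, so it can be tested and swapped in isolation.
def call_llm(prompt: str) -> str:
    # Placeholder: a real implementation would call your model here.
    # Returns a canned answer so the sketch runs standalone.
    return "balance_card"


def pick_component(goal: str, user_summary: str) -> str:
    """One job only: choose a component name that serves the goal."""
    prompt = (
        "Choose ONE component from [spending_chart, balance_card, offer_banner] "
        f"that best serves this goal: {goal}\n"
        f"User summary: {user_summary}\n"
        "Answer with the component name only."
    )
    return call_llm(prompt)


def write_title(component: str, user_summary: str) -> str:
    """Also one job: write a short title for an already-chosen component."""
    prompt = (
        f"Write a title of at most 6 words for a '{component}' component, "
        f"personalized for: {user_summary}"
    )
    return call_llm(prompt)


# Ordinary code composes the small, testable pieces.
component = pick_component("show account status", "new customer on mobile")
title = write_title(component, "new customer on mobile")
```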

Prioritize Doing, Not Chatting

The system should prioritize performing actions over ‘chatting’. If an action cannot be completed (e.g., due to insufficient or invalid data), it should aim to progress, pre-fill, or assist in completing the action.

These systems are built to do things for us. They should aim to do as much as possible with as little conversation and friction as possible. Ideally, they should only resort to ‘chatting’ when they need more information or are asked to explain something. A perfect GenAI system would do everything we needed without us having to ask!
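
Here is a small sketch of this action-first posture, with invented field names: the handler attempts the action, pre-fills what it can, and only asks a question when a required detail is genuinely missing.

```python
# Doing before chatting: attempt the action first, and only fall back to a
# question when something required is truly missing.
def handle_transfer(request: dict) -> str:
    required = ["from_account", "to_account", "amount"]
    missing = [f for f in required if f not in request]
    if not missing:
        return f"Done: moved {request['amount']} to {request['to_account']}."
    if len(missing) < len(required):
        # Partial info: pre-fill what we have and ask only for the gaps.
        return f"Almost there. I just need: {', '.join(missing)}."
    return "What transfer would you like to make?"


print(handle_transfer({"from_account": "chk-1", "amount": 50}))
# -> asks only for to_account instead of starting a chat from scratch
```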

An AI Assistant is like that coworker who loves to tell everyone what to do and how to do it, but is very reluctant to do anything themselves.

(AI Assistant: ‘Ask us anything… except to do something useful for you’)

Techniques for Ensuring Safety and Security

It goes without saying that any software system needs to be safe and secure. GenAI models introduce additional risk from malicious techniques like prompt injection.

β€˜Air-Gap’ Important Functionality

All functionality that retrieves or updates important data should be kept separate from components that handle incoming requests (the ‘air gap’). Tasks that involve this data should only be exposed and accessible to explicit, necessary components.

An 'air gap' prevents prompt injection attacks and data leaks (source: author)

Prompt injection, hallucinations, and weird behaviour are always a risk with GenAI. The system must be designed to prevent harmful requests from being carried out by a function with access to sensitive data or restricted APIs. This technique dictates that any function receiving external messages should have no access to restricted systems or information.
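
Here is a simplified sketch of that separation: the component that sees untrusted input can only emit an action name from a fixed allow-list, and the privileged executor, the only part with data access, refuses anything outside that set. The action names and functions are hypothetical.

```python
# Air-gapping in miniature: untrusted input can only produce an action
# *name*; the privileged side enforces an allow-list before acting.
ALLOWED_ACTIONS = {"show_default_dashboard", "show_spending_summary"}


def untrusted_planner(user_message: str) -> str:
    # This is where an LLM would interpret the (possibly malicious) message.
    # Whatever it outputs, it is only a name, never a query or code.
    return "show_spending_summary"


def privileged_executor(action: str) -> str:
    # Only this component touches sensitive systems, and only for
    # allow-listed actions. Injected instructions can't reach it directly.
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"Refusing unknown action: {action}")
    return f"executed {action}"


print(privileged_executor(untrusted_planner("ignore previous instructions...")))
```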

Techniques for Prompt Engineering for Reliability

There are countless prompt engineering techniques that claim to improve accuracy, decision-making, formatting, and overall quality.

There’s not much more to say about prompt engineering techniques that hasn’t been written elsewhere. So instead I’ve compiled a list of useful techniques and links that you can use for any prompt-related project, not just GenAI systems. Enjoy!

Conclusion

I hope this article has been helpful in convincing you to consider this approach for making Generative AI more reliable.

There are likely many more techniques and technologies you could use to make a great system; I think these are the most powerful and practical for implementations of any size.

Please don’t hesitate to reach out, as I’d love any feedback, questions, or suggestions for more techniques!
