LLM & AI Agent Applications with LangChain and LangGraph — Part 19: Guardrails (Safety Barriers for LLMs)
Last Updated on January 3, 2026 by Editorial Team
Author(s): Michal Zarnecki
Originally published on Towards AI.

Hi! In this chapter we’ll move to another topic that is just as practical — and in many real applications, absolutely critical: Guardrails, a safety-barrier system for language models.
Guardrails are simply a set of rules and validators that check whether the answer generated by the model matches our requirements. Thanks to them we can immediately catch errors, weird formats, or responses that are simply unusable inside an application.
In practice, guardrails are our first line of defense against the unpredictability of LLMs.
Why do we need guardrails?
Language models are powerful, but they have one fundamental property: they are stochastic, meaning their outputs contain randomness. The same prompt can produce different answers on different runs.
So even if you ask for a specific format, the model might still:
- add an unnecessary comment,
- break the structure,
- or return something completely unexpected.
For example:
- You ask for pure JSON, and the model adds a sentence before it: “Here is the answer in JSON: …”
- You want exactly three tags, and the model returns three tags plus a sentence: “Here are the generated tags: …”
- You expect Python code, and you get a mix of Python and Markdown commentary.
In situations like this, guardrails are priceless. They validate the output automatically and can:
- reject an invalid result,
- raise an error,
- or force the model to regenerate the answer.
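The regenerate-on-failure pattern can be sketched as a plain retry loop. A minimal illustration follows; `generate` here is a hypothetical stand-in for your actual model call, and the JSON check plays the role of the guardrail:

```python
import json

def generate(prompt: str) -> str:
    # hypothetical stand-in for an LLM call; returns raw model text
    return '{"tags": ["python", "llm", "guardrails"]}'

def guarded_generate(prompt: str, max_attempts: int = 3) -> dict:
    """Retry until the output parses as valid JSON, then return it."""
    last_error = None
    for _ in range(max_attempts):
        raw = generate(prompt)
        try:
            return json.loads(raw)  # guardrail: output must be valid JSON
        except json.JSONDecodeError as err:
            last_error = err  # invalid output -> regenerate
    raise ValueError(f"No valid JSON after {max_attempts} attempts: {last_error}")

print(guarded_generate("Return three tags as JSON."))
```

The same skeleton works for any validator: swap the `json.loads` call for a regex check, a schema check, or one of the evaluators shown below.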
Types of guardrails in LangChain
Here are a few guardrails you’ll typically build with LangChain’s evaluators:
1) JSON Format Validator
Checks whether the output is valid JSON.
This is the most commonly used one, because JSON is the default data exchange format in most applications.
from langchain_classic.evaluation import JsonValidityEvaluator
evaluator = JsonValidityEvaluator()
# print(evaluator.evaluate_strings(prediction='{"x": 1}')) # correct
print(evaluator.evaluate_strings(prediction='{x: 1}')) # incorrect
output:
{'score': 0, 'reasoning': 'Expecting property name enclosed in double quotes: line 1 column 2 (char 1)'}
2) JSON Equality Validator
Checks whether two JSON strings are equal after parsing (the order of keys does not matter).
from langchain_classic.evaluation import JsonEqualityEvaluator
evaluator = JsonEqualityEvaluator()
print(evaluator.evaluate_strings(
    prediction='{"a":1,"b":[2,3]}',
    reference='{"b":[2,3],"a":2}',
))
output:
{'score': False}
3) Fallback Messages Validator
This validator detects responses like:
“I’m sorry, but I can’t help with that.”
Such answers can appear when the model decides the topic is inappropriate, or when it simply doesn’t understand the prompt. In many applications, especially business chatbots, you can’t let this happen silently, so you need to catch it. One way to stay resilient is to configure a backup model that answers whenever the primary call fails:
from langchain_openai import ChatOpenAI
from dotenv import load_dotenv
load_dotenv()
primary = ChatOpenAI(model="gpt-4o-miniS", max_retries=0)  # intentionally invalid model name, forces the fallback
backup = ChatOpenAI(model="gpt-3.5-turbo")
chain = primary.with_fallbacks([backup])
print(chain.invoke("Describe Python in 1 sentence."))
output:
content='Python is a versatile and user-friendly programming language known for its simplicity and readability.' additional_kwargs={
'refusal': None
} response_metadata={
'token_usage': {
'completion_tokens': 16,
'prompt_tokens': 14,
'total_tokens': 30,
'completion_tokens_details': {
'accepted_prediction_tokens': 0,
'audio_tokens': 0,
'reasoning_tokens': 0,
'rejected_prediction_tokens': 0
},
'prompt_tokens_details': {
'audio_tokens': 0,
'cached_tokens': 0
}
},
'model_provider': 'openai',
'model_name': 'gpt-3.5-turbo-0125',
'system_fingerprint': None,
'id': 'chatcmpl-CZxXq6wdZu8Rwff4AfMJDNDh7ABEn',
'service_tier': 'default',
'finish_reason': 'stop',
'logprobs': None
} id='lc_run--7c27ffd5-179c-43a8-ba8d-b5e8407f2529-0' usage_metadata={
'input_tokens': 14,
'output_tokens': 16,
'total_tokens': 30,
'input_token_details': {
'audio': 0,
'cache_read': 0
},
'output_token_details': {
'audio': 0,
'reasoning': 0
}
}
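Fallbacks cover failed calls, but a refusal arrives as a perfectly normal response, so it needs its own check. A simple heuristic can be written by hand; the phrase list below is an assumption you should tune for your own domain and model:

```python
REFUSAL_MARKERS = (
    "i'm sorry, but i can't",
    "i cannot help with",
    "as an ai language model",
)

def is_refusal(text: str) -> bool:
    """Heuristic guardrail: flag answers that look like model refusals."""
    lowered = text.lower()
    return any(marker in lowered for marker in REFUSAL_MARKERS)

print(is_refusal("I'm sorry, but I can't help with that."))  # True
print(is_refusal("Python is a programming language."))       # False
```

When `is_refusal` returns True, you can route the prompt to the backup model or return a controlled message instead of the raw refusal.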
4) Regex Pattern Validator
With regular expressions you can check whether the output matches a specific expected pattern.
This is extremely powerful when you have strict requirements — for example for phone numbers, email addresses, postal codes, IDs, invoice numbers, and so on.
from langchain_classic.evaluation import RegexMatchStringEvaluator
evaluator = RegexMatchStringEvaluator()
result = evaluator.evaluate_strings(
    prediction="Order ID: ABC-1234",
    reference=r"^Order ID: [A-Z]{3}-\d{4}$",
)
print(result['score'])
# retry loop: regenerate and re-validate until the pattern matches (max 3 attempts)
attempts = 3
while result['score'] < 1.0 and attempts > 0:
    attempts -= 1
    print('run model once more')
    # regenerate `prediction` with the LLM here, then re-evaluate:
    # result = evaluator.evaluate_strings(prediction=new_prediction, reference=pattern)
output:
1
5) Token Limit Validator
A guardrail that ensures the output doesn’t exceed a defined number of tokens.
This matters because overly long answers can increase costs and sometimes even break application logic.
The example below tracks and prunes the conversation history to a token limit, so the prompt never exceeds the model context:
import json
from langchain_core.messages import SystemMessage, HumanMessage
from langchain_core.messages.utils import trim_messages, count_tokens_approximately
from langchain_openai import ChatOpenAI
messages = [
    SystemMessage(content="You are a helpful assistant."),
    HumanMessage(content="(long conversation history here / many messages...)"),
]
trimmed = trim_messages(
    messages,
    strategy="last",
    token_counter=count_tokens_approximately,
    max_tokens=256,
    start_on="human",
    include_system=True,
)
llm = ChatOpenAI(model="gpt-4o-mini")
print(json.dumps(llm.invoke(trimmed).response_metadata, indent=4))
output:
{
"token_usage": {
"completion_tokens": 51,
"prompt_tokens": 25,
"total_tokens": 76,
"completion_tokens_details": {
"accepted_prediction_tokens": 0,
"audio_tokens": 0,
"reasoning_tokens": 0,
"rejected_prediction_tokens": 0
},
"prompt_tokens_details": {
"audio_tokens": 0,
"cached_tokens": 0
}
},
"model_provider": "openai",
"model_name": "gpt-4o-mini-2024-07-18",
"system_fingerprint": "fp_560af6e559",
"id": "chatcmpl-CZxXrZEP6NDARAHV3gEdZqJBFF6PQ",
"service_tier": "default",
"finish_reason": "stop",
"logprobs": null
}
6) Word Limit Validator
Works similarly, but counts words instead of tokens.
Useful for tasks like generating summaries of a fixed length.
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model="gpt-4o-mini")
limit = 25
prompt = f"Write a summary in MAX {limit} words: What is machine learning?"
resp = llm.invoke(prompt).content
if len(resp.split()) > limit:
    # quick fix: ask the model to shorten the answer to the limit
    resp = llm.invoke(f"Shorten this to max {limit} words, without any additions:\n\n{resp}").content
print(resp)
output:
Machine learning is a subset of artificial intelligence that enables systems to learn and improve from data without explicit programming.
How guardrails work in practice
Let’s imagine you’re building a system that generates financial reports.
- You enable JSON Format Validator to ensure the output can be parsed by your application.
- You add a Token Limit Validator so the report isn’t longer than, say, 1000 tokens.
- You include a Regex Pattern Validator to verify that numeric values are returned as numbers, not written out as words.
With this setup, you gain confidence that every response will be not only correct in content, but also usable and safe to process.
And when you combine guardrails with evaluators, you get a complete quality control system — so LLM-based applications are not only intelligent, but also stable and predictable.
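The financial-report setup above can be sketched as a single validation pipeline. This is a minimal illustration, not a production implementation; the field name `revenue`, the word-based length check, and the limits are assumptions chosen for the example:

```python
import json
import re

MAX_WORDS = 1000  # stand-in for the token limit, counted in words for simplicity

def validate_report(raw: str) -> dict:
    """Run all guardrails in order and return the parsed report."""
    # 1) JSON format guardrail: the output must parse
    try:
        report = json.loads(raw)
    except json.JSONDecodeError as err:
        raise ValueError(f"Invalid JSON: {err}")
    # 2) length guardrail: reject overly long reports
    if len(raw.split()) > MAX_WORDS:
        raise ValueError("Report exceeds length limit")
    # 3) regex guardrail: 'revenue' must be numeric, not spelled out
    if not re.fullmatch(r"\d+(\.\d+)?", str(report.get("revenue", ""))):
        raise ValueError("'revenue' must be a number")
    return report

print(validate_report('{"revenue": "1250.50", "currency": "USD"}'))
```

Any failing check raises immediately, so the caller can decide whether to reject the output or send it back to the model for regeneration.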
That’s all for this part dedicated to guardrails. In the next article in this series we will implement Retrieval-Augmented Generation (RAG) to generate answers grounded in source documents.
see next chapter
see previous chapter
see the full code from this article in the GitHub repository
Published via Towards AI
Note: Article content contains the views of the contributing authors and not Towards AI.