LLM & AI Agent Applications with LangChain and LangGraph — Part 29: Model Agnostic Pattern and LLM API Gateway
Last Updated on January 6, 2026 by Editorial Team
Author(s): Michalzarnecki
Originally published on Towards AI.

Hi! In this part we’re moving from experiments and prototyping into the real world — production deployments.
Because the truth is: building a working notebook or a proof-of-concept is only the beginning. The real challenges start when your application must serve hundreds or thousands of users, run reliably 24/7, and still stay within budget.
Let’s start with the first foundation: a model-agnostic approach.
Model-agnostic from day one
Many teams building AI applications quickly lock themselves into a single provider — only OpenAI, or only Anthropic. That’s understandable: it’s faster to pick one API and focus. But long-term it’s a huge risk. If the provider raises prices, has an outage, or changes licensing terms — your entire application can stop.
That’s why it’s worth thinking from the very beginning about a model-agnostic gateway layer.
In practice, this means your code doesn’t talk directly to one specific model. Instead, it calls an abstraction:
- “give me a chat-class LLM”, or
- “give me an embedding generator”
And only the gateway decides whether under the hood it should call GPT-5, Claude 4.5 Sonnet, or a local LLaMA running on your own infrastructure.
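To make the idea concrete, here is a minimal, framework-agnostic sketch of such an abstraction: a provider registry where application code only asks for "a chat model" by name. The provider functions below are stubs standing in for real wrappers (e.g. `ChatOpenAI`, `ChatAnthropic`, or a local model client); their names and the `ChatResult` type are illustrative, not part of any library.

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class ChatResult:
    provider: str
    text: str

# Registry of provider factories. In a real gateway each entry would wrap
# an actual client (OpenAI, Anthropic, a local LLaMA server, ...).
_PROVIDERS: Dict[str, Callable[[str], ChatResult]] = {}

def register(name: str):
    """Decorator that adds a provider function to the registry."""
    def deco(fn: Callable[[str], ChatResult]):
        _PROVIDERS[name] = fn
        return fn
    return deco

@register("openai")
def _openai_stub(prompt: str) -> ChatResult:
    return ChatResult("openai", f"[openai] {prompt}")

@register("local")
def _local_stub(prompt: str) -> ChatResult:
    return ChatResult("local", f"[local] {prompt}")

def get_chat_model(name: str) -> Callable[[str], ChatResult]:
    # Application code asks for "a chat-class LLM";
    # only this function knows which providers exist.
    return _PROVIDERS[name]
```

Swapping providers then becomes a one-line change (or a config value) instead of touching every call site.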
API Gateway + routing + fallback
The second foundation is an API Gateway.
Imagine you expose a simple endpoint like POST /v1/chat, where users send requests. In a header like X-Model, the client specifies which model should be used.
The gateway can run multiple models in parallel — and it can also implement fallback logic: if the primary model doesn’t respond within a given time, you automatically switch to a backup model, for example an open-source one running locally.
This pattern doesn’t only improve reliability — it also opens the door to experimentation.
You can route 1% of traffic to a new model and see how it performs compared to the previous one, without changing the entire system.
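Both ideas, fallback on failure and a small experimental traffic share, can be sketched in a few lines of plain Python. This is a simplified synchronous sketch (a production gateway would enforce the timeout with async cancellation rather than checking elapsed time after the call); the function names and the 1% share are illustrative.

```python
import random
import time

def call_with_fallback(primary, backup, prompt, timeout_s=10.0):
    """Try the primary model; on error (or if it was too slow), use the backup."""
    start = time.monotonic()
    try:
        answer = primary(prompt)
        if time.monotonic() - start > timeout_s:
            raise TimeoutError("primary model too slow")
        return answer
    except Exception:
        # Backup could be an open-source model running locally.
        return backup(prompt)

def route(prompt, stable, candidate, experiment_share=0.01):
    """Send ~1% of traffic to a candidate model, the rest to the stable one."""
    model = candidate if random.random() < experiment_share else stable
    return model(prompt)
```

In LangChain specifically, the fallback half of this pattern is built in: any runnable can be wrapped with `.with_fallbacks([...])`.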
Monitoring and cost control
The third foundation — often neglected — is monitoring and cost control.
In a prototype it’s enough to say “it works”. In production you’ll get harder questions:
- How much does it cost per day?
- What’s our hallucination rate?
- How often do we reject outputs?
This is where tools like LangSmith help — but even a simple internal logging system can work.
We measure latency (because users don’t want to wait 30 seconds), we measure costs, and we measure quality — for example: how many answers were rejected by guardrails or evaluation.
And we can set very simple but effective alerts:
- if daily cost exceeds $50 → send a notification,
- if average response time goes above 5 seconds → trigger another alert.
With this, you have real visibility into what’s happening inside the system.
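The alert rules above need very little machinery. Here is a minimal in-process sketch (class and threshold names are illustrative); in production you would push these metrics to a real monitoring stack or to LangSmith instead of keeping them in memory.

```python
DAILY_COST_LIMIT_USD = 50.0   # "if daily cost exceeds $50 -> notify"
LATENCY_LIMIT_S = 5.0         # "if average response time > 5s -> alert"

class Monitor:
    """Tracks cost, latency, and rejected outputs, and raises simple alerts."""
    def __init__(self):
        self.daily_cost_usd = 0.0
        self.latencies = []
        self.rejected = 0
        self.alerts = []

    def record(self, cost_usd: float, latency_s: float, rejected: bool = False):
        self.daily_cost_usd += cost_usd
        self.latencies.append(latency_s)
        if rejected:
            self.rejected += 1  # e.g. blocked by guardrails or evaluation
        if self.daily_cost_usd > DAILY_COST_LIMIT_USD:
            self.alerts.append("daily cost limit exceeded")
        avg = sum(self.latencies) / len(self.latencies)
        if avg > LATENCY_LIMIT_S:
            self.alerts.append("average latency limit exceeded")
```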
These three elements — model-agnostic gateway, API gateway, and monitoring — are not “nice-to-haves”. They’re foundations. If you treat them seriously, your application will not only run in production, but also stay resilient to changes in the market and technology.
Let’s jump now to the code.
Install libraries and load environment variables
!pip install -U langchain langchain-openai langgraph fastapi uvicorn
from dotenv import load_dotenv
load_dotenv()
Human in the Loop
from langchain_openai import ChatOpenAI
from langchain_core.tools import tool
from langchain.agents import create_agent
from langchain.agents.middleware import HumanInTheLoopMiddleware
from langgraph.checkpoint.memory import MemorySaver
from langgraph.types import Command
@tool
def risky_operation(secret: str) -> str:
    """Perform a risky operation that requires manual approval."""
    return f"Executed risky operation with: {secret}"

tools = [risky_operation]
model = ChatOpenAI(model="gpt-4o-mini", temperature=0)

hitl = HumanInTheLoopMiddleware(
    interrupt_on={
        "risky_operation": {"allowed_decisions": ["approve", "edit", "reject"]}
    },
    description_prefix="Manual approval required for risky operation:"
)

checkpointer = MemorySaver()
agent = create_agent(
    model=model,
    tools=tools,
    middleware=[hitl],
    checkpointer=checkpointer,
    debug=True
)

config = {"configurable": {"thread_id": "hitl-demo-1"}}
result = agent.invoke(
    {"messages": [{"role": "user", "content": "Please run the risky operation with secret code $%45654@."}]},
    config=config,
)
output (abbreviated):
[values] {'messages': [HumanMessage(content='Please run the risky operation with secret code $%45654@.', ...)]}
[updates] {'model': {'messages': [AIMessage(content='', tool_calls=[{'name': 'risky_operation', 'args': {'secret': '$%45654@'}, 'id': 'call_dK786IhVaO3Z4VssPOI1cM6y', 'type': 'tool_call'}], ...)]}}
[updates] {'__interrupt__': (Interrupt(value={'action_requests': [{'name': 'risky_operation', 'args': {'secret': '$%45654@'}, 'description': "Manual approval required for risky operation:\n\nTool: risky_operation\nArgs: {'secret': '$%45654@'}"}], 'review_configs': [{'action_name': 'risky_operation', 'allowed_decisions': ['approve', 'edit', 'reject']}]}, id='a3abdfe342bd7c8be8b1b586ee9f8815'),)}
Handle the interrupt and approve the tool call:
if "__interrupt__" in result:
    print("Interrupt detected!")
    decisions = [{"type": "approve"}]
    result = agent.invoke(
        Command(resume={"decisions": decisions}),
        config=config,
    )
output (abbreviated):
[values] {'messages': [HumanMessage(content='Please run the risky operation with secret code $%45654@.', ...), AIMessage(content='', tool_calls=[{'name': 'risky_operation', 'args': {'secret': '$%45654@'}, 'id': 'call_dK786IhVaO3Z4VssPOI1cM6y', 'type': 'tool_call'}], ...), ToolMessage(content='Executed risky operation with: $%45654@', name='risky_operation', tool_call_id='call_dK786IhVaO3Z4VssPOI1cM6y', ...)]}
Model agnostic API gateway
To run the example code below as a model-agnostic API gateway:
1. Save the code below in a file app.py
# app.py
from fastapi import FastAPI, Header
from pydantic import BaseModel
from langchain_core.runnables import RunnableLambda
from langchain_core.messages import AIMessage
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

class ChatRequest(BaseModel):
    message: str

class ChatResponse(BaseModel):
    provider: str
    model: str
    answer: str

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    ("human", "{message}")
])

def build_model(x_model: str):
    """
    x_model format:
    - 'openai:gpt-4o-mini'
    """
    if ":" in x_model:
        provider, model_name = x_model.split(":", 1)
    else:
        provider, model_name = "openai", x_model
    provider = provider.lower().strip()
    if provider == "openai":
        return provider, model_name, ChatOpenAI(model=model_name, temperature=0)
    # if provider == "anthropic":  # support for another LLM API provider
    #     return provider, model_name, ChatAnthropic(model=model_name, temperature=0)
    def _unknown(inputs: dict):
        return AIMessage(content=f"(unknown provider) Echo: {inputs.get('message', '')}")
    return "unknown", x_model, RunnableLambda(_unknown)

app = FastAPI(title="Model-Agnostic LangChain Gateway")

@app.post("/chat", response_model=ChatResponse)
def chat_endpoint(
    req: ChatRequest,
    x_model: str = Header(default="openai:gpt-4o-mini", alias="X-Model"),
):
    provider, model_name, model = build_model(x_model)
    chain = prompt | model | StrOutputParser()
    answer: str = chain.invoke({"message": req.message})
    return ChatResponse(provider=provider, model=model_name, answer=answer)
2. Start the server:
uvicorn app:app --reload
3. Send a request:
curl -X POST 'http://127.0.0.1:8000/chat' \
  -H 'Content-Type: application/json' \
  -H 'X-Model: openai:gpt-5-mini' \
  -d '{"message":"List 3 advantages of Python."}'
curl -X POST 'http://127.0.0.1:8000/chat' \
  -H 'Content-Type: application/json' \
  -H 'X-Model: openai:gpt-4o-mini' \
  -d '{"message":"List 3 advantages of Python."}'
The future of GenAI
That brings us to the second part of this episode: the future of GenAI.
How will this industry look over the next few years? Nobody has a crystal ball — but some trends are already very clear.
Trend #1: Multimodality
Models like GPT-5 or Claude 4.5 can already analyze images, audio, and video. Soon this will be standard.
When you build applications, you have to assume users won’t send only text. They will upload screenshots, photos of documents, audio recordings. Your architecture needs to be ready for that.
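Being ready for this mostly means your message format must accept more than a string. A sketch of a mixed text-plus-image user message, using the content-block shape common to OpenAI-style chat APIs and LangChain messages (the helper function itself is hypothetical):

```python
import base64

def image_message(question: str, image_bytes: bytes, mime: str = "image/png") -> dict:
    """Build a chat message whose content mixes text and an inline image."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            # Inline data URL; for large files you would upload and pass a real URL.
            {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{b64}"}},
        ],
    }
```

A message built this way can be passed to a multimodal chat model in the same `messages` list as plain text turns.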
Trend #2: Agentic workflows
Classic APIs and linear workflows are not enough when a process is complex and dynamic.
Instead of hardcoding conditions in traditional code, we’ll declare state graphs of agents: Researcher, Critic, Expert — and let the system iterate based on state and quality signals.
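The control flow behind such a graph can be sketched without any framework: nodes are functions over a shared state, and a quality signal decides whether to loop or stop. The node bodies below are stubs (a real graph, e.g. in LangGraph, would call LLMs); the "quality" heuristic is purely illustrative.

```python
def researcher(state: dict) -> dict:
    # Stub node: a real Researcher would call an LLM to gather information.
    draft = (state.get("draft", "") + " fact").strip()
    return {**state, "draft": draft}

def critic(state: dict) -> dict:
    # Stub quality signal: accept once the draft contains at least 3 items.
    quality = len(state["draft"].split())
    return {**state, "quality": quality, "done": quality >= 3}

def run_graph(state: dict, max_iters: int = 10) -> dict:
    """Iterate researcher -> critic until the quality signal says stop."""
    for _ in range(max_iters):
        state = researcher(state)
        state = critic(state)
        if state["done"]:
            break
    return state
```

The point is that the iteration count is not hardcoded: the system keeps cycling Researcher and Critic until the state itself says the output is good enough.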
Keeping these trends in mind, we can prepare our applications for the next generation of even more capable AI models.
That’s all in this chapter, dedicated to the model-agnostic pattern, the LLM API gateway, and future AI trends.
See the next chapter
See the previous chapter
See the full code from this article in the GitHub repository