LLM & AI Agent Applications with LangChain and LangGraph — Part 29: Model Agnostic Pattern and LLM API Gateway
Last Updated on January 6, 2026 by Editorial Team
Author(s): Michalzarnecki
Originally published on Towards AI.

Hi! In this part we’re moving from experiments and prototyping into the real world — production deployments.
Because the truth is: building a working notebook or a proof-of-concept is only the beginning. The real challenges start when your application must serve hundreds or thousands of users, run reliably 24/7, and still stay within budget.
Let’s start with the first foundation: a model-agnostic approach.
Model-agnostic from day one
Many teams building AI applications quickly lock themselves into a single provider — only OpenAI, or only Anthropic. That’s understandable: it’s faster to pick one API and focus. But long-term it’s a huge risk. If the provider raises prices, has an outage, or changes licensing terms — your entire application can stop.
That’s why it’s worth thinking from the very beginning about a model-agnostic gateway layer.
In practice, this means your code doesn’t talk directly to one specific model. Instead, it calls an abstraction:
- “give me a chat-class LLM”, or
- “give me an embedding generator”
And only the gateway decides whether under the hood it should call GPT-5, Claude 4.5 Sonnet, or a local LLaMA running on your own infrastructure.
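To make the idea concrete, here is a minimal, framework-agnostic sketch of such an abstraction: a provider registry where application code only asks for "a chat model" by name. The provider functions below are stubs standing in for real wrappers (e.g. `ChatOpenAI`, `ChatAnthropic`, or a local model client); their names and the `ChatResult` type are illustrative, not part of any library.

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class ChatResult:
    provider: str
    text: str

# Registry of provider factories. In a real gateway each entry would wrap
# an actual client (OpenAI, Anthropic, a local LLaMA server, ...).
_PROVIDERS: Dict[str, Callable[[str], ChatResult]] = {}

def register(name: str):
    """Decorator that adds a provider function to the registry."""
    def deco(fn: Callable[[str], ChatResult]):
        _PROVIDERS[name] = fn
        return fn
    return deco

@register("openai")
def _openai_stub(prompt: str) -> ChatResult:
    return ChatResult("openai", f"[openai] {prompt}")

@register("local")
def _local_stub(prompt: str) -> ChatResult:
    return ChatResult("local", f"[local] {prompt}")

def get_chat_model(name: str) -> Callable[[str], ChatResult]:
    # Application code asks for "a chat-class LLM";
    # only this function knows which providers exist.
    return _PROVIDERS[name]
```

Swapping providers then becomes a one-line change (or a config value) instead of touching every call site.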
API Gateway + routing + fallback
The second foundation is an API Gateway.
Imagine you expose a simple endpoint like POST /v1/chat, where users send requests. In a header like X-Model, the client specifies which model should be used.
The gateway can run multiple models in parallel — and it can also implement fallback logic: if the primary model doesn’t respond within a given time, you automatically switch to a backup model, for example an open-source one running locally.
This pattern doesn’t only improve reliability — it also opens the door to experimentation.
You can route 1% of traffic to a new model and see how it performs compared to the previous one, without changing the entire system.
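Both ideas, fallback on failure and a small experimental traffic share, can be sketched in a few lines of plain Python. This is a simplified synchronous sketch (a production gateway would enforce the timeout with async cancellation rather than checking elapsed time after the call); the function names and the 1% share are illustrative.

```python
import random
import time

def call_with_fallback(primary, backup, prompt, timeout_s=10.0):
    """Try the primary model; on error (or if it was too slow), use the backup."""
    start = time.monotonic()
    try:
        answer = primary(prompt)
        if time.monotonic() - start > timeout_s:
            raise TimeoutError("primary model too slow")
        return answer
    except Exception:
        # Backup could be an open-source model running locally.
        return backup(prompt)

def route(prompt, stable, candidate, experiment_share=0.01):
    """Send ~1% of traffic to a candidate model, the rest to the stable one."""
    model = candidate if random.random() < experiment_share else stable
    return model(prompt)
```

In LangChain specifically, the fallback half of this pattern is built in: any runnable can be wrapped with `.with_fallbacks([...])`.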
Monitoring and cost control
The third foundation — often neglected — is monitoring and cost control.
In a prototype it’s enough to say “it works”. In production you’ll get harder questions:
- How much does it cost per day?
- What’s our hallucination rate?
- How often do we reject outputs?
This is where tools like LangSmith help — but even a simple internal logging system can work.
We measure latency (because users don’t want to wait 30 seconds), we measure costs, and we measure quality — for example: how many answers were rejected by guardrails or evaluation.
And we can set very simple but effective alerts:
- if daily cost exceeds $50 → send a notification,
- if average response time goes above 5 seconds → trigger another alert.
With this, you have real visibility into what’s happening inside the system.
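The alert rules above need very little machinery. Here is a minimal in-process sketch (class and threshold names are illustrative); in production you would push these metrics to a real monitoring stack or to LangSmith instead of keeping them in memory.

```python
DAILY_COST_LIMIT_USD = 50.0   # "if daily cost exceeds $50 -> notify"
LATENCY_LIMIT_S = 5.0         # "if average response time > 5s -> alert"

class Monitor:
    """Tracks cost, latency, and rejected outputs, and raises simple alerts."""
    def __init__(self):
        self.daily_cost_usd = 0.0
        self.latencies = []
        self.rejected = 0
        self.alerts = []

    def record(self, cost_usd: float, latency_s: float, rejected: bool = False):
        self.daily_cost_usd += cost_usd
        self.latencies.append(latency_s)
        if rejected:
            self.rejected += 1  # e.g. blocked by guardrails or evaluation
        if self.daily_cost_usd > DAILY_COST_LIMIT_USD:
            self.alerts.append("daily cost limit exceeded")
        avg = sum(self.latencies) / len(self.latencies)
        if avg > LATENCY_LIMIT_S:
            self.alerts.append("average latency limit exceeded")
```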
These three elements — model-agnostic gateway, API gateway, and monitoring — are not “nice-to-haves”. They’re foundations. If you treat them seriously, your application will not only run in production, but also stay resilient to changes in the market and technology.
Let’s jump now to the code.
Install libraries and load environment variables
!pip install -U langchain langchain-openai langgraph fastapi uvicorn
from dotenv import load_dotenv
load_dotenv()
Human in the Loop
from langchain_openai import ChatOpenAI
from langchain_core.tools import tool
from langchain.agents import create_agent
from langchain.agents.middleware import HumanInTheLoopMiddleware
from langgraph.checkpoint.memory import MemorySaver
from langgraph.types import Command
@tool
def risky_operation(secret: str) -> str:
    """Perform a risky operation that requires manual approval."""
    return f"Executed risky operation with: {secret}"

tools = [risky_operation]
model = ChatOpenAI(model="gpt-4o-mini", temperature=0)

hitl = HumanInTheLoopMiddleware(
    interrupt_on={
        "risky_operation": {"allowed_decisions": ["approve", "edit", "reject"]}
    },
    description_prefix="Manual approval required for risky operation:"
)

checkpointer = MemorySaver()
agent = create_agent(
    model=model,
    tools=tools,
    middleware=[hitl],
    checkpointer=checkpointer,
    debug=True
)

config = {"configurable": {"thread_id": "hitl-demo-1"}}
result = agent.invoke(
    {"messages": [{"role": "user", "content": "Please run the risky operation with secret code $%45654@."}]},
    config=config,
)
output (abbreviated):
[values] {'messages': [HumanMessage(content='Please run the risky operation with secret code $%45654@.', ...)]}
[updates] {'model': {'messages': [AIMessage(content='', tool_calls=[{'name': 'risky_operation', 'args': {'secret': '$%45654@'}, 'id': 'call_dK786IhVaO3Z4VssPOI1cM6y', 'type': 'tool_call'}], ...)]}}
[updates] {'__interrupt__': (Interrupt(value={'action_requests': [{'name': 'risky_operation', 'args': {'secret': '$%45654@'}, 'description': "Manual approval required for risky operation:\n\nTool: risky_operation\nArgs: {'secret': '$%45654@'}"}], 'review_configs': [{'action_name': 'risky_operation', 'allowed_decisions': ['approve', 'edit', 'reject']}]}, id='a3abdfe342bd7c8be8b1b586ee9f8815'),)}
Handle the interrupt and approve the tool call:
if "__interrupt__" in result:
    print("Interrupt detected!")
    decisions = [{"type": "approve"}]
    result = agent.invoke(
        Command(resume={"decisions": decisions}),
        config=config,
    )
output (abbreviated):
[values] {'messages': [HumanMessage(content='Please run the risky operation with secret code $%45654@.', ...), AIMessage(content='', tool_calls=[{'name': 'risky_operation', 'args': {'secret': '$%45654@'}, 'id': 'call_dK786IhVaO3Z4VssPOI1cM6y', 'type': 'tool_call'}], ...), ToolMessage(content='Executed risky operation with: $%45654@', name='risky_operation', tool_call_id='call_dK786IhVaO3Z4VssPOI1cM6y', ...)]}
Model agnostic API gateway
To run the example code below as a model-agnostic API gateway:
1. Save the code below in a file app.py
# app.py
from fastapi import FastAPI, Header
from pydantic import BaseModel
from langchain_core.runnables import RunnableLambda
from langchain_core.messages import AIMessage
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

class ChatRequest(BaseModel):
    message: str

class ChatResponse(BaseModel):
    provider: str
    model: str
    answer: str

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    ("human", "{message}")
])

def build_model(x_model: str):
    """
    x_model format:
    - 'openai:gpt-4o-mini'
    """
    if ":" in x_model:
        provider, model_name = x_model.split(":", 1)
    else:
        provider, model_name = "openai", x_model
    provider = provider.lower().strip()
    if provider == "openai":
        return provider, model_name, ChatOpenAI(model=model_name, temperature=0)
    # if provider == "anthropic":  # support for another LLM API provider
    #     return provider, model_name, ChatAnthropic(model=model_name, temperature=0)
    def _unknown(inputs: dict):
        return AIMessage(content=f"(unknown provider) Echo: {inputs.get('message', '')}")
    return "unknown", x_model, RunnableLambda(_unknown)

app = FastAPI(title="Model-Agnostic LangChain Gateway")

@app.post("/chat", response_model=ChatResponse)
def chat_endpoint(
    req: ChatRequest,
    x_model: str = Header(default="openai:gpt-4o-mini", alias="X-Model"),
):
    provider, model_name, model = build_model(x_model)
    chain = prompt | model | StrOutputParser()
    answer: str = chain.invoke({"message": req.message})
    return ChatResponse(provider=provider, model=model_name, answer=answer)
2. Start the server:
uvicorn app:app --reload
3. Send a request:
curl -X POST 'http://127.0.0.1:8000/chat' \
  -H 'Content-Type: application/json' \
  -H 'X-Model: openai:gpt-5-mini' \
  -d '{"message":"List 3 advantages of Python."}'
curl -X POST 'http://127.0.0.1:8000/chat' \
  -H 'Content-Type: application/json' \
  -H 'X-Model: openai:gpt-4o-mini' \
  -d '{"message":"List 3 advantages of Python."}'
The future of GenAI
That brings us to the second part of this episode: the future of GenAI.
How will this industry look over the next few years? Nobody has a crystal ball — but some trends are already very clear.
Trend #1: Multimodality
Models like GPT-5 or Claude 4.5 can already analyze images, audio, and video. Soon this will be standard.
When you build applications, you have to assume users won’t send only text. They will upload screenshots, photos of documents, audio recordings. Your architecture needs to be ready for that.
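Being ready for this mostly means your message format must accept more than a string. A sketch of a mixed text-plus-image user message, using the content-block shape common to OpenAI-style chat APIs and LangChain messages (the helper function itself is hypothetical):

```python
import base64

def image_message(question: str, image_bytes: bytes, mime: str = "image/png") -> dict:
    """Build a chat message whose content mixes text and an inline image."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            # Inline data URL; for large files you would upload and pass a real URL.
            {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{b64}"}},
        ],
    }
```

A message built this way can be passed to a multimodal chat model in the same `messages` list as plain text turns.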
Trend #2: Agentic workflows
Classic APIs and linear workflows are not enough when a process is complex and dynamic.
Instead of hardcoding conditions in traditional code, we’ll declare state graphs of agents: Researcher, Critic, Expert — and let the system iterate based on state and quality signals.
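The control flow behind such a graph can be sketched without any framework: nodes are functions over a shared state, and a quality signal decides whether to loop or stop. The node bodies below are stubs (a real graph, e.g. in LangGraph, would call LLMs); the "quality" heuristic is purely illustrative.

```python
def researcher(state: dict) -> dict:
    # Stub node: a real Researcher would call an LLM to gather information.
    draft = (state.get("draft", "") + " fact").strip()
    return {**state, "draft": draft}

def critic(state: dict) -> dict:
    # Stub quality signal: accept once the draft contains at least 3 items.
    quality = len(state["draft"].split())
    return {**state, "quality": quality, "done": quality >= 3}

def run_graph(state: dict, max_iters: int = 10) -> dict:
    """Iterate researcher -> critic until the quality signal says stop."""
    for _ in range(max_iters):
        state = researcher(state)
        state = critic(state)
        if state["done"]:
            break
    return state
```

The point is that the iteration count is not hardcoded: the system keeps cycling Researcher and Critic until the state itself says the output is good enough.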
Keeping these trends in mind, we can prepare our applications for the next generation of even more capable AI models.
That’s all in this chapter, dedicated to the model-agnostic pattern, the LLM API gateway, and future AI trends.
See the next chapter
See the previous chapter
See the full code from this article in the GitHub repository