
LLM & AI Agent Applications with LangChain and LangGraph — Part 29: Model Agnostic Pattern and LLM API Gateway

Last Updated on January 6, 2026 by Editorial Team

Author(s): Michalzarnecki

Originally published on Towards AI.


Hi! In this part we’re moving from experiments and prototyping into the real world — production deployments.

Because the truth is: building a working notebook or a proof-of-concept is only the beginning. The real challenges start when your application must serve hundreds or thousands of users, run reliably 24/7, and still stay within budget.

Let’s start with the first foundation: a model-agnostic approach.

Model-agnostic from day one

Many teams building AI applications quickly lock themselves into a single provider: only OpenAI, or only Anthropic. That's understandable, since it's faster to pick one API and focus. But long term it's a huge risk: if the provider raises prices, has an outage, or changes licensing terms, your entire application can grind to a halt.

That’s why it’s worth thinking from the very beginning about a model-agnostic gateway layer.

In practice, this means your code doesn’t talk directly to one specific model. Instead, it calls an abstraction:

  • “give me a chat-class LLM”, or
  • “give me an embedding generator”

And only the gateway decides whether under the hood it should call GPT-5, Claude 4.5 Sonnet, or a local LLaMA running on your own infrastructure.

API Gateway + routing + fallback

The second foundation is an API Gateway.

Imagine you expose a simple endpoint like POST /v1/chat, where users send requests. In a header like X-Model, the client specifies which model should be used.

The gateway can run multiple models in parallel — and it can also implement fallback logic: if the primary model doesn’t respond within a given time, you automatically switch to a backup model, for example an open-source one running locally.
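The timeout-plus-fallback behavior can be sketched in plain Python. This is a minimal illustration, not a real gateway: `call_primary` and `call_backup` are hypothetical stand-ins for the hosted provider and a locally running model.

```python
from concurrent.futures import ThreadPoolExecutor

def call_primary(prompt: str) -> str:
    # Stand-in for the hosted provider's API; here it simulates an outage.
    raise RuntimeError("provider outage")

def call_backup(prompt: str) -> str:
    # Stand-in for e.g. an open-source model on your own infrastructure.
    return f"[backup model] answer to: {prompt}"

def chat_with_fallback(prompt: str, timeout_s: float = 5.0) -> str:
    """Try the primary model; on timeout or error, switch to the backup."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(call_primary, prompt)
        try:
            return future.result(timeout=timeout_s)
        except Exception:
            return call_backup(prompt)

print(chat_with_fallback("What is LangGraph?"))
```

In LangChain itself, the same idea is available out of the box via `Runnable.with_fallbacks()`, which wraps a primary runnable with a list of backups.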

This pattern doesn’t only improve reliability — it also opens the door to experimentation.

You can route 1% of traffic to a new model and see how it performs compared to the previous one, without changing the entire system.
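A canary router like that can be as simple as a weighted coin flip per request. The model names below are placeholders for illustration only:

```python
import random

def route_model(canary_share: float = 0.01) -> str:
    """Send roughly canary_share of traffic to the candidate model."""
    return "candidate-model" if random.random() < canary_share else "stable-model"

random.seed(0)  # seeded only to make this demo reproducible
sample = [route_model(0.01) for _ in range(10_000)]
print(sample.count("candidate-model"))  # roughly 100 out of 10,000
```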

Monitoring and cost control

The third foundation — often neglected — is monitoring and cost control.

In a prototype it’s enough to say “it works”. In production you’ll get harder questions:

  • How much does it cost per day?
  • What’s our hallucination rate?
  • How often do we reject outputs?

This is where tools like LangSmith help — but even a simple internal logging system can work.

We measure latency (because users don’t want to wait 30 seconds), we measure costs, and we measure quality — for example: how many answers were rejected by guardrails or evaluation.

And we can set very simple but effective alerts:

  • if daily cost exceeds $50 → send a notification,
  • if average response time goes above 5 seconds → trigger another alert.

With this, you have real visibility into what’s happening inside the system.
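The two alerts above can be wired up with a few lines of internal tooling. A minimal sketch, where `alert()` is a stub standing in for a real notification channel (email, Slack, PagerDuty):

```python
class LLMMetrics:
    """Tracks per-day cost and latency, firing alerts past the thresholds."""

    def __init__(self, daily_cost_limit: float = 50.0, latency_limit_s: float = 5.0):
        self.daily_cost_limit = daily_cost_limit
        self.latency_limit_s = latency_limit_s
        self.daily_cost = 0.0
        self.latencies: list[float] = []
        self.alerts: list[str] = []

    def alert(self, message: str) -> None:
        self.alerts.append(message)  # replace with a real notification channel

    def record(self, cost_usd: float, latency_s: float) -> None:
        self.daily_cost += cost_usd
        self.latencies.append(latency_s)
        if self.daily_cost > self.daily_cost_limit:
            self.alert(f"daily cost exceeded: ${self.daily_cost:.2f}")
        avg = sum(self.latencies) / len(self.latencies)
        if avg > self.latency_limit_s:
            self.alert(f"average latency too high: {avg:.1f}s")

metrics = LLMMetrics()
metrics.record(cost_usd=51.0, latency_s=2.0)
print(metrics.alerts)  # the cost alert fired; the latency alert did not
```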

These three elements — model-agnostic gateway, API gateway, and monitoring — are not “nice-to-haves”. They’re foundations. If you treat them seriously, your application will not only run in production, but also stay resilient to changes in the market and technology.

Let’s jump now to the code.

Install libraries and load environment variables

!pip install -U langchain langchain-openai langgraph fastapi uvicorn
from dotenv import load_dotenv
load_dotenv()

Human in the Loop

from langchain_openai import ChatOpenAI
from langchain_core.tools import tool
from langchain.agents import create_agent
from langchain.agents.middleware import HumanInTheLoopMiddleware
from langgraph.checkpoint.memory import MemorySaver
from langgraph.types import Command

@tool
def risky_operation(secret: str) -> str:
    """Perform a risky operation that requires manual approval."""
    return f"Executed risky operation with: {secret}"

tools = [risky_operation]
model = ChatOpenAI(model="gpt-4o-mini", temperature=0)

hitl = HumanInTheLoopMiddleware(
    interrupt_on={
        "risky_operation": {"allowed_decisions": ["approve", "edit", "reject"]}
    },
    description_prefix="Manual approval required for risky operation:"
)

checkpointer = MemorySaver()
agent = create_agent(
    model=model,
    tools=tools,
    middleware=[hitl],
    checkpointer=checkpointer,
    debug=True
)

config = {"configurable": {"thread_id": "hitl-demo-1"}}

result = agent.invoke(
    {"messages": [{"role": "user", "content": "Please run the risky operation with secret code $%45654@."}]},
    config=config,
)

Output:

[values] {'messages': [HumanMessage(content='Please run the risky operation with secret code $%45654@.', additional_kwargs={}, response_metadata={}, id='589244c7-9860-48fa-b68a-eca595510a73')]}
[updates] {'model': {'messages': [AIMessage(content='', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 19, 'prompt_tokens': 60, 'total_tokens': 79, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_provider': 'openai', 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_560af6e559', 'id': 'chatcmpl-CaJj7md4CRaAN2mcI1ju8uek8BJti', 'service_tier': 'default', 'finish_reason': 'tool_calls', 'logprobs': None}, id='lc_run--35ad04bd-5d01-4649-a64c-d8c583ffe3aa-0', tool_calls=[{'name': 'risky_operation', 'args': {'secret': '$%45654@'}, 'id': 'call_dK786IhVaO3Z4VssPOI1cM6y', 'type': 'tool_call'}], usage_metadata={'input_tokens': 60, 'output_tokens': 19, 'total_tokens': 79, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}})]}}
[values] {'messages': [HumanMessage(content='Please run the risky operation with secret code $%45654@.', additional_kwargs={}, response_metadata={}, id='589244c7-9860-48fa-b68a-eca595510a73'), AIMessage(content='', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 19, 'prompt_tokens': 60, 'total_tokens': 79, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_provider': 'openai', 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_560af6e559', 'id': 'chatcmpl-CaJj7md4CRaAN2mcI1ju8uek8BJti', 'service_tier': 'default', 'finish_reason': 'tool_calls', 'logprobs': None}, id='lc_run--35ad04bd-5d01-4649-a64c-d8c583ffe3aa-0', tool_calls=[{'name': 'risky_operation', 'args': {'secret': '$%45654@'}, 'id': 'call_dK786IhVaO3Z4VssPOI1cM6y', 'type': 'tool_call'}], usage_metadata={'input_tokens': 60, 'output_tokens': 19, 'total_tokens': 79, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}})]}
[updates] {'__interrupt__': (Interrupt(value={'action_requests': [{'name': 'risky_operation', 'args': {'secret': '$%45654@'}, 'description': "Manual approval required for risky operation:\n\nTool: risky_operation\nArgs: {'secret': '$%45654@'}"}], 'review_configs': [{'action_name': 'risky_operation', 'allowed_decisions': ['approve', 'edit', 'reject']}]}, id='a3abdfe342bd7c8be8b1b586ee9f8815'),)}

Handle the interrupt:

if "__interrupt__" in result:
    print("Interrupt detected!")
    decisions = [{"type": "approve"}]

    result = agent.invoke(
        Command(resume={"decisions": decisions}),
        config=config,
    )

Output:

[values] {'messages': [HumanMessage(content='Please run the risky operation with secret code $%45654@.', additional_kwargs={}, response_metadata={}, id='589244c7-9860-48fa-b68a-eca595510a73'), AIMessage(content='', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 19, 'prompt_tokens': 60, 'total_tokens': 79, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_provider': 'openai', 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_560af6e559', 'id': 'chatcmpl-CaJj7md4CRaAN2mcI1ju8uek8BJti', 'service_tier': 'default', 'finish_reason': 'tool_calls', 'logprobs': None}, id='lc_run--35ad04bd-5d01-4649-a64c-d8c583ffe3aa-0', tool_calls=[{'name': 'risky_operation', 'args': {'secret': '$%45654@'}, 'id': 'call_dK786IhVaO3Z4VssPOI1cM6y', 'type': 'tool_call'}], usage_metadata={'input_tokens': 60, 'output_tokens': 19, 'total_tokens': 79, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}}), ToolMessage(content='Executed risky operation with: $%45654@', name='risky_operation', id='13109032-38fb-4d94-920c-90026acc41f3', tool_call_id='call_dK786IhVaO3Z4VssPOI1cM6y')]}

Model agnostic API gateway

To run the model-agnostic API gateway example:

1. Save the following code in a file app.py:

# app.py

from fastapi import FastAPI, Header
from pydantic import BaseModel
from langchain_core.runnables import RunnableLambda
from langchain_core.messages import AIMessage
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate

from langchain_openai import ChatOpenAI

class ChatRequest(BaseModel):
    message: str

class ChatResponse(BaseModel):
    provider: str
    model: str
    answer: str

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    ("human", "{message}")
])

def build_model(x_model: str):
    """
    x_model format:
    - 'openai:gpt-4o-mini'
    """
    if ":" in x_model:
        provider, model_name = x_model.split(":", 1)
    else:
        provider, model_name = "openai", x_model

    provider = provider.lower().strip()

    if provider == "openai":
        return provider, model_name, ChatOpenAI(model=model_name, temperature=0)

    # if provider == "anthropic":  # support for another LLM API provider
    #     return provider, model_name, ChatAnthropic(model=model_name, temperature=0)

    def _unknown(prompt_value):
        # In the chain `prompt | model`, the model slot receives a
        # PromptValue (not a dict), so extract the last message's text.
        user_message = prompt_value.to_messages()[-1].content
        return AIMessage(content=f"(unknown provider) Echo: {user_message}")
    return "unknown", x_model, RunnableLambda(_unknown)


app = FastAPI(title="Model-Agnostic LangChain Gateway")


@app.post("/chat", response_model=ChatResponse)
def chat_endpoint(
    req: ChatRequest,
    x_model: str = Header(default="openai:gpt-4o-mini", alias="X-Model"),
):
    provider, model_name, model = build_model(x_model)
    chain = prompt | model | StrOutputParser()
    answer: str = chain.invoke({"message": req.message})
    return ChatResponse(provider=provider, model=model_name, answer=answer)

2. Start the server:

uvicorn app:app --reload

3. Send requests:

curl -X POST 'http://127.0.0.1:8000/chat' \
-H 'Content-Type: application/json' \
-H 'X-Model: openai:gpt-5-mini' \
-d '{"message":"List three advantages of Python."}'

curl -X POST 'http://127.0.0.1:8000/chat' \
-H 'Content-Type: application/json' \
-H 'X-Model: openai:gpt-4o-mini' \
-d '{"message":"List three advantages of Python."}'

The future of GenAI

That brings us to the second part of this episode: the future of GenAI.

How will this industry look over the next few years? Nobody has a crystal ball — but some trends are already very clear.

Trend #1: Multimodality

Models like GPT-5 or Claude 4.5 can already analyze images, audio, and video. Soon this will be standard.

When you build applications, you have to assume users won’t send only text. They will upload screenshots, photos of documents, audio recordings. Your architecture needs to be ready for that.
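Being ready for that mostly means not assuming message content is a plain string. In the OpenAI-style chat format, a multimodal user turn is a list of typed content blocks; the image URL below is a placeholder for illustration only:

```python
# A multimodal user message: a list of content blocks instead of plain text.
# The image URL is a placeholder, not a real resource.
multimodal_message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "What is shown in this screenshot?"},
        {"type": "image_url", "image_url": {"url": "https://example.com/screenshot.png"}},
    ],
}

# Architecture takeaway: handlers should iterate over typed blocks,
# not assume `content` is always a string.
print([block["type"] for block in multimodal_message["content"]])  # ['text', 'image_url']
```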

Trend #2: Agentic workflows

Classic APIs and linear workflows are not enough when a process is complex and dynamic.

Instead of hardcoding conditions in traditional code, we’ll declare state graphs of agents: Researcher, Critic, Expert — and let the system iterate based on state and quality signals.
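As a rough, framework-free sketch of that pattern (in practice you would declare this as a LangGraph state graph), the node functions below are trivial stand-ins rather than real LLM calls: a Researcher updates the shared state, a Critic scores it, and the loop iterates until a quality signal is met:

```python
# Toy Researcher/Critic loop over a shared state dict.
def researcher(state: dict) -> dict:
    # Stand-in for an LLM call that extends the draft with new findings.
    draft = state.get("draft", "") + " more evidence."
    return {**state, "draft": draft.strip()}

def critic(state: dict) -> dict:
    # Toy quality signal: longer drafts score higher (capped at 1.0).
    return {**state, "quality": min(1.0, len(state["draft"]) / 50)}

def run_graph(state: dict, quality_target: float = 0.9, max_steps: int = 10) -> dict:
    """Iterate Researcher -> Critic until the quality target or step limit."""
    for _ in range(max_steps):
        state = critic(researcher(state))
        if state["quality"] >= quality_target:
            break
    return state

final = run_graph({"draft": ""})
print(final["quality"] >= 0.9)  # True: the loop iterated until the signal was met
```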

Keeping these trends in mind, we can prepare our applications for the next generation of even more capable AI models.

That’s all in this chapter, dedicated to the model-agnostic pattern, the LLM API gateway, and future AI trends.

See the next chapter

See the previous chapter

See the full code from this article in the GitHub repository
