Where LLMs Belong in Agentic Systems: Gating, Approval, and Human-in-the-Loop Design

Last Updated on February 19, 2026 by Editorial Team

Author(s): Vahe Sahakyan

Originally published on Towards AI.

Where LLMs Belong in Agentic Systems: Gating, Approval, and Human-in-the-Loop Design

This article closes a four-part series on designing agentic AI systems.

So far, we’ve focused on structure first.

We separated agentic behavior from language models.
We built workflows with explicit control flow, shared state, and clear termination.
We showed that agency emerges from structure before intelligence.

That sequencing was intentional.

Because the most fragile moment in an agentic system is not when you remove the LLM.

It is when you add it back.

Reintroducing a language model into a system with shared state, explicit routing, and real consequences immediately changes the system’s risk profile. Without clear boundaries, probabilistic reasoning begins to leak into control flow. Decisions blur. Execution becomes harder to inspect. Responsibility shifts silently into prompts.

The question is not whether LLMs are powerful.

It is where they belong.

In this final article, we focus on three structural patterns:

gating LLM invocation behind deterministic routing decisions
modeling human approval as part of execution flow rather than an external process
designing workflows where language models assist reasoning without determining control flow

The goal is not to make agents more autonomous.

It is to ensure that once LLMs are introduced, systems remain predictable, inspectable, and accountable.

This is the boundary between impressive demos and durable systems.

The code fragments below are intentionally partial. They illustrate architectural boundaries, not a production-ready tutorial. The goal is to make execution structure visible — not to walk step-by-step through implementation details.

When LLMs Collapse Control Boundaries

So far in this series, we have intentionally kept language models outside the execution path.

That choice was deliberate. Before introducing probabilistic reasoning, we needed a system whose control flow, state, and decision boundaries were already explicit.

Once an LLM is introduced, those boundaries are easy to blur.

Language models are powerful — but they are probabilistic. They do not execute rules. They do not enforce constraints. They generate plausible continuations. When a model is asked to both reason and determine what happens next, the distinction between thinking and acting begins to collapse.

A common early failure mode in agentic systems is allowing the model to implicitly govern execution. The LLM interprets intent, selects tools, decides when to retry, and determines when the task is complete — all within generated text. Control logic migrates into prompts. Branching, validation, and termination cease to be structural properties of the system and become inferences drawn from output.

The result is not only hallucinated content.

It is hallucinated control flow.

Decisions become opaque. Failures are difficult to reproduce. Debugging becomes an exercise in reconstructing prompt history. Responsibility shifts from architecture to model output — where it cannot be enforced or inspected.

This is why generation must be gated.

Language models are valuable as classifiers, planners, and generators. But they should not determine whether execution proceeds. Their outputs must pass through explicit decision points. And when actions have real consequences, humans must be placed at those boundaries — not after the fact.

The objective is not to constrain the model.

It is to preserve what makes agentic systems reliable: explicit control, inspectable decisions, and accountable execution.

Pattern 1: Decision Before Generation

The first safe way to introduce an LLM into an agentic system is to decide — structurally — whether generation is allowed before any generation occurs.

This pattern can be described simply:

The system decides first.
The model acts second.

Instead of asking the LLM what to do next, the workflow evaluates whether invoking the model is justified at all. That decision is deterministic. It is made through state and routing logic — not through generated text.

This separation immediately addresses two common failure modes:

hallucinating answers when information is missing or irrelevant
hiding control logic inside prompts instead of architecture

The goal is not to make the model cautious.

It is to make invocation conditional.

Problem setup: a scoped Q&A assistant

Consider a documentation assistant with a narrow responsibility:

It should answer questions covered by the tutorial.
It should decline questions outside its scope.
It should never guess.

Without structural gating, an LLM will almost always produce an answer — even when it shouldn’t. That behavior is not a bug. It is how generative models are designed.

So the workflow must answer a different question first:

Is the system justified in attempting generation?

Only after that question is answered should the model be invoked.

High-level workflow

Conceptually, the workflow contains four steps:

Receive a question
Determine whether it is in scope
Generate an answer (only if allowed)
Return a controlled response

The key architectural constraint is this:

The LLM is never responsible for deciding whether it should be used.

That decision belongs to the system.

Explicit state

State captures responsibility explicitly. Each field has a single role.

class QAState(TypedDict, total=False):
 question: str
 is_in_scope: bool
 answer: Optional[str]

This separation matters:

question is input
is_in_scope is a deterministic decision
answer is the result of one of two controlled paths

Intake Node

This node performs no reasoning. It simply ensures that a question exists in state.

def input_question_node(state: QAState) -> QAState:
 if "question" not in state:
 question = input("Ask a question about the tutorial: ").strip()
 return {"question": question}
 return {}

There is no interpretation. No classification. No model involvement.

Only state initialization.

Deterministic Scope Check

The scope check is intentionally rule-based.

In production, this might involve embeddings or a classifier. Here, a simple keyword rule keeps behavior transparent and reproducible.

ALLOWED_TOPICS = {
 "langgraph",
 "state",
 "nodes",
 "edges",
 "workflow",
 "agent",
}

def scope_check_node(state: QAState) -> QAState:
 question = state["question"].lower()
 in_scope = any(topic in question for topic in ALLOWED_TOPICS)
 return {"is_in_scope": in_scope}

This node answers a single structural question:

Is generation permitted?

Explicit Routing

Routing is separate from execution.

def route_after_scope_check(state: QAState) -> str:
 if state.get("is_in_scope"):
 return "answer"
 return "out_of_scope"

By the time routing occurs, the system has already decided whether the LLM may be invoked.

The model has not been consulted.

Isolated Generation

This is the only location where the LLM appears.

def answer_node(state: QAState) -> QAState:
 question = state["question"]

 response = llm.invoke([
 HumanMessage(content=f"Answer clearly and concisely:\n\n{question}")
 ])

 return {"answer": response.content}

The model does not determine whether it should answer.
It answers because the system has granted permission.

That distinction is architectural.

Out-of-Scope Path

Out-of-scope behavior is not implemented through clever prompting.

It is implemented through an explicit alternative node.

def out_of_scope_node(state: QAState) -> QAState:
 return {
 "answer": (
 "This question is outside the scope of the tutorial. "
 "Please ask something related to LangGraph workflows, state, or agents."
 )
 }

This guarantees predictable behavior when generation is not allowed.

No refusals embedded in prompts.
No reliance on model compliance.

Converging Execution

Both paths — generation and fallback — converge before termination.

def finalize_qa_node(state: QAState) -> QAState:
 return {"answer": state.get("answer", "No answer available.")}

This node does not introduce new logic.
It normalizes output and provides a single structural exit point.

Convergence matters. It ensures that no matter which path execution takes, termination is explicit and inspectable.

Build the graph

With nodes defined, the graph encodes the boundary:

from langgraph.graph import StateGraph, END

workflow = StateGraph(QAState)

workflow.add_node("input", input_question_node)
workflow.add_node("scope_check", scope_check_node)
workflow.add_node("generate_answer", answer_node)
workflow.add_node("out_of_scope", out_of_scope_node)
workflow.add_node("finalize", finalize_qa_node)

workflow.set_entry_point("input")
workflow.add_edge("input", "scope_check")

workflow.add_conditional_edges(
 "scope_check",
 route_after_scope_check,
 {
 "answer": "generate_answer",
 "out_of_scope": "out_of_scope",
 },
)

workflow.add_edge("generate_answer", "finalize")
workflow.add_edge("out_of_scope", "finalize")
workflow.add_edge("finalize", END)

app = workflow.compile()

The graph itself enforces the rule:

There is no path to the LLM without passing the scope check.
Out-of-scope questions never reach the model.

This guarantee is not expressed in prompts.
It is encoded in execution structure.

Visualizing the workflow

Execution can be visualized directly:

png_bytes = app.get_graph().draw_mermaid_png()
display(Image(png_bytes))

You can see exactly where generation is allowed — and where it is not.

Running the workflow

Invoking the workflow:

app.invoke({"question": "How does LangGraph manage state across nodes?"})

Out:
{'question': 'How does LangGraph manage state across nodes?',
 'is_in_scope': True,
 'answer': 'LangGraph manages state across nodes by maintaining a centralized or distributed state store that tracks the data and context as it flows through each node. Each node can read from and write to this shared state, enabling consistent data management and coordination throughout the graph execution. This approach ensures that nodes have access to the necessary information to perform their tasks and that updates are propagated appropriately across the graph.'}

Why This Pattern Matters in Production

This example demonstrates a core agentic principle:

LLMs should be workers, not judges.

They generate content when invoked — but they do not determine whether invocation is allowed.

By separating deterministic decision-making (the scope check) from probabilistic generation (the LLM), the system retains authority over its own execution.

That separation has concrete production consequences.

Generation cannot bypass validation or policy because the model never controls routing. Decisions remain explicit and inspectable because they live in state and graph structure. New rules can be introduced without rewriting prompts or retraining behavior, because control logic is architectural rather than textual.

Most importantly, responsibility stays where it belongs: in the system design, not in model output.

In the next section, we extend this boundary further by introducing explicit approval and human-in-the-loop checkpoints. Instead of allowing oversight to exist outside the system, we will model it directly inside execution flow — where it can be enforced rather than assumed.

Pattern 2: Human-in-the-Loop as a First-Class Node

Not all agentic systems should be fully autonomous.

Even when an agent can evaluate options or propose actions, many real-world systems require explicit human approval before irreversible steps occur. That requirement is not a limitation of the model. It is a property of the domain.

Common examples include:

compliance and regulatory decisions
financial or pricing changes
actions with legal or reputational risk

In these settings, human intervention cannot be bolted on after execution. It must be modeled as part of execution.

This leads to a second structural pattern:

Compute a proposal → pause for approval → either apply or reject.

In agentic systems, human-in-the-loop is not a UI concern.

It is a control-flow concern.

Problem setup: discount approval workflow

Imagine a restaurant chain where a manager proposes a discount for a specific day.

The system must:

evaluate the proposed discount
pause for human approval
apply the discount if approved
reject it otherwise
produce a final, auditable result

The critical constraint is structural:

Execution must stop until approval is resolved.

Explicit approval state

Approval is modeled directly in shared state.

class DiscountState(TypedDict, total=False):
 day: str
 discount_pct: float
 estimated_revenue: float
 estimated_discount_cost: float
 approval: Optional[str]
 result: Optional[str]

The presence of the approval field makes the checkpoint explicit.

Execution cannot continue without it.

Approval is not implied.
It is required.

Node 1: input proposal

def input_discount_node(state: DiscountState) -> DiscountState:
 if "day" not in state:
 day = input("Day (Mon/Tue/Wed/Thur/Fri/Sat/Sun): ").strip()
 else:
 day = state["day"]

 if "discount_pct" not in state:
 discount_pct = float(input("Discount percent (e.g., 10 for 10%): ").strip())
 else:
 discount_pct = float(state["discount_pct"])

 return {"day": day, "discount_pct": discount_pct}

This node gathers the proposal.

It does not decide whether it should be applied.

Node 2: estimate impact

Before asking for approval, the system computes impact.

BASELINE_REVENUE_BY_DAY = {
 "Mon": 1200, "Tue": 1300, "Wed": 1400, "Thur": 1600,
 "Fri": 2200, "Sat": 3000, "Sun": 2800
}

def estimate_impact_node(state: DiscountState) -> DiscountState:
 day = state["day"]
 discount_pct = state["discount_pct"]

 revenue = BASELINE_REVENUE_BY_DAY.get(day, 1500)
 discount_cost = revenue * (discount_pct / 100.0)

 return {
 "estimated_revenue": revenue,
 "estimated_discount_cost": discount_cost,
 }

This separation matters.

The human reviews a proposal with context, not a raw suggestion.

Node 3: human approval checkpoint

This node represents the human-in-the-loop step.

def approval_node(state: DiscountState) -> DiscountState:
 if state.get("approval") in {"approve", "reject"}:
 return {}

 decision = input("Approve discount? (approve/reject): ").strip().lower()
 if decision not in {"approve", "reject"}:
 decision = "reject"

 return {"approval": decision}

Router: approval decides execution

def route_after_approval(state: DiscountState) -> str:
 return "apply" if state.get("approval") == "approve" else "reject"

Routing is explicit and inspectable.

Execution cannot proceed without a resolved approval state.

Node 4a: Apply Discount

def apply_discount_node(state: DiscountState) -> DiscountState:
 day = state["day"]
 pct = state["discount_pct"]
 cost = state["estimated_discount_cost"]
 return {
 "result": (
 f"✅ Approved: Apply {pct:.1f}% discount on {day}. "
 f"Estimated revenue impact: -${cost:.2f}."
 )
 }

Node 4b: Reject Discount

def reject_discount_node(state: DiscountState) -> DiscountState:
 day = state["day"]
 pct = state["discount_pct"]
 cost = state["estimated_discount_cost"]
 return {
 "result": (
 f"❌ Rejected: Do not apply {pct:.1f}% discount on {day}. "
 f"Estimated revenue impact avoided: ${cost:.2f}."
 )
 }

These nodes execute outcomes.

They do not decide whether approval exists.

That decision has already been enforced structurally.

Build the graph

from langgraph.graph import StateGraph, END

workflow = StateGraph(DiscountState)

workflow.add_node("input", input_discount_node)
workflow.add_node("estimate", estimate_impact_node)
workflow.add_node("approved", approval_node)
workflow.add_node("apply_discount", apply_discount_node)
workflow.add_node("reject_discount", reject_discount_node)

workflow.set_entry_point("input")
workflow.add_edge("input", "estimate")
workflow.add_edge("estimate", "approved")

workflow.add_conditional_edges(
 "approved",
 route_after_approval,
 {
 "apply": "apply_discount",
 "reject": "reject_discount",
 },
)

workflow.add_edge("apply_discount", END)
workflow.add_edge("reject_discount", END)

app = workflow.compile()

The graph encodes a structural invariant:

Execution cannot progress without explicit human approval.

Visualizing execution

The diagram makes the pause visible.

Approval is not implied.
It is a required state transition.

Running the workflow

final_state = app.invoke({"day": "Sat", "discount_pct": 12})
print(final_state["result"])

Approve discount? (approve/reject): approve


✅ Approved: Apply 12.0% discount on Sat. Estimated revenue impact: -$360.00.

Why Human-in-the-Loop Is Structural, Not Cosmetic

This pattern highlights a subtle but critical point:

Agentic systems are not about removing humans.
They are about placing humans at enforceable boundaries.

In poorly designed systems, human oversight appears after execution — through manual overrides, approval emails, or post-hoc reviews. These mechanisms may exist socially, but they are not enforced by the system itself.

When human approval is modeled explicitly in state and routing:

workflows remain transparent
responsibility boundaries are visible
compliance is enforced by execution structure

The difference is architectural.

When approval is part of shared state, the system cannot proceed without it. There is no prompt to bypass, no hidden conditional to forget, and no ambiguity about who authorized what.

Human judgment becomes a first-class dependency of execution.

This is where graph-based orchestration matters.

By modeling human checkpoints as nodes — subject to the same routing, inspection, and observability as any other step — approval becomes part of the system’s lifecycle rather than an external intervention.

Execution pauses cleanly.
Decisions are inspectable.
Responsibility is explicit.

In agentic systems, autonomy is not defined by the absence of humans.

It is defined by whether the system knows exactly when it must stop and ask.

Controlling LLMs Is an Architectural Problem

Across this series, we’ve moved deliberately — from models to systems, from generation to execution, and from clever prompts to explicit control.

The conclusion is simple, but easy to miss:

The hardest problems in agentic systems are not generative.
They are architectural.

Large language models are powerful reasoning tools. But without structure, they blur responsibility, hide control logic, and turn execution into improvisation. With structure, the same models become contained components inside systems that remain predictable, inspectable, and accountable.

The patterns in this article are not about making LLMs safer through better prompts. They are about making systems safer by design.

Decision-before-generation ensures that models are only invoked when the system has determined it is appropriate.

Human-in-the-loop as a first-class node ensures that oversight is enforced by execution flow, not by social convention.

Explicit state and routing ensure that no model output silently becomes control logic.

Taken together, these patterns point to a single rule:

LLMs may propose, explain, classify, or summarize — but they should not decide when execution proceeds.

That responsibility belongs to the system.

This reframing changes how agentic systems are built. Instead of asking:

“How do we get the model to behave?”

The question becomes:

“Where does probabilistic reasoning belong inside a deterministic process?”

That shift has been the throughline of this series.

Agentic systems are not defined by autonomy for its own sake.
They are defined by controlled autonomy — by systems that know when to act, when to pause, and when to ask.

Structure first.
Reasoning second.
Execution always explicit.

Once that boundary is clear, LLMs stop being a liability.

They become exactly what they should have been all along:

powerful tools inside systems that remain firmly under control.

If you liked this content, follow me for more.

Join thousands of data leaders on the AI newsletter. Join over 80,000 subscribers and keep up to date with the latest developments in AI. From research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor.

Published via Towards AI

Cookie	Duration	Description
cookielawinfo-checkbox-analytics	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics".
cookielawinfo-checkbox-functional	11 months	The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional".
cookielawinfo-checkbox-necessary	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-others	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other.
cookielawinfo-checkbox-performance	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance".
viewed_cookie_policy	11 months	The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.

Frequently Used, Contextual References

Resources

Where LLMs Belong in Agentic Systems: Gating, Approval, and Human-in-the-Loop Design

Author(s): Vahe Sahakyan

When LLMs Collapse Control Boundaries

Pattern 1: Decision Before Generation

Problem setup: a scoped Q&A assistant

High-level workflow

Explicit state

Intake Node

Deterministic Scope Check

Explicit Routing

Isolated Generation

Out-of-Scope Path

Converging Execution

Build the graph

Visualizing the workflow

Running the workflow

Why This Pattern Matters in Production

Pattern 2: Human-in-the-Loop as a First-Class Node

Problem setup: discount approval workflow

Explicit approval state

Node 1: input proposal

Node 2: estimate impact

Node 3: human approval checkpoint

Router: approval decides execution

Node 4a: Apply Discount

Node 4b: Reject Discount

Build the graph

Visualizing execution

Running the workflow

Why Human-in-the-Loop Is Structural, Not Cosmetic

Controlling LLMs Is an Architectural Problem

Towards AI Academy

We Build Enterprise-Grade AI. We'll Teach You to Master It Too.

Related posts

Recent Posts

Comprehensive AI Engineering and AI for Work certifications

Company

CONTACT US

GDPR CCPA Statement