
Introduction to RAG: Basics to Mastery. Part 4: RAG with MCP, the Future of Dynamic Context Retrieval
Last Updated on September 4, 2025 by Editorial Team
Author(s): Taha Azizi
Originally published on Towards AI.
Part 4 of the mini-series introduction to RAG

Introduction
So far in this series, we’ve explored:
- Basic RAG with local semantic search.
- Hybrid RAG combining keyword + semantic search.
- Agentic RAG with multi-step reasoning and tool use.
In this article, we’ll dive into something cutting-edge:
RAG powered by MCP (Model Context Protocol).
MCP is an emerging standard that allows LLMs to dynamically fetch context during generation rather than preloading everything at the start. In practice, this means the model can realize “I need more info” mid-answer, call a retrieval tool, and continue naturally.
Think of MCP as the glue that connects RAG tools (retrievers, search APIs, calculators, etc.) with an agentic system in a standardized way.
Theory
The Model Context Protocol (MCP) is an open standard (introduced by Anthropic in November 2024) that defines how large language models (LLMs) dynamically access external tools and data sources. Acting much like a “USB-C port for AI,” MCP lets any LLM-based agent invoke tools such as document retrievers, APIs, or calculators through a unified client-server interface built on JSON-RPC, regardless of the underlying system. This reduces integration complexity, tackles the “M×N connector problem,” and gives the model secure, scalable access to context during generation rather than requiring everything upfront. Let’s compare it with the pipelines from the previous parts of this series:
Traditional RAG pipeline:
Retrieve → Inject into prompt → Generate
Agentic RAG pipeline (from Part 3):
Plan → Retrieve (possibly multiple times) → Reason → Answer
MCP-powered RAG pipeline:
Generate → Realize missing info → Call MCP tool (retrieval, API, calculator) → Continue generation
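To make the “call MCP tool” step concrete: under MCP, every tool call is a standardized JSON-RPC exchange between client and server. A rough sketch of the request and result for our retriever, shown as Python dicts for readability (the SDK serializes and transports the actual JSON, and exact fields can vary by SDK version):

request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "rag_retrieve",
        "arguments": {"query": "Germany renewable policies"},
    },
}

response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "content": [{"type": "text", "text": "...retrieved passages..."}],
    },
}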
Benefits of MCP:
- Dynamic retrieval: retrieval happens “mid-thought” rather than upfront.
- Lower memory/GPU usage: no need to preload large context.
- Tool standardization: all tools (retrievers, APIs, calculators) are accessed through the same protocol.
- Multi-turn friendly: works better in long conversations.
Setup
We’ll use:
- Flask for server deployment
- httpx for making asynchronous HTTP requests
- fastmcp for structured context and tool integration
Install:
pip install flask httpx fastmcp
Step-by-Step Code
Step 1: Define MCP Server
Our MCP server exposes RAG retrieval and calculator as MCP tools.
from mcp.server.fastmcp import FastMCP
from utils.retrieval import hybrid_search
from utils.generation import generate_answer
import math

mcp = FastMCP(
    name="rag_mcp_server",
    version="1.0.0",
    description="MCP server exposing RAG retriever and calculator tools"
)

@mcp.tool()
def rag_retrieve(query: str) -> str:
    """Retrieve relevant context using hybrid RAG."""
    docs = hybrid_search(query, top_k=3)
    return "\n---\n".join(docs)

@mcp.tool()
def calculator(expression: str) -> str:
    """Safe calculator for arithmetic expressions."""
    try:
        return str(eval(expression, {"__builtins__": {}}, {"math": math}))
    except Exception as e:
        return f"CALC_ERROR: {e}"

if __name__ == "__main__":
    mcp.run()
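As written, mcp.run() starts the server on the default stdio transport, which is fine when the client launches the server as a subprocess. Because the client in Step 2 connects to http://localhost:8000, you would expose the server over HTTP instead. The exact option depends on your MCP SDK version; with the reference Python SDK it looks roughly like this:

# Assumption: transport names vary between MCP SDK versions.
# With the reference Python SDK, the SSE transport listens on port 8000 by default.
if __name__ == "__main__":
    mcp.run(transport="sse")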
Step 2: MCP Client
The client talks to the server and lets the agent call MCP tools.
from mcp.client import MCPClient
client = MCPClient("http://localhost:8000") # server address
# Example: call RAG tool
print(client.call_tool("rag_retrieve", {"query": "Germany renewable policies"}))
# Example: call calculator tool
print(client.call_tool("calculator", {"expression": "25*4+10"}))
Step 3: Agent with MCP Integration
We integrate the MCP client into our agent.
from langchain.agents import initialize_agent, AgentType
from langchain.llms import Ollama
from langchain.tools import Tool
from mcp_client import client

# Wrap MCP tools for LangChain
retrieval_tool = Tool(
    name="MCP-RAG",
    func=lambda q: client.call_tool("rag_retrieve", {"query": q}),
    description="Retrieve info from RAG via MCP"
)

calc_tool = Tool(
    name="MCP-Calculator",
    func=lambda e: client.call_tool("calculator", {"expression": e}),
    description="Do arithmetic via MCP"
)

llm = Ollama(model="mistral")

agent = initialize_agent(
    tools=[retrieval_tool, calc_tool],
    llm=llm,
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    verbose=True
)

query = "If Germany’s renewable energy sector was 250 TWh in 2023 and is projected to grow 10% per year, what will it reach by 2030, and how does this compare with Germany’s official renewable energy targets?"
print(agent.run(query))
Expected Behavior
- The agent reads the query.
- It realizes it needs a calculation → calls MCP Calculator.
- It realizes it needs document retrieval → calls MCP RAG tool.
- It integrates both results and generates the final answer.
A typical agent trace looks like this:
Thought: I need to calculate Germany’s renewable energy output in 2030,
given 250 TWh in 2023 with 10% annual growth.
Action: MCP-Calculator
Action Input: 250 * (1.1 ** 7)
Observation: 487.18
Thought: I should check Germany’s official renewable energy targets for 2030.
Action: MCP-RAG
Action Input: Germany renewable energy 2030 targets
Observation: Germany aims for ~80% renewable share in electricity by 2030.
Final Answer: At 10% annual growth, Germany’s renewable sector would reach ~487 TWh by 2030.
This aligns with Germany’s official goal of ~80% renewables in the electricity mix.
Why This Matters
MCP makes RAG dynamic and extensible:
- Tools (retrievers, calculators, APIs) can be registered in a standardized way.
- Agents don’t preload all knowledge — they fetch when needed.
- Your system scales: to add a search API or a weather API, just expose it as another MCP tool (see the sketch below).
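For instance, adding a new data source is just another decorated function on the same server. A minimal sketch, assuming a hypothetical get_weather helper of your own (not part of this article’s repo):

@mcp.tool()
def weather(city: str) -> str:
    """Return the current weather for a city (illustrative stub)."""
    # get_weather is a hypothetical helper wrapping whatever weather API you use
    return get_weather(city)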
This is the natural evolution of RAG: from static retrieval → agentic planning → protocol-driven, dynamic retrieval.
Next Steps
In Article 5, we’ll push performance to the max with Advanced RAG using Approximate Nearest Neighbors (ANN) — making retrieval lightning-fast even with millions of documents.
Visit the Github page: https://github.com/Taha-azizi/RAG