
Build a FREE, Local AI Research Agent with Python
Author(s): Taha Azizi
Originally published on Towards AI.
Give your local LLM the power to browse the real-time web. No API keys, no fees — just Python, LangChain, and your own PC.

Ever asked a cutting-edge LLM about a recent event, only to get the familiar apology: “My knowledge cutoff is…”? It’s the digital equivalent of a super-genius locked in a library with last year’s newspapers. Powerful, but not current.
In a previous article, we explored how to build a personal AI brain that chats over an LLM’s existing knowledge base, which has no recent updates. Today, we’re taking the next logical step: giving that brain eyes and ears on the live internet. We’re going to build a basic but functional AI Research Agent that can browse the web to answer your questions — and we’ll do it entirely for free, right on your local machine.
This project proves that many of the powerful features you see in top-tier AI products can be replicated at home, giving you ultimate control, customizability and privacy.
Why Bother? The Case for Real-Time Knowledge
The single biggest limitation of most standard Large Language Models (LLMs) is their static nature. They are trained on a massive but finite snapshot of the internet. An AI agent, however, is different. It’s an LLM-powered system that can take actions using tools.
Our agent will operate on a simple but powerful loop:
- Receive a question: “What are the latest developments in AI-powered drug discovery?”
- Formulate a plan: “I need to search for recent news and research papers on this topic.”
- Use a tool: Execute a web search.
- Observe the result: Get a list of URLs.
- Use another tool: Scrape the content from the most promising URL.
- Synthesize and repeat: Read the content, decide if more information is needed, and either scrape another source or finish the task.
This turns your LLM from a static encyclopedia into a dynamic researcher.

The Architectural Blueprint
We’ll build our agent using a stack of incredible, free, open-source libraries.
- The Brain (Local LLM): We’ll use Ollama to run a powerful open-source model like gemma3:27b locally. Ollama makes running state-of-the-art models on your own hardware incredibly simple.
- The Conductor (Agent Framework): LangChain provides the core logic. We’ll use its create_react_agent function, which implements the "ReAct" (Reason + Act) framework, allowing the model to think through its steps.
- The Eyes (Search Tool): The duckduckgo-search library gives us a simple, no-API-key-required tool to perform web searches.
- The Hands (Scraping Tool): We’ll use requests and BeautifulSoup4 to fetch and parse the text content from web pages.
- The Scribe (Report Generator): Finally, fpdf2 will take our agent’s findings and automatically generate a clean PDF report.
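To make the blueprint concrete, here is a minimal sketch of connecting LangChain to the local model. It assumes Ollama is installed and running, that you have already pulled the model with ollama pull gemma3:27b, and that the langchain-ollama integration package is installed; this is a setup sketch, not the project’s exact code.
# Minimal sketch: connect LangChain to a locally running Ollama model.
# Assumes `ollama pull gemma3:27b` has been run and the Ollama server is up.
# pip install langchain langchain-ollama duckduckgo-search requests beautifulsoup4 fpdf2
from langchain_ollama import ChatOllama

llm = ChatOllama(
    model="gemma3:27b",   # any local Ollama model tag works here
    temperature=0.1,      # keep the agent's reasoning fairly deterministic
)

print(llm.invoke("Say hello in one short sentence.").content)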

The Code: Crafting the Agent’s Mind
While the full code is available for you to run, let’s focus on the two most critical parts: the tools and the agent’s “mind” — the prompt that tells it how to behave.
1. Defining the Agent’s Tools
First, we need to give our agent its capabilities. We create two primary tools: Search and Scrape.
# A simplified view of our tool definitions (the full implementation is in the repo)
from duckduckgo_search import DDGS
import requests
from bs4 import BeautifulSoup
from langchain.agents import Tool

def ddgs_search_tool(query: str) -> str:
    """Performs a DuckDuckGo search and returns a list of URLs."""
    print(f"[Custom Search]: Searching for '{query}'...")
    # Minimal sketch of the DDGS logic: collect result links, dropping duplicates
    results = DDGS().text(query, max_results=5)
    unique_urls = list(dict.fromkeys(r["href"] for r in results))
    return "\n".join(unique_urls)

def _fetch_web_content_for_tool(url: str) -> str:
    """Fetches the main text from a URL."""
    print(f"\n[Tool Action]: Fetching content from URL: {url}")
    # Minimal sketch of the Requests + BeautifulSoup logic; errors come back as
    # text so the agent can read them and discard the URL
    try:
        response = requests.get(url, timeout=15)
        response.raise_for_status()
    except requests.RequestException as exc:
        return f"Error: could not fetch {url} ({exc})"
    soup = BeautifulSoup(response.text, "html.parser")
    text = soup.get_text(separator="\n", strip=True)
    return text

# We wrap these functions in LangChain's 'Tool' class
tools = [
    Tool(
        name="Search",
        func=ddgs_search_tool,
        description="Use this to find URLs on the internet..."
    ),
    Tool(
        name="Scrape",
        func=_fetch_web_content_for_tool,
        description="Use this to fetch the text content of a URL..."
    )
]
These functions are the agent’s hands. It doesn’t know how they work, only what they do, based on each tool’s description.
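If you want to sanity-check the tools before handing them to the agent, you can call the underlying functions directly. This is just an illustrative smoke test with an example query, not part of the project’s code:
# Quick, standalone smoke test of the two tools (example query, not from the repo)
urls = ddgs_search_tool("AI-powered drug discovery latest research")
print(urls)
if urls:
    first_url = urls.splitlines()[0]
    print(_fetch_web_content_for_tool(first_url)[:500])  # preview the first 500 characters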
2. The Agent’s Core Prompt
This is the most crucial part. We’re not just asking the LLM a question; we are giving it a persona, a goal, a set of rules, and a memory. The prompt is the agent’s operating system.
**Research State:**
- Question: {input}
- Sources Found: {sources_found_count} out of {min_sources_required}
- Visited URLs: {visited_urls}

**Your Task:**
1. Start with a 'Search' to find relevant articles.
2. Review search results and use 'Scrape' on the most promising URL.
3. **Critically analyze the result of the Scrape tool:**
   - If the content is good, you've found a source.
   - If you get an error or the content is unsuitable, **immediately discard that URL**.
4. If your initial search yields no good URLs, **formulate a new, different search query**.
5. Once you have gathered {min_sources_required} high-quality sources, your final output must be `FINISH`.

**Tools Available:**
{tools}

**Previous Steps (Log):**
{agent_scratchpad}

**Your Next Step:**
Thought: Your reasoning for the next action.
Action: The tool to use...
Action Input: The input for the selected tool.
By providing state variables like sources_found_count and visited_urls directly in the prompt, we give the agent memory. It learns from its mistakes (like trying a broken link) and knows how close it is to completing its goal.
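Here is one plausible way to wire that prompt into an agent using LangChain’s create_react_agent, which the article names as its agent constructor. The template below is a condensed stand-in for the full prompt shown above (create_react_agent also expects a {tool_names} placeholder, which the excerpt does not show), so treat it as a sketch rather than the project’s exact code:
# Sketch: wrap the prompt in a PromptTemplate and build the ReAct agent.
# The template is a condensed stand-in for the full prompt shown above.
from langchain.prompts import PromptTemplate
from langchain.agents import create_react_agent

AGENT_PROMPT = """You are a web research agent.

Research State:
- Question: {input}
- Sources Found: {sources_found_count} out of {min_sources_required}
- Visited URLs: {visited_urls}

Tools Available:
{tools}
Tool Names: {tool_names}

Previous Steps (Log):
{agent_scratchpad}

Your Next Step:
Thought: Your reasoning for the next action.
Action: The tool to use.
Action Input: The input for the selected tool."""

prompt = PromptTemplate.from_template(AGENT_PROMPT)

# llm and tools come from the earlier snippets
agent = create_react_agent(llm=llm, tools=tools, prompt=prompt)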
3. The Execution Loop
Finally, a custom Python loop runs the agent, manages its state (like the list of visited URLs and accumulated data), and executes the tools the agent decides to use. When the agent has gathered enough sources, the loop stops, synthesizes the findings, and generates the final PDF report.
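A hedged sketch of what such a driver loop could look like, assuming the agent was built with create_react_agent as above (so each step returns an AgentAction or AgentFinish) and using fpdf2 for the report; the variable names, limits, and report layout are illustrative, not the repository’s exact code:
# Hypothetical driver loop: names, limits, and report layout are illustrative.
from langchain_core.agents import AgentFinish
from fpdf import FPDF

question = "What are the latest developments in AI-powered drug discovery?"
min_sources_required = 3
visited_urls, gathered_sources, intermediate_steps = [], [], []
tool_map = {t.name: t for t in tools}

for _ in range(15):  # hard cap so a confused agent cannot loop forever
    if len(gathered_sources) >= min_sources_required:
        break
    step = agent.invoke({
        "input": question,
        "intermediate_steps": intermediate_steps,
        "sources_found_count": len(gathered_sources),
        "min_sources_required": min_sources_required,
        "visited_urls": "\n".join(visited_urls) or "none yet",
    })
    if isinstance(step, AgentFinish):
        break
    tool = tool_map.get(step.tool)
    observation = tool.func(step.tool_input) if tool else f"Error: unknown tool '{step.tool}'"
    intermediate_steps.append((step, observation))
    if step.tool == "Scrape":
        visited_urls.append(step.tool_input)
        if not observation.startswith("Error"):
            gathered_sources.append((step.tool_input, observation))

# Turn the findings into a simple PDF report with fpdf2
pdf = FPDF()
pdf.add_page()
pdf.set_font("Helvetica", size=11)
pdf.multi_cell(0, 6, f"Research question: {question}")
for url, content in gathered_sources:
    safe = content[:1500].encode("latin-1", "replace").decode("latin-1")  # core fonts are latin-1 only
    pdf.multi_cell(0, 6, f"\nSource: {url}\n\n{safe}")
pdf.output("research_report.pdf")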
Conclusion: Your Personal AI Powerhouse
We’ve successfully built a system that empowers a local LLM to perform real-time research, breaking it free from its static knowledge base. You now have a blueprint for an agent that can:
✅ Answer questions with up-to-the-minute information.
✅ Synthesize data from multiple online sources.
✅ Generate a formatted report of its findings.
✅ Run entirely on your own machine, for free.
This project is just the beginning. You could expand it with more advanced tools, more sophisticated state management, or even have multiple agents collaborate. You’ve taken a significant step from being a mere user of AI to becoming a builder of bespoke AI solutions.

Please find the complete code in my GitHub repository:
https://github.com/Taha-azizi/researchgen
All images were created by the author using AI image creation tools.
Disclaimers
A Note on Performance
This is a demonstration of what’s possible with free, local tools. The performance of a model like gemma3:27b is impressive for its size, but it will not match the speed or reasoning capabilities of massive, proprietary models like GPT-4-Turbo. The agent might occasionally get stuck in a loop or choose a less-than-optimal tool. This project is a trade-off: you exchange peak performance for 100% privacy, zero cost, and infinite customizability.
Legal and Ethical Notice
The scraping tool in this project is for educational use only. Automated scraping may violate a website’s terms of service and place unnecessary load on its servers. Users must check robots.txt and ensure their use complies with applicable laws and regulations. The project maintainers are not responsible for any misuse or legal consequences arising from the use of this tool.
From an ethical standpoint, users should respect content ownership, avoid scraping paywalled or sensitive information, and refrain from using scraped data for misleading or harmful purposes. Responsible use is expected.
Published via Towards AI

Note: Content contains the views of the contributing authors and not Towards AI.