Build a FREE, Local AI Research Agent with Python

Author(s): Taha Azizi

Originally published on Towards AI.

Give your local LLM the power to browse the real-time web. No APIs, no fees — just Python, LangChain, and your own PC.

With **Researchgen**, the web is your research library. This AI agent scours online sources and generates custom reports shaped by your specific input, so you stay in control of the output.

Ever asked a cutting-edge LLM about a recent event, only to get the familiar apology: “My knowledge cutoff is…”? It’s the digital equivalent of a super-genius locked in a library with last year’s newspapers. Powerful, but not current.

In a previous article, we explored how to build a personal AI brain for chatting with an LLM’s built-in knowledge, which has no recent updates. Today, we’re taking the next logical step: giving that brain eyes and ears on the live internet. We’re going to build a basic but functional AI Research Agent that can browse the web to answer your questions, and we’ll do it entirely for free, on your local machine.

This project proves that many of the powerful features you see in top-tier AI products can be replicated at home, giving you ultimate control, customizability and privacy.

Why Bother? The Case for Real-Time Knowledge

The single biggest limitation of most standard Large Language Models (LLMs) is their static nature. They are trained on a massive but finite snapshot of the internet. An AI agent, however, is different. It’s an LLM-powered system that can take actions using tools.

Our agent will operate on a simple but powerful loop:

1. Receive a question: “What are the latest developments in AI-powered drug discovery?”
2. Formulate a plan: “I need to search for recent news and research papers on this topic.”
3. Use a tool: Execute a web search.
4. Observe the result: Get a list of URLs.
5. Use another tool: Scrape the content from the most promising URL.
6. Synthesize and repeat: Read the content, decide if more information is needed, and either scrape another source or finish the task.

This turns your LLM from a static encyclopedia into a dynamic researcher.
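The loop above can be sketched in plain Python, independent of any framework. In this minimal sketch, `decide` stands in for the LLM’s reasoning step and is replaced by a hypothetical scripted policy; `fake_search` and `fake_scrape` are hypothetical stubs, not the article’s real tools.

```python
from typing import Callable

Step = tuple[str, str, str]  # (action, action_input, observation)

# Hypothetical stub tools standing in for real web search and scraping.
def fake_search(query: str) -> str:
    return "https://example.com/a\nhttps://example.com/b"

def fake_scrape(url: str) -> str:
    return f"Article text from {url}"

TOOLS: dict[str, Callable[[str], str]] = {"Search": fake_search, "Scrape": fake_scrape}

def run_loop(
    question: str,
    decide: Callable[[str, list[Step]], tuple[str, str]],
    max_steps: int = 10,
) -> list[Step]:
    """Reason -> act -> observe until `decide` returns FINISH or max_steps is hit."""
    history: list[Step] = []
    for _ in range(max_steps):
        action, action_input = decide(question, history)     # reason: pick the next action
        if action == "FINISH":
            break
        observation = TOOLS[action](action_input)            # act: run the chosen tool
        history.append((action, action_input, observation))  # observe: remember the result
    return history

# Hypothetical scripted policy in place of the LLM: search, scrape the first hit, stop.
def scripted_policy(question: str, history: list[Step]) -> tuple[str, str]:
    if not history:
        return "Search", question
    if len(history) == 1:
        return "Scrape", history[0][2].splitlines()[0]
    return "FINISH", ""

steps = run_loop("AI-powered drug discovery news", scripted_policy)
for action, action_input, _ in steps:
    print(action, "->", action_input)
```

Swapping `scripted_policy` for a function that prompts the model and parses its reply turns this skeleton into a real agent; the `max_steps` cap is an assumed safeguard against the agent looping forever.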

The Architectural Blueprint

We’ll build our agent using a stack of incredible, free, open-source libraries.

The Brain (Local LLM): We’ll use Ollama to run a powerful open-source model like gemma3:27b locally. Ollama makes running state-of-the-art models on your own hardware incredibly simple.
The Conductor (Agent Framework): LangChain provides the core logic. We’ll use its create_react_agent function, which implements the "ReAct" (Reason + Act) framework, allowing the model to think through its steps.
The Eyes (Search Tool): The duckduckgo-search library gives us a simple, no-API-key-required tool to perform web searches.
The Hands (Scraping Tool): We’ll use requests and BeautifulSoup4 to fetch and parse the text content from web pages.
The Scribe (Report Generator): Finally, fpdf2 will take our agent's findings and automatically generate a clean PDF report.
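To illustrate what “parse the text content” involves, here is a dependency-free sketch built on Python’s standard-library `html.parser` instead of BeautifulSoup (a swapped-in technique, not the repository’s code). It keeps visible text and skips `<script>`, `<style>` and `<noscript>` blocks; the 4,000-character cap is an arbitrary assumption meant to keep a local model’s context small. With BeautifulSoup, `soup.get_text(separator=" ", strip=True)` plays a similar role.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of script/style/noscript tags."""
    SKIP = {"script", "style", "noscript"}

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text only when we are not inside a skipped tag.
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())

def html_to_text(html: str, max_chars: int = 4000) -> str:
    """Return whitespace-joined visible text, truncated to max_chars (assumed limit)."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)[:max_chars]

page = (
    "<html><head><style>p { color: red; }</style></head>"
    "<body><h1>Title</h1><script>var x = 1;</script><p>Hello <b>world</b></p></body></html>"
)
print(html_to_text(page))
```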

The Code: Crafting the Agent’s Mind

The full code is available in the GitHub repository linked at the end of this article, so let’s focus on the most critical parts: the tools, the agent’s “mind” (the prompt that tells it how to behave), and the loop that runs it.

1. Defining the Agent’s Tools

First, we need to give our agent its capabilities. We create two primary tools: Search and Scrape.

```python
# A simplified view of our tool definitions
from langchain_core.tools import Tool

def ddgs_search_tool(query: str) -> str:
    """Performs a DuckDuckGo search and returns a list of URLs."""
    print(f"[Custom Search]: Searching for '{query}'...")
    # ... (logic using the DDGS library) ...
    return "\n".join(unique_urls)

def _fetch_web_content_for_tool(url: str) -> str:
    """Fetches the main text from a URL."""
    print(f"\n[Tool Action]: Fetching content from URL: {url}")
    # ... (logic using Requests and BeautifulSoup) ...
    return text

# We wrap these functions in LangChain's 'Tool' class
tools = [
    Tool(
        name="Search",
        func=ddgs_search_tool,
        description="Use this to find URLs on the internet...",
    ),
    Tool(
        name="Scrape",
        func=_fetch_web_content_for_tool,
        description="Use this to fetch the text content of a URL...",
    ),
]
```
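The tool block elides how `unique_urls` is built. One plausible approach is sketched here under assumptions: the search returns a list of dicts with an `href` key (the shape `DDGS().text()` returns in recent versions of duckduckgo-search), and the agent wants rank order preserved, near-duplicate URLs collapsed, and already-visited pages skipped. `normalize_url` and `unique_urls_from` are hypothetical helpers, and the sample results are made up so the example runs offline.

```python
from urllib.parse import urlsplit, urlunsplit

def normalize_url(url: str) -> str:
    """Lower-case scheme and host, drop the #fragment and any trailing slash."""
    parts = urlsplit(url.strip())
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path.rstrip("/"), parts.query, "")
    )

def unique_urls_from(results: list[dict], visited: set[str], limit: int = 5) -> list[str]:
    """Keep the first occurrence of each URL, in rank order, skipping visited ones."""
    visited_keys = {normalize_url(u) for u in visited}
    seen: set[str] = set()
    urls: list[str] = []
    for result in results:
        href = result.get("href")
        if not href:
            continue  # some results may lack a link
        key = normalize_url(href)
        if key in seen or key in visited_keys:
            continue
        seen.add(key)
        urls.append(href)
        if len(urls) == limit:
            break
    return urls

results = [
    {"title": "A", "href": "https://example.com/a"},
    {"title": "A again", "href": "https://example.com/a/"},
    {"title": "B", "href": "https://Example.com/b#top"},
    {"title": "No link"},
    {"title": "C", "href": "https://example.com/c"},
]
print("\n".join(unique_urls_from(results, visited={"https://example.com/c"})))
```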

The Search and Scrape functions are the agent’s eyes and hands. The agent doesn’t know how they work; it only knows what they do, from each tool’s description.
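Because the model only ever sees a tool’s name and description, those strings are what end up in the prompt’s `{tools}` slot. The sketch below shows that rendering step plus name-based dispatch using only the standard library; `ToolSpec`, `render_tools`, and `call_tool` are hypothetical stand-ins for what LangChain’s `Tool` class and its `render_text_description` helper handle for you, and the shortened descriptions are placeholders.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class ToolSpec:
    """Hypothetical minimal stand-in for LangChain's Tool: name, description, function."""
    name: str
    description: str
    func: Callable[[str], str]

def render_tools(tools: list[ToolSpec]) -> str:
    """One 'name: description' line per tool, ready to drop into the {tools} slot."""
    return "\n".join(f"{t.name}: {t.description}" for t in tools)

def call_tool(tools: list[ToolSpec], name: str, tool_input: str) -> str:
    """Dispatch by name; return an error string the model can read instead of raising."""
    for t in tools:
        if t.name == name:
            return t.func(tool_input)
    valid = ", ".join(t.name for t in tools)
    return f"Error: unknown tool '{name}'. Valid tools: {valid}"

tools = [
    ToolSpec("Search", "Use this to find URLs on the internet.", lambda q: f"results for {q}"),
    ToolSpec("Scrape", "Use this to fetch the text content of a URL.", lambda u: f"text of {u}"),
]
print(render_tools(tools))
print(call_tool(tools, "Browse", "x"))
```

Returning an error string for an unknown tool, rather than raising, lets the model read the mistake in its log and correct itself on the next step.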

2. The Agent’s Core Prompt

This is the most crucial part. We’re not just asking the LLM a question; we are giving it a persona, a goal, a set of rules, and a memory. The prompt is the agent’s operating system.

```text
**Research State:**
- Question: {input}
- Sources Found: {sources_found_count} out of {min_sources_required}
- Visited URLs: {visited_urls}

**Your Task:**
1. Start with a 'Search' to find relevant articles.
2. Review search results and use 'Scrape' on the most promising URL.
3. **Critically analyze the result of the Scrape tool:**
   - If the content is good, you've found a source.
   - If you get an error or the content is unsuitable, **immediately discard that URL**.
4. If your initial search yields no good URLs, **formulate a new, different search query**.
5. Once you have gathered {min_sources_required} high-quality sources, your final output must be `FINISH`.

**Tools Available:**
{tools}

**Previous Steps (Log):**
{agent_scratchpad}

**Your Next Step:**
Thought: Your reasoning for the next action.
Action: The tool to use...
Action Input: The input for the selected tool.
```

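The prompt asks the model to reply in a fixed `Thought:` / `Action:` / `Action Input:` format, so a custom loop has to parse that text back into a tool call. The minimal sketch below assumes the model follows the format on single lines and that a bare `FINISH` (optionally in backticks) ends the run; `AgentStep` and `parse_step` are hypothetical names, not the repository’s code.

```python
from typing import NamedTuple

class AgentStep(NamedTuple):
    thought: str
    action: str
    action_input: str

def parse_step(text: str) -> AgentStep:
    """Parse a 'Thought / Action / Action Input' reply into its three fields.

    Only single-line values are captured; a reply that is just FINISH
    (optionally wrapped in backticks) is treated as the finish signal.
    """
    fields = {"Thought": "", "Action": "", "Action Input": ""}
    for line in text.strip().splitlines():
        key, sep, value = line.partition(":")  # split on the first colon only, so URLs survive
        if sep and key.strip() in fields:
            fields[key.strip()] = value.strip()
    action = fields["Action"].strip("`")
    if action.upper() == "FINISH" or text.strip().strip("`") == "FINISH":
        action = "FINISH"
    return AgentStep(fields["Thought"], action, fields["Action Input"])

reply = "Thought: I need recent news.\nAction: Search\nAction Input: AI drug discovery 2025"
print(parse_step(reply))
```

A production loop would also handle multi-line thoughts and replies that ignore the format, for example by feeding a “could not parse your answer” observation back to the model.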
By injecting state variables like sources_found_count and visited_urls directly into the prompt, we give the agent a working memory. It can see its earlier mistakes (like a broken link it already tried), avoid repeating them, and track how close it is to completing its goal.
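Here is a sketch of how the loop might re-render those state variables into the template before every model call, using plain `str.format`. `TEMPLATE` is a hypothetical shortened excerpt of the prompt, `format_state` is a hypothetical helper, and truncating each logged observation to 200 characters is an assumption meant to keep the scratchpad from overflowing a local model’s context window.

```python
# Hypothetical shortened excerpt of the prompt; the real template has more sections.
TEMPLATE = (
    "**Research State:**\n"
    "- Question: {input}\n"
    "- Sources Found: {sources_found_count} out of {min_sources_required}\n"
    "- Visited URLs: {visited_urls}\n"
    "**Previous Steps (Log):**\n"
    "{agent_scratchpad}"
)

def format_state(
    question: str,
    visited_urls: list[str],
    sources: list[str],
    min_sources: int,
    steps: list[tuple[str, str, str]],
    max_obs_chars: int = 200,
) -> str:
    """Render the current research state into the prompt before each model call."""
    scratchpad = "\n".join(
        f"Action: {action}\nAction Input: {arg}\nObservation: {obs[:max_obs_chars]}"
        for action, arg, obs in steps
    )
    return TEMPLATE.format(
        input=question,
        sources_found_count=len(sources),
        min_sources_required=min_sources,
        visited_urls=", ".join(visited_urls) or "none yet",
        agent_scratchpad=scratchpad or "(no steps yet)",
    )

print(format_state(
    "Latest in AI drug discovery?",
    ["https://example.com/a"],
    ["scraped text"],
    3,
    [("Scrape", "https://example.com/a", "Article text ...")],
))
```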

3. The Execution Loop

Finally, a custom Python loop runs the agent, manages its state (like the list of visited URLs and accumulated data), and executes the tools the agent decides to use. When the agent has gathered enough sources, the loop stops, synthesizes the findings, and generates the final PDF report.
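The execution loop described above could look roughly like the following sketch. It is not the repository’s code: `ResearchState`, `run_research`, the 200-character minimum for a usable page, and the convention that the scrape tool returns a string starting with `Error` on failure are all assumptions. The `decide` callable stands in for the LLM call plus output parsing.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class ResearchState:
    question: str
    min_sources_required: int = 3
    visited_urls: list[str] = field(default_factory=list)
    sources: dict[str, str] = field(default_factory=dict)  # url -> scraped text

    def record_scrape(self, url: str, text: str, min_chars: int = 200) -> bool:
        """Mark url as visited; keep it as a source only if the page looks usable."""
        if url not in self.visited_urls:
            self.visited_urls.append(url)
        usable = not text.startswith("Error") and len(text) >= min_chars
        if usable:
            self.sources[url] = text
        return usable

    @property
    def done(self) -> bool:
        return len(self.sources) >= self.min_sources_required

def run_research(
    state: ResearchState,
    decide: Callable[[ResearchState], tuple[str, str]],
    tools: dict[str, Callable[[str], str]],
    max_steps: int = 15,
) -> ResearchState:
    for _ in range(max_steps):
        if state.done:
            break  # enough sources: hand off to the synthesis / PDF step
        action, action_input = decide(state)
        if action == "FINISH":
            break
        tool = tools.get(action)
        if tool is None:
            continue  # unknown tool name: skip (a real loop would log an error for the model)
        observation = tool(action_input)
        if action == "Scrape":
            state.record_scrape(action_input, observation)
    return state

# Offline demo: fake pages and a scripted sequence of agent decisions.
pages = {"https://a.com": "A" * 300, "https://b.com": "Error: 403 Forbidden", "https://c.com": "C" * 300}
demo_tools = {"Scrape": lambda url: pages.get(url, "Error: not found")}
queue = iter([("Scrape", "https://a.com"), ("Scrape", "https://b.com"),
              ("Scrape", "https://c.com"), ("Scrape", "https://d.com")])
state = run_research(ResearchState("demo", min_sources_required=2), lambda s: next(queue), demo_tools)
print(list(state.sources), state.visited_urls)
```

Because the stop check runs before every model call, the loop ends as soon as `min_sources_required` usable pages are collected, and the `sources` dict is what a synthesis and report step would consume.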

Conclusion: Your Personal AI Powerhouse

We’ve successfully built a system that empowers a local LLM to perform real-time research, breaking it free from its static knowledge base. You now have a blueprint for an agent that can:

✅ Answer questions with up-to-the-minute information.

✅ Synthesize data from multiple online sources.

✅ Generate a formatted report of its findings.

✅ Run entirely on your own machine, for free.

This project is just the beginning. You could expand it with more advanced tools, more sophisticated state management, or even have multiple agents collaborate. You’ve taken a significant step from being a mere user of AI to becoming a builder of bespoke AI solutions.

**Researchgen**, your local AI agent, streamlines information gathering and report creation. It autonomously navigates research steps and produces fully **customizable reports** tailored to your exact specifications, all without leaving your local environment.

The complete code is available in my GitHub repository:

https://github.com/Taha-azizi/researchgen

All images were created by the author using AI image creation tools.

Disclaimers

A Note on Performance

This is a demonstration of what’s possible with free, local tools. The performance of a model like gemma3:27b is impressive for its size, but it will not match the speed or reasoning capabilities of massive, proprietary models like GPT-4-Turbo. The agent might occasionally get stuck in a loop or choose a less-than-optimal tool. This project is a trade-off: you exchange peak performance for 100% privacy, zero cost, and infinite customizability.

Legal and Ethical Notice

The scraping tool in this project is for educational use only. Automated scraping may violate a website’s terms of service and place unnecessary load on its servers. Users must check robots.txt and ensure their use complies with applicable laws and regulations. The project maintainers are not responsible for any misuse or legal consequences arising from the use of this tool.

From an ethical standpoint, users should respect content ownership, avoid scraping paywalled or sensitive information, and refrain from using scraped data for misleading or harmful purposes. Responsible use is expected.
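The robots.txt check can be automated with Python’s standard-library `urllib.robotparser`. In this minimal sketch the rules and the `ResearchgenBot` user-agent string are made-up examples; against a live site you would call `set_url(...)` with the site’s robots.txt URL and then `read()`, instead of `parse()`-ing an inline string. Honoring `crawl_delay()` between requests also reduces server load.

```python
from urllib.robotparser import RobotFileParser

def make_checker(robots_txt: str) -> RobotFileParser:
    """Build a parser from robots.txt text (use set_url() + read() for a live site)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser

rules = """
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""

checker = make_checker(rules)
print(checker.can_fetch("ResearchgenBot", "https://example.com/articles/1"))
print(checker.can_fetch("ResearchgenBot", "https://example.com/private/report"))
print(checker.crawl_delay("ResearchgenBot"))
```

A scrape tool could call `can_fetch` before each request and return an `Error: disallowed by robots.txt` observation instead of fetching.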
