
Build a FREE, Local AI Research Agent with Python

Author(s): Taha Azizi

Originally published on Towards AI.

Give your local LLM the power to browse the real-time web. No APIs, no fees — just Python, LangChain, and your own PC.

With Researchgen, the web is your research library. This AI agent efficiently scours online sources and generates custom reports, all shaped by your specific input. You’re in control of the output.

Ever asked a cutting-edge LLM about a recent event, only to get the familiar apology: “My knowledge cutoff is…”? It’s the digital equivalent of a super-genius locked in a library with last year’s newspapers. Powerful, but not current.

In a previous article, we explored how to build a personal AI brain that chats with an LLM’s existing knowledge base, which lacks recent updates. Today, we’re taking the next logical step: giving that brain eyes and ears on the live internet. We’re going to build a basic but functional AI Research Agent that can browse the web to answer your questions, and we’ll do it entirely for free, right on your local machine.

This project proves that many of the powerful features you see in top-tier AI products can be replicated at home, giving you ultimate control, customizability and privacy.

Why Bother? The Case for Real-Time Knowledge

The single biggest limitation of most standard Large Language Models (LLMs) is their static nature. They are trained on a massive but finite snapshot of the internet. An AI agent, however, is different. It’s an LLM-powered system that can take actions using tools.

Our agent will operate on a simple but powerful loop:

  1. Receive a question: “What are the latest developments in AI-powered drug discovery?”
  2. Formulate a plan: “I need to search for recent news and research papers on this topic.”
  3. Use a tool: Execute a web search.
  4. Observe the result: Get a list of URLs.
  5. Use another tool: Scrape the content from the most promising URL.
  6. Synthesize and repeat: Read the content, decide if more information is needed, and either scrape another source or finish the task.

This turns your LLM from a static encyclopedia into a dynamic researcher.

Researchgen AI agent process

The Architectural Blueprint

We’ll build our agent using a stack of incredible, free, open-source libraries.

  • The Brain (Local LLM): We’ll use Ollama to run a powerful open-source model like gemma3:27b locally. Ollama makes running state-of-the-art models on your own hardware incredibly simple.
  • The Conductor (Agent Framework): LangChain provides the core logic. We’ll use its create_react_agent function, which implements the "ReAct" (Reason + Act) framework, allowing the model to think through its steps.
  • The Eyes (Search Tool): The duckduckgo-search library gives us a simple, no-API-key-required tool to perform web searches.
  • The Hands (Scraping Tool): We’ll use requests and BeautifulSoup4 to fetch and parse the text content from web pages.
  • The Scribe (Report Generator): Finally, fpdf2 will take our agent's findings and automatically generate a clean PDF report.

The Code: Crafting the Agent’s Mind

While the full code is available for you to run, let’s focus on the two most critical parts: the tools and the agent’s “mind” — the prompt that tells it how to behave.

1. Defining the Agent’s Tools

First, we need to give our agent its capabilities. We create two primary tools: Search and Scrape.

# A simplified view of our tool definitions
from langchain_core.tools import Tool


def ddgs_search_tool(query: str) -> str:
    """Performs a DuckDuckGo search and returns the result URLs, one per line."""
    print(f"[Custom Search]: Searching for '{query}'...")
    # ... (logic using the DDGS library) ...
    # unique_urls is built by the elided logic above
    return "\n".join(unique_urls)


def _fetch_web_content_for_tool(url: str) -> str:
    """Fetches the main text from a URL."""
    print(f"\n[Tool Action]: Fetching content from URL: {url}")
    # ... (logic using Requests and BeautifulSoup) ...
    # text is produced by the elided logic above
    return text


# We wrap these functions in LangChain's 'Tool' class
tools = [
    Tool(
        name="Search",
        func=ddgs_search_tool,
        description="Use this to find URLs on the internet..."
    ),
    Tool(
        name="Scrape",
        func=_fetch_web_content_for_tool,
        description="Use this to fetch the text content of a URL..."
    ),
]
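The bodies of the two functions above are elided, and the real ones rely on the DDGS, Requests, and BeautifulSoup libraries. As a rough illustration of the two processing steps they imply (de-duplicating result URLs and pulling visible text out of HTML), here is a standard-library-only sketch. The helper names `dedupe_urls` and `extract_visible_text` are hypothetical and do not come from the author's repository, and `html.parser` is only a stand-in for BeautifulSoup.

```python
from html.parser import HTMLParser


def dedupe_urls(urls):
    """Return URLs in first-seen order with duplicates removed."""
    seen = set()
    unique = []
    for url in urls:
        key = url.strip().rstrip("/")  # treat trailing-slash variants as the same page
        if key and key not in seen:
            seen.add(key)
            unique.append(url.strip())
    return unique


class _TextExtractor(HTMLParser):
    """Collects text nodes while skipping script, style, and noscript contents."""

    SKIPPED = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_visible_text(html, max_chars=4000):
    """Return the visible text nodes joined by spaces, truncated to max_chars."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)[:max_chars]
```

Truncating with `max_chars` (an assumed default) keeps a single long page from crowding out everything else in a local model's context window.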

These functions are the agent’s hands. It doesn’t know how they work, only what they do based on the description.
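To make that point concrete, the sketch below shows one plausible way a list of tools could be turned into the text that fills a `{tools}` placeholder in a prompt. `SimpleTool` and `render_tools` are hypothetical names used for illustration, the descriptions are shortened placeholders, and the exact format LangChain produces may differ.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SimpleTool:
    """Hypothetical stand-in for LangChain's Tool: a name, a callable, a description."""
    name: str
    func: Callable[[str], str]
    description: str


def render_tools(tools):
    """Build the text for a {tools} placeholder: one 'name: description' per line."""
    return "\n".join(f"{t.name}: {t.description}" for t in tools)


def fake_search(query: str) -> str:
    # Placeholder body; a real tool would call a search library here.
    return "https://example.com/article"


demo_tools = [
    SimpleTool("Search", fake_search, "Use this to find URLs on the internet."),
    SimpleTool("Scrape", lambda url: "page text", "Use this to fetch the text content of a URL."),
]

# Only names and descriptions reach the model; the function bodies never do.
print(render_tools(demo_tools))
```

Because the model sees nothing but this rendered text, a vague description tends to lead to a poorly chosen tool, so the wording is worth tuning.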

2. The Agent’s Core Prompt

This is the most crucial part. We’re not just asking the LLM a question; we are giving it a persona, a goal, a set of rules, and a memory. The prompt is the agent’s operating system.

**Research State:**
- Question: {input}
- Sources Found: {sources_found_count} out of {min_sources_required}
- Visited URLs: {visited_urls}
**Your Task:**
1. Start with a 'Search' to find relevant articles.
2. Review search results and use 'Scrape' on the most promising URL.
3. **Critically analyze the result of the Scrape tool:**
- If the content is good, you've found a source.
- If you get an error or the content is unsuitable, **immediately discard that URL**.
4. If your initial search yields no good URLs, **formulate a new, different search query**.
5. Once you have gathered {min_sources_required} high-quality sources, your final output must be `FINISH`.
**Tools Available:**
{tools}
**Previous Steps (Log):**
{agent_scratchpad}
**Your Next Step:**
Thought: Your reasoning for the next action.
Action: The tool to use...
Action Input: The input for the selected tool.

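By providing state variables like sources_found_count and visited_urls directly in the prompt, we give the agent a working memory. The model itself does not change between turns, but on every turn it can see which links it has already tried (including broken ones) and how close it is to completing its goal, so it can avoid repeating mistakes.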
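A minimal sketch of how the state section of such a prompt could be re-rendered on every turn is shown below. The placeholder names match the prompt above, but `STATE_TEMPLATE`, `render_state`, and the default of three required sources are assumptions for illustration, not the author's code.

```python
STATE_TEMPLATE = (
    "Question: {input}\n"
    "Sources Found: {sources_found_count} out of {min_sources_required}\n"
    "Visited URLs: {visited_urls}\n"
)


def render_state(question, visited_urls, sources_found_count, min_sources_required=3):
    """Fill the state section of the prompt from the loop's current bookkeeping."""
    return STATE_TEMPLATE.format(
        input=question,
        sources_found_count=sources_found_count,
        min_sources_required=min_sources_required,
        visited_urls=", ".join(visited_urls) if visited_urls else "None",
    )


# Turn 1: nothing visited yet.
print(render_state("What is new in AI drug discovery?", [], 0))

# A later turn: one URL failed and one succeeded; the model sees both as visited.
print(render_state(
    "What is new in AI drug discovery?",
    ["https://example.com/broken", "https://example.com/good"],
    1,
))
```

The "memory" lives entirely in the Python variables passed to `render_state`; the prompt is just a fresh snapshot of them.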

3. The Execution Loop

Finally, a custom Python loop runs the agent, manages its state (like the list of visited URLs and accumulated data), and executes the tools the agent decides to use. When the agent has gathered enough sources, the loop stops, synthesizes the findings, and generates the final PDF report.
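The following is a simplified sketch of what such a loop might look like, not the author's implementation. It assumes the model replies in the Thought / Action / Action Input format from the prompt, replaces the local model with a scripted stand-in (`scripted_llm`), and leaves out the synthesis and PDF steps. All function names and the `max_steps` cap are hypothetical.

```python
import re

ACTION_RE = re.compile(r"Action:\s*(.+?)\s*\nAction Input:\s*(.+)", re.DOTALL)


def parse_step(reply):
    """Return ('FINISH', None) or (tool_name, tool_input) from the model's reply."""
    if "FINISH" in reply:  # naive check; assumes the word appears only when done
        return "FINISH", None
    match = ACTION_RE.search(reply)
    if match is None:
        raise ValueError("Reply did not contain an Action / Action Input pair")
    return match.group(1).strip(), match.group(2).strip()


def run_agent(llm, tools, question, min_sources=2, max_steps=10):
    """Drive the loop; llm is any callable mapping the state dict to reply text."""
    state = {"question": question, "visited_urls": [], "sources": [], "log": []}
    for _ in range(max_steps):  # hard cap so a confused model cannot loop forever
        if len(state["sources"]) >= min_sources:
            break
        name, arg = parse_step(llm(state))
        if name == "FINISH":
            break
        if name not in tools:
            state["log"].append(f"Unknown tool: {name}")
            continue
        if name == "Scrape":
            if arg in state["visited_urls"]:
                state["log"].append(f"Skipped already visited URL: {arg}")
                continue
            state["visited_urls"].append(arg)
        try:
            observation = tools[name](arg)
        except Exception as exc:  # a failed tool call becomes an observation, not a crash
            observation = f"Error: {exc}"
        state["log"].append(f"{name}({arg}) -> {observation[:80]}")
        if name == "Scrape" and not observation.startswith("Error"):
            state["sources"].append({"url": arg, "text": observation})
    return state


def scripted_llm(state):
    """Stand-in for the local model: replays a fixed plan based on progress so far."""
    script = [
        "Thought: find articles\nAction: Search\nAction Input: ai drug discovery",
        "Thought: read first hit\nAction: Scrape\nAction Input: https://example.com/a",
        "Thought: read second hit\nAction: Scrape\nAction Input: https://example.com/b",
        "Thought: done\nFINISH",
    ]
    return script[min(len(state["log"]), len(script) - 1)]


demo_tools = {
    "Search": lambda q: "https://example.com/a\nhttps://example.com/b",
    "Scrape": lambda url: f"Article text from {url}",
}

result = run_agent(scripted_llm, demo_tools, "What is new in AI drug discovery?")
print(len(result["sources"]), result["visited_urls"])
```

Keeping the state in plain Python rather than inside the model is a deliberate choice here: the loop decides when enough sources exist, so a small local model only has to pick the next action.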

Conclusion: Your Personal AI Powerhouse

We’ve successfully built a system that empowers a local LLM to perform real-time research, breaking it free from its static knowledge base. You now have a blueprint for an agent that can:

✅ Answer questions with up-to-the-minute information.

✅ Synthesize data from multiple online sources.

✅ Generate a formatted report of its findings.

✅ Run entirely on your own machine, for free.

This project is just the beginning. You could expand it with more advanced tools, more sophisticated state management, or even have multiple agents collaborate. You’ve taken a significant step from being a mere user of AI to becoming a builder of bespoke AI solutions.

Researchgen, your local AI agent, streamlines information gathering and report creation. It autonomously navigates research steps and produces fully customizable reports tailored to your exact specifications, all without leaving your local environment.

Please find the complete code components in my GitHub repository:

https://github.com/Taha-azizi/researchgen

All images were created by the author using AI image creation tools.

Disclaimers

A Note on Performance

This is a demonstration of what’s possible with free, local tools. The performance of a model like gemma3:27b is impressive for its size, but it will not match the speed or reasoning capabilities of massive, proprietary models like GPT-4-Turbo. The agent might occasionally get stuck in a loop or choose a less-than-optimal tool. This project is a trade-off: you exchange peak performance for 100% privacy, zero cost, and infinite customizability.

Legal and Ethical Notice

The scraping tool in this project is for educational use only. Automated scraping may violate a website’s terms of service and place unnecessary load on its servers. Users must check robots.txt and ensure their use complies with applicable laws and regulations. The project maintainers are not responsible for any misuse or legal consequences arising from the use of this tool.
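As one possible way to act on the robots.txt advice, Python's standard library includes `urllib.robotparser`. The sketch below checks a URL against robots.txt rules that have already been downloaded; the helper names `is_allowed` and `robots_url_for` and the sample rules are illustrative assumptions, and this check is not known to be part of the author's repository.

```python
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser


def is_allowed(url, robots_txt, user_agent="*"):
    """Check a URL against the text of a robots.txt file that was already downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def robots_url_for(url):
    """Return the conventional robots.txt location for the site hosting url."""
    parts = urlparse(url)
    return f"{parts.scheme}://{parts.netloc}/robots.txt"


# Hypothetical rules for demonstration only.
sample_rules = """
User-agent: *
Disallow: /private/
"""

print(is_allowed("https://example.com/blog/post", sample_rules))
print(is_allowed("https://example.com/private/data", sample_rules))
print(robots_url_for("https://example.com/blog/post"))
```

A scraping tool could call a check like this before fetching a page and return an error string to the agent when the URL is disallowed, so the agent discards it like any other failed source. Passing robots.txt does not by itself settle terms-of-service or legal questions.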

From an ethical standpoint, users should respect content ownership, avoid scraping paywalled or sensitive information, and refrain from using scraped data for misleading or harmful purposes. Responsible use is expected.
