
Deep Research with OpenAI’s API key
Author(s): Daniel Kharitonov
Originally published on Towards AI.
TL;DR: if you need Deep Research capabilities but don’t want to pay $199 per month for OpenAI’s Pro plan, a pay-as-you-go (and possibly better) research assistant can be built on the OpenAI API. The key is to work around the limitations of the deep research models, organize the workflow into an improvement loop, and incorporate custom tools that support fact-checking. Wrapping all this in Python delivers highly refined deep research for about $2-$5 per report.
Sample code and a quickstart example are on GitHub.
Intro.
It’s no secret that Deep Research (DR) is taking the world of web scraping and analysis by storm. Deep research models can plan their research steps, scrape hundreds of sites, and take self-corrective actions, which markedly improves their performance over regular reasoning LLMs with web access.
Unfortunately, these impressive abilities aren’t free.
For example, as of July 2025, ChatGPT limits Plus, Team, Enterprise, and Edu users to just 25 deep research queries per month, while Pro users get 125 (plus 125 extra “lightweight” queries). Free plan users don’t have any way to request deep research, although OpenAI states that five “lightweight” queries are still available to them.
These limits are understandable considering that deep research burns through tokens at warp speed: it reads the content of many websites and repeatedly re-evaluates its own reasoning steps. In addition, the number of LLM providers offering Deep Research capabilities remains limited, so frontier labs can still charge a premium.
But is there a way to avoid the inflexible subscription model and pay as you go? The answer is “yes”, but this requires a bit of scripting.
Part 1. The basics.
Luckily, Deep Research models from OpenAI are also available via API, which makes it remarkably easy to access DR in a pay-as-you-go format outside the ChatGPT subscription tiers. As of July 2025, there are two DR models available — o3-deep-research, and o4-mini-deep-research. More models should be coming soon.
If you have a valid OpenAI API key, the simplest way to do deep research is a single call to Responses API:
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

system_message = """
You are a professional researcher preparing a structured, data-driven report on behalf of a global health economics team. Your task is to analyze the health question the user poses.
Be analytical, avoid generalities, and ensure that each section supports data-backed reasoning that could inform healthcare policy or financial modeling.
"""

user_query = "Research impact of semaglutide on global healthcare systems."

response = client.responses.create(
    model="o3-deep-research",
    input=[
        {
            "role": "developer",
            "content": [
                {
                    "type": "input_text",
                    "text": system_message,
                }
            ]
        },
        {
            "role": "user",
            "content": [
                {
                    "type": "input_text",
                    "text": user_query,
                }
            ]
        }
    ],
    reasoning={
        "summary": "auto"
    },
    tools=[
        {
            "type": "web_search_preview"
        },
        {
            "type": "code_interpreter",
            "container": {
                "type": "auto",
                "file_ids": []
            }
        }
    ]
)
Source: OpenAI Introduction to Deep Research API
There are several things to note about the snippet above.
First, the deep research model must have web search configured and won’t work without it. The other tool suggested by OpenAI is a sandboxed code interpreter, although in my experience it is rarely (if ever) used by the model, possibly because statistical computations are not often required as part of deep research. An option to use an MCP server is also available, but it remains restricted to just two functions, ‘search’ and ‘fetch’, which does not match the tool signatures of most public MCP servers.
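For completeness, this is roughly what an MCP entry in the tools list looks like; the server label and URL below are placeholders, the server must expose ‘search’ and ‘fetch’, and approvals have to be disabled so the model can run unattended:
mcp_tools = [
    {"type": "web_search_preview"},
    {
        "type": "mcp",
        "server_label": "internal_kb",            # placeholder label
        "server_url": "https://example.com/sse",  # placeholder URL
        "require_approval": "never",              # deep research cannot pause for approvals
    },
]
# ...passed as tools=mcp_tools in the client.responses.create() call above.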
Also, if you run the example above, your experience will be fairly anticlimactic: you will wait for two to ten minutes, depending on the query, and then receive a few screens of text. Definitely not an example of good user experience.
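The final report is returned as ordinary response output. A minimal way to pull it out (output_text is a convenience accessor in the official Python SDK; the defensive attribute checks are there because the exact item layout can vary):
# Print the final report (output_text aggregates the text of the output items).
print(response.output_text)

# Citations, if present, are attached as annotations on the final message item.
last_item = response.output[-1]
if getattr(last_item, "content", None):
    for ann in getattr(last_item.content[0], "annotations", []):
        print(getattr(ann, "url", "?"), "-", getattr(ann, "title", "?"))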
Part 2. Agents, anyone?
One of the recent trends is to wrap every AI feature into an “agentic” app to make things better and more responsive.
OpenAI is certainly not a stranger to this approach, and a slightly more polished version of the code above can be found in their cookbook Introduction to Deep Research API agents.
The key improvements in their agentic version of the Deep Research app are an async event processor that prints intermediate research results, and initial prompt clarification steps:

However, while running the Deep Research API asynchronously and streaming responses back to the user is definitely a step forward, the way OpenAI employs their Agents framework here is not particularly insightful. Inspecting the user query and rewriting the prompt is standard practice in the ChatGPT interface, which routinely front-runs user queries with cheaper models, but it brings limited value to realistic deep research queries, which tend to be self-contained and require little prompt engineering.
In other words, if we want to build an agentic workflow for Deep Research, we should iterate on research deliverables, and not on the initial prompt.
Part 3. What do we really need from a Deep Research Agent?
So let us start where OpenAI left off and design a research agent that could be truly useful.
Here is a list of features that I personally find compelling:
- Automatically improve research results. We know that complex queries may turn up conflicting information, so we may need to dig more.
- Provide real-time updates. This means designing a good async event processor to keep users informed of the steps the model takes.
- Use relevant tools. Web searches are great, but there might be better ways to query and verify your research.
- Work in stages. If multiple processing steps are required, we want to be able to run them independently to debug and avoid burning credits on failing pipelines.
- Keep track of expenses. We want statistics on how much we are spending on our deep research queries.
With that wishlist in hand, we can go ahead and start building.
Part 4. Code notes.
Architecture.
Let us use the same OpenAI Agents framework as in the cookbook, but modify the workflow. As discussed, we can safely throw out the prompt-engineering stages and focus on automatic iteration over results:

In the diagram above, the Research Agent runs the deep research model, and its initial report is challenged by the Critique Agent. If the latter deems the research insufficient, another run can be requested.
The Report Agent consolidates the research and critique, and produces the final deliverable. The system is limited to a maximum of two deep research runs, both to conserve credits and to avoid diminishing returns when the research topic does not benefit from extra lookups.
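A minimal sketch of this loop, assuming the three agents are constructed elsewhere. In the real app the hand-back from critique to research happens through an SDK handoff inside the critique run, and events are streamed rather than run blind; the MORE_RESEARCH marker here is just an illustrative stand-in for that decision:
from agents import Runner

async def run_workflow(query, research_agent, critique_agent, report_agent) -> str:
    research = await Runner.run(research_agent, query)           # deep research run #1
    critique = await Runner.run(                                 # programmatic hand-over
        critique_agent,
        f"Query:\n{query}\n\nDraft report:\n{research.final_output}",
    )
    if "MORE_RESEARCH" in str(critique.final_output):            # illustrative signal
        research = await Runner.run(                             # deep research run #2 (the cap)
            research_agent, f"{query}\n\nCritique:\n{critique.final_output}"
        )
    report = await Runner.run(
        report_agent,
        f"Research:\n{research.final_output}\n\nCritique:\n{critique.final_output}",
    )
    return str(report.final_output)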
Working with deep research model limits.
As mentioned before, the deep research models from OpenAI at present are limited in tooling. This has two implications for our Research Agent:
(1) We cannot use handoffs as the exclusive way of transferring control. Handoffs in the Agents SDK are implemented as ‘transfer_to…’ tool calls, which the deep research models do not support. The workaround is to pass control from the Research Agent to the Critique Agent programmatically (without the LLM making a decision). Passing control through a handoff in the opposite direction is fine, because the Critique Agent runs on a regular reasoning model. Thus our control flow becomes hybrid.
(2) For the same reason, we cannot use MCP and custom tools in the Research Agent, and we again need to shift these functions to the Critique Agent (which uses a regular reasoning model). The latter is also responsible for re-invoking the handoff to the Research Agent if it finds the research results lacking, as sketched below.
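In Agents SDK terms, the asymmetry boils down to which agent gets a handoffs entry. The instructions and models below are placeholders; the actual agent factory is shown in Part 5:
from agents import Agent, WebSearchTool

# Deep research model: no handoffs -- it cannot emit transfer_to_* tool calls.
research_agent = Agent(
    name="ResearchAgent",
    instructions="Produce a structured, citation-backed research report.",
    model="o4-mini-deep-research",
    tools=[WebSearchTool()],
    handoffs=[],
)

# Reasoning model: gets a handoff back to the Research Agent so it can request
# another deep research pass on its own; the forward direction (research ->
# critique) is invoked programmatically by the workflow code.
critique_agent = Agent(
    name="CritiqueAgent",
    instructions="Critique the report; hand off to ResearchAgent if it is lacking.",
    model="o3",
    tools=[WebSearchTool()],
    handoffs=[research_agent],
)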
Event processing.
We need to implement a parser for raw events that intercepts web searches, reasoning summaries, MCP lookups, and tool calls. In addition, we need to collect token usage to provide the final cost estimates.
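A minimal sketch of such a processor, built on the Agents SDK streaming interface. The event and item type names follow the SDK as of this writing; the real app handles more item types and also dumps the raw events to disk:
from agents import Runner
from openai.types.responses import ResponseCompletedEvent

async def run_with_progress(agent, user_input: str):
    result = Runner.run_streamed(agent, user_input)
    input_tokens = output_tokens = 0

    async for event in result.stream_events():
        if event.type == "agent_updated_stream_event":
            print(f"🔄 Handoff to: {event.new_agent.name}")
        elif event.type == "run_item_stream_event":
            if event.item.type == "reasoning_item":
                print("💭 [REASONING] ✓")
            elif event.item.type == "tool_call_item":
                print("🔧 [Tool call]")
        elif event.type == "raw_response_event" and isinstance(event.data, ResponseCompletedEvent):
            usage = event.data.response.usage   # accumulate per-response token usage
            if usage:
                input_tokens += usage.input_tokens
                output_tokens += usage.output_tokens

    print(f"🎯 Total tokens: {input_tokens + output_tokens:,} "
          f"({input_tokens:,} input, {output_tokens:,} output)")
    return result.final_output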
Pipelining.
Transitions between agents offer a natural spot to capture outputs and start the next stage separately (if desired). This helps greatly with prompt engineering and failure management, since we can execute the critique or reporting stages without re-running queries on the Research Agent.
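A sketch of that hand-over point, with file names mirroring the ones visible in the run transcript in Part 7 (the exact layout in the repo may differ):
import json
from pathlib import Path

RESULTS = Path("results")

def save_stage(stage: str, text: str, raw_events: list) -> None:
    # Persist the stage output and its raw events so the next stage can be
    # (re)started without repeating the deep research run.
    RESULTS.mkdir(exist_ok=True)
    (RESULTS / f"{stage}_results.txt").write_text(text)
    (RESULTS / f"raw_events_{stage}.json").write_text(json.dumps(raw_events, indent=2))

def load_stage(stage: str) -> str:
    return (RESULTS / f"{stage}_results.txt").read_text()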
Part 5. Implementation.
Here we show some highlights of the actual app implementation. Our agent factory returns three types of agents: Research, Critique, and Report:
return Agent(
    name="ResearchAgent",
    instructions=research_instructions,
    model=MODEL_RESEARCH,
    model_settings=ResearchAgents._create_base_model_settings(),
    tools=[
        WebSearchTool(),
        CodeInterpreterTool(tool_config=CodeInterpreter(
            type="code_interpreter",
            container={"type": "auto", "file_ids": []}
        )),
    ],
    handoffs=[]
)
...
return Agent(
    name="CritiqueAgent",
    instructions=critique_instructions,
    model=MODEL_CRITIQUE,
    model_settings=ResearchAgents._create_base_model_settings(),
    tools=[
        WebSearchTool(),
        verify_url
    ],
    handoffs=handoffs
)
...
return Agent(
    name="FinalReportAgent",
    instructions=final_report_instructions,
    model=MODEL_FINAL_REPORT,
    model_settings=ResearchAgents._create_base_model_settings(),
    tools=[
        CodeInterpreterTool(tool_config=CodeInterpreter(
            type="code_interpreter",
            container={"type": "auto", "file_ids": []}
        )),
        WebSearchTool()
    ]
)
As we can see, only the Critique Agent uses a normal handoff: it can decide to return control to the Research Agent, or let the results flow on to the Report Agent.
We also equip the Critique Agent with a custom tool and an MCP server:
async def create_critique_agent_with_mcp(research_agent=None) -> tuple[Agent, MCPServerSse]:
    """Create critique agent with connected MCP server. Returns (agent, mcp_server) tuple."""
    # Create and connect DeepWiki MCP server with robust timeout settings
    deepwiki_server = MCPServerSse(
        params={
            "url": "https://mcp.deepwiki.com/sse",
            "timeout": 30,              # Connection timeout: 30 seconds
            "sse_read_timeout": 600,    # SSE read timeout: 10 minutes
        },
        client_session_timeout_seconds=60.0,  # ClientSession read timeout: 60 seconds
        cache_tools_list=True,
        name="DeepWiki"
    )

    # Connect the server
    await deepwiki_server.connect()

    # Create base agent
    agent = ResearchAgents.create_critique_agent(research_agent)

    # Add MCP server to agent with strict schema configuration and timeout settings
    agent.mcp_servers = [deepwiki_server]
    agent.mcp_config = {
        "convert_schemas_to_strict": True,
        "timeout": 30,          # Tool call timeout in seconds
        "request_timeout": 60   # Request timeout in seconds
    }

    return agent, deepwiki_server
...
from typing import Any, Dict

from agents import function_tool  # registers the function as an Agents SDK tool


@function_tool
def verify_url(url: str) -> Dict[str, Any]:
    """
    Verify if a URL or HTTP/HTTPS API endpoint exists and is accessible.

    Args:
        url: The URL to verify (must be properly formatted with http:// or https://)

    Returns:
        Dictionary containing verification results with status code, success flag, and details
    """
    if not url or not isinstance(url, str):
        return {
            "success": False,
            "status_code": None,
            "error": "Invalid URL provided - must be a non-empty string",
            "accessible": False,
            "response_time_ms": None
        }
    # ... (the remainder of the function issues an HTTP request and fills in the
    # status code, accessibility flag, and response time for the URL)
These tools represent the workloads I care about and should probably change in your applications. For my use cases, I employ DeepWiki (exposed via the public DeepWiki MCP server) to answer questions about GitHub repositories. Additionally, I want to be able to ascertain that the API endpoints and websites referenced by deep research are functional, so I use a verify_url() function that returns HTTP status codes for the URLs in question.
It is also worth explaining how we estimate usage costs in the app. We automate this in the Report Agent by instructing it to fetch prices from OpenAI’s pricing page and then use the code interpreter to multiply them by the token statistics we collect in the event processor:
"""
IMPORTANT: The critique analysis includes detailed token usage statistics at the end showing multiple models (e.g., o4-mini-deep-research, o4-mini) with their respective token counts. You MUST:
1. Extract ALL model names and their token usage from the token usage statistics
2. Use web search to get pricing from https://platform.openai.com/docs/pricing for each model
3. Use code interpreter to calculate costs for ALL models found in the statistics
4. Create a comprehensive cost breakdown table showing each model, token types, and costs
5. Include a "Performance Metrics and Cost Analysis" section with detailed cost calculations
"""
It is worth noting, however, that this cost estimation method is only approximate and should not be used in production. One shortcoming is that the Report Agent cannot include its own usage in the calculation it performs; another is the volatility of the price sources the model finds. In my production app, I run cost estimates in a standalone pipeline.
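As an illustration of the standalone approach, here is a minimal sketch that multiplies collected token counts by per-million-token rates. The rates below are placeholders and should be replaced with current values from the OpenAI pricing page; cached input tokens are excluded from the billable count, matching how the Report Agent breaks down costs in the run shown later in Part 7:
# Standalone cost estimate (sketch). Replace the placeholder rates with current
# prices from https://platform.openai.com/docs/pricing.
PRICE_PER_1M = {
    # model: (input_rate_usd, output_rate_usd) per million tokens -- placeholders
    "o4-mini-deep-research": (2.00, 8.00),
    "o4-mini": (1.10, 4.40),
}

def estimate_cost(usage_by_model: dict) -> float:
    total = 0.0
    for model, usage in usage_by_model.items():
        rate_in, rate_out = PRICE_PER_1M[model]
        billable_in = usage["input_tokens"] - usage.get("cached_tokens", 0)
        total += billable_in / 1e6 * rate_in
        total += usage["output_tokens"] / 1e6 * rate_out
    return round(total, 4)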
Part 6. Configuration and OpenAI failure modes.
Configuration for our Deep Research app mostly consists of adding a custom toolkit (or an MCP server) to verify the research results that matter in your use cases.
In addition, the relevant parts of the Critique Agent prompt in research_agents.py must be modified to reflect the custom tools available. This part is, in theory, optional because the Agents SDK can derive tool instructions from docstrings automatically; however, we found that explicit prompts still work better, especially when coercing the model to use MCP.
"""
IMPORTANT: You MUST use the verify_url tool to test every API endpoint mentioned in the research report. This is critical for validating source accessibility and API claims.
CRITICAL: For ANY GitHub repository mentioned in the research (even just the name like "facebook/react" or URLs like "https://github.com/user/repo"), you MUST immediately use the DeepWiki MCP tools to gather additional information:
1. First use ask_question to get basic information about the repository
2. Use ask_question again for specific technical details relevant to the research topic
3. If MCP tools timeout or fail, note this in your critique and continue with other verification methods
Tool Usage Guidelines:
- Use web search tool to verify claims when necessary
- Use DeepWiki MCP tools to ask in-depth questions about GitHub repositories mentioned in the research report.
MANDATORY: If the research mentions ANY of these patterns, use MCP tools immediately:
- Repository names like "openai/openai-python", "facebook/react", "microsoft/typescript"
- GitHub URLs like "https://github.com/user/repo"
- Any reference to "GitHub repository" or "repo" or "source code"
Examples of DeepWiki MCP use:
- ask_question(repoName="openai/openai-python", question="What is this repository about?")
- ask_question(repoName="openai/openai-python", question="How does authentication work?")
- ask_question(repoName="vercel/next.js", question="What are the main API components?")
- ask_question(repoName="tensorflow/tensorflow", question="How to implement custom models?")
- Use verify_url tool to check select API endpoints to verify if they actually exist
Examples of when to use verify_url:
- Testing API endpoint: verify_url("https://api.openai.com/v1/models")
- Checking API documentation: verify_url("https://docs.example.com/api")
"""
The default choice of models is defined in config.py:
MODEL_RESEARCH = "o4-mini-deep-research"
MODEL_CRITIQUE = "o3-pro"
MODEL_FINAL_REPORT = "o4-mini"
To shift the balance between quality and cost of the final result, the research model can be changed to “o3-deep-research”, and the critique model can be flipped to “o3” or a similar reasoning model. The final report can usually be produced with a non-reasoning model.
Success of our deep research queries primarily depends on the ability to complete OpenAI model requests. Depending on the service tier of your API key, the OpenAI endpoint may be more or less busy at peak times. If your request is bumped off, you may see a message like this:
ERROR:openai.agents:Error streaming response: An error occurred while processing your request. You can retry your request, or contact us through our help center at help.openai.com if the error persists. Please include the request ID req_8c8c09a17f9544755961ba9a624ee920 in your message.
❌ Error: Workflow execution failed: An error occurred while processing your request. You can retry your request, or contact us through our help center at help.openai.com if the error persists. Please include the request ID req_8c8c09a17f9544755961ba9a624ee920 in your message.
If this happens to you, do not panic and just run the query a few minutes later when the load decreases. If your app is launched automatically, you can also check the return code to see which part did not finish:
python main.py -q "query" -cvri
case $? in
0) echo "Success!" ;;
1) echo "Invalid arguments" ;;
2) echo "Research agent failed" ;;
3) echo "Critique agent failed" ;;
4) echo "Final report agent failed" ;;
5) echo "General error" ;;
esac
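If the pipeline is driven by another script, a simple retry with a pause between attempts also works. A minimal sketch in Python, reusing the command-line invocation shown above:
import subprocess
import time

# Retry the whole pipeline a few times, pausing between attempts so that
# transient endpoint congestion has a chance to clear (sketch).
for attempt in range(3):
    rc = subprocess.run(["python", "main.py", "-q", "query", "-cvri"]).returncode
    if rc == 0:
        break
    print(f"Run failed with exit code {rc}; retrying in 5 minutes")
    time.sleep(300)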
Don’t forget that this app can be run in stages, so (for example) if the final report fails, you can still reuse the results from research and critique.
Part 7. Use examples.
Let us first show an example of how a full run of the app looks:
$ export QUERY="Compare agent handoffs in Google ADK, LangChain, and OpenAI Agents"
$ python agentic_research.py -q "$(echo $QUERY)" -cvri
🔍 Agentic Research Tool
========================================
ℹ️ 🔄 Iterative Research-Critique Workflow (Hybrid)
--------------------
Using hybrid approach: programmatic research→critique, OpenAI critique→research
ℹ️ Starting iterative workflow: Compare and contrast agent handoff mechanisms in Google ADK, LangChain, and OpenAI agents
🔍 Beginning research...
🔍 Research streaming events:
🔄 Handoff to: ResearchAgent
💭 [REASONING] ✓
💭 [REASONING SUMMARY]
**Exploring agent handoff mechanisms**
The user's query is about comparing agent handoff mechanisms in Google ADK, LangChain, and OpenAI agents for a structured research report. I need to consider what "agent handoff" means, likely in the context of conversational AI. I'm unsure about "Google ADK," but it might stand for "Google Agent Development Kit." Perhaps this relates to "Conversational Actions" or "Dialogflow." The report will need to logically assess how tasks are handed off between different AI agents, including data and maybe some charts.
**Investigating agent handoff mechanisms**
LangChain is a Python framework that allows for chaining language model calls, and it seems to involve something related to "Agents in LangChain." When it comes to OpenAI agents, this might refer to GPT-4's built-in features. Now, I see the user's question about comparing agent handoff mechanisms, which likely involves how control is transferred between agents in these frameworks. I should look up "Google ADK agent handoff mechanism" for more information.
🔍 [Web search] Google ADK agent handoff mechanism
💭 [REASONING] ✓
💭 [REASONING SUMMARY]
**Exploring multi-agent systems**
...
...
As we can see, in the first (research) phase the app is primarily busy running through a long cycle of web searches followed by reasoning summaries. Depending on complexity, this may take from two to ten minutes to complete.
After that, we should see the critique phase kicking in:
...
...
💭 [REASONING SUMMARY]
**Organizing GitHub Repositories**
I’m thinking about breaking out specific names like "facebook/react" and "openai/openai-python" into a neat list of GitHub repositories. I'll include entries such as google/adk-python, google/adk-samples, langchain-ai/langchain, openai/openai-agents-python, and openai/openai-agents-js. I might mention "openai/agents-runbook" since it's relevant, even if it's not a framework. My final plan involves structuring sections and potentially comparing elements in a table for clarity. This way, I can enhance understanding and organization!
**Structuring GitHub Repositories**
I’m considering using a small table for presenting information, but I think it might get cluttered with citations. Instead, I could use a bullet list to outline details like mechanism types and invocation styles. For instance, I'll describe how ADK, LangGraph, and OpenAI Agents function.
Then, I’ll provide a bulleted list of GitHub repositories—including google/adk-python and openai/openai-agents-python. I initially thought about including hyperlinks, but plain text names seem more fitting based on the instructions. So, I’ll settle on a straightforward list format.
💭 Generated 19,200 reasoning tokens
🎯 Total tokens: 1,870,200 (1,848,060 input, 22,140 output, 1,343,972 cached)
Raw research events saved to results/raw_events_research.json
✅ Initial research completed
Results saved to results/research_results.txt and results/research_results.json
📝 Starting critique with handoff capability and MCP tools...
📝 Critique streaming events:
🔄 Handoff to: CritiqueAgent
💭 [REASONING] ✓
🔧 [Tool] verify_url(https://langchain-ai.lang.chat/langgraph/how-tos/agent-handoffs/)
🔧 [Tool] verify_url(https://openai.github.io/openai-agents-js/guides/handoffs)
💭 [REASONING] ✓
🔧 [Tool] verify_url(https://github.com/google/adk-python)
💭 [REASONING] ✓
💭 [REASONING SUMMARY]
**Investigating link issues**
Wait, I noticed that the GitHub link for adk-python gives a 404 error, but our tool indicates a status of 200. That’s a bit odd! It suggests the link may have been renamed or moved, even though the tool shows it accessible. The response time seems high too, but there's no final URL. It's strange, but I guess we can trust the tool's initial success. This also seems to apply to adk-samples.
💭 [REASONING] ✓
📚 [MCP] ask_question(Serafin-dev/google-adk: 'What is this repository about?')
💭 [REASONING] ✓
📚 [MCP] ask_question(google/adk-python: 'What is this repository about?')
💭 [REASONING] ✓
📚 [MCP] ask_question(google/adk-python: 'What handoff mechanisms are supported by its SDK?')
💭 [REASONING] ✓
📚 [MCP] ask_question(google/adk-samples: 'What is this repository about?')
💭 [REASONING] ✓
In the critique phase shown above, we should see several tool and MCP calls in addition to web searches. This phase may also end up passing the criticism back to the Research Agent for more insights:
🔧 [Tool] transfer_to_researchagent({})
🔄 Handoff to: ResearchAgent
💭 [REASONING] ✓
💭 [REASONING SUMMARY]
**Reviewing research content**
It seems the user is looking for feedback on a research report, likely focused on evaluating accuracy and content. I need to review it closely for correctness and verify statements against sources like GitHub and Google Cloud. The content has many references, which raises questions about completeness and potential biases. I’ll pay special attention to the sections covering “agent handoff mechanisms” in various frameworks, breaking down the content and checking for factual accuracy and any gaps.
🔍 [Web search] Google ADK agent handoff transfer_to_agent LlmAgent sub_agents
💭 [REASONING] ✓ 💭 [REASONING] ✓ 💭 [REASONING] ✓ 🔍 [Web search] "transfer_to_agent"
💭 [REASONING] ✓ 🔍 [Web search] "FunctionCall(name='transfer_to_agent'"
💭 [REASONING] ✓ 🔍 [Web search] site:google.github.io ADK sub_agents transfer
💭 [REASONING] ✓ 💭 [REASONING] ✓ 💭 [REASONING] ✓ 💭 [REASONING] ✓
💭 [REASONING SUMMARY]
**Exploring Relevant Class Definitions**
I need to find something relevant to the user's request, and I’m leaning toward examining class definitions. It might help clarify or provide insights that align with what I'm looking for. I wonder if it could lead to a better understanding of the concepts involved. I'll take some time to delve into these definitions to see how they connect with the user's needs. Let's see where this takes me!
After more deliberation, the Research Agent should complete and yield straight to the Report Agent:
While the report covers main patterns, it could discuss nuances, such as resuming or chaining agents. Mentioning how different agent types interact would add depth. The source quality appears solid, but referencing unofficial GitHub links can introduce bias. A structured critique would enhance clarity and completeness.
💭 Generated 17,152 reasoning tokens
🎯 Total tokens: 1,647,572 (1,628,046 input, 19,526 output, 1,371,630 cached)
Raw research_critique_iterative events saved to results/raw_events_iterative.json
✅ Iterative workflow completed
============================================================
FINAL OUTPUT:
============================================================
**Factual Accuracy:** The report’s descriptions of each framework’s handoff mechanism are generally on point, but there are a few minor inaccuracies or confusing bits. For Google ADK, it correctly notes that one can emit a `transfer_to_agent` function call to let a child agent take over (this is confirmed by Google’s ADK blog, which explains that calling a sub-agent fully “transfers…responsibility” to that agent and leaves the root “out of the loop” ([cloud.google.com](https://cloud.google.com/blog/products/ai-machine-learning/build-multi-agentic-systems-using-google-adk#:~:text=This%20limitation%20stems%20from%20how,in%20these%20scenarios))). It also correctly describes wrapping an agent as an `AgentTool`, which the root can call like a function. The Google Cloud example even shows `flight_tool = agent_tool.AgentTool(agent=flight_agent)` and a root LLM agent invoking it ([cloud.google.com](https://cloud.google.com/blog/products/ai-machine-learning/build-multi-agentic-systems-using-google-adk#:~:text=from%20google.adk.agents%20import%20agent_tool%20,can%20use%20these%20tools)). However, the snippet in the report that reads ``FunctionCall(name='ImageGen',…)`` seems out of place (it likely should show calling the flight or booking agent’s tool instead). Also, the report relies on a user’s GitHub (`Serafin-dev/google-adk`) for details; it would be stronger to cite Google’s own ADK docs or code for verification.
Finally, the Report Agent compiles the final output and computes the resource costs:
Critique saved to results/critique_results.txt
============================================================
ℹ️ 📊 Final Report Mode
============================================================
ℹ️ Starting final report generation from previous workflow steps
📊 Generating comprehensive final report...
(Synthesizing research findings and critique into markdown format)
🔄 Handoff to: FinalReportAgent
💭 [REASONING] ✓ 🔍 [Web search] site:platform.openai.com/docs/pricing o4-mini-deep-research
💭 [REASONING] ✓ 💭 [REASONING] ✓ 💭 [REASONING] ✓ 🔍 [Web search] site:openai.com o4-mini pricing tokens
💭 [REASONING] ✓ 💭 [REASONING] ✓ 🔍 [Web search] o4-mini per million input tokens site:platform.openai.com
💭 [REASONING] ✓
[ (53 lines) code interpreter] import pandas as pd ...
💭 [REASONING SUMMARY]
**Exploring pricing tiers**
## Performance Metrics and Cost Analysis
| Model | Token Type | Count | Rate (USD per 1M tokens) | Cost (USD) |
|--------------------------------|----------------------|-----------:|--------------------------:|---------------:|
| **o4-mini-deep-research** | Prompt (billable) | 484,888 | $2.00 ([ai-stats.phaseo.app](https://ai-stats.phaseo.app/prices/openai?utm_source=chatgpt.com)) | $0.9698 |
| | Reasoning | 19,200 | $2.00 ([ai-stats.phaseo.app](https://ai-stats.phaseo.app/prices/openai?utm_source=chatgpt.com)) | $0.0384 |
| | Cached (free) | 1,343,972 | $0.00 | $0.0000 |
| | Output | 22,140 | $8.00 ([ai-stats.phaseo.app](https://ai-stats.phaseo.app/prices/openai?utm_source=chatgpt.com)) | $0.1771 |
| | **Total** | 1,870,200 | | **$1.1853** |
| **o4-mini** | Prompt (billable) | 239,264 | $1.10 ([ai-stats.phaseo.app](https://ai-stats.phaseo.app/prices/openai?utm_source=chatgpt.com)) | $0.2632 |
| | Reasoning | 17,152 | $1.10 ([ai-stats.phaseo.app](https://ai-stats.phaseo.app/prices/openai?utm_source=chatgpt.com)) | $0.0189 |
| | Cached (free) | 1,371,630 | $0.00 | $0.0000 |
| | Output | 19,526 | $4.40 ([ai-stats.phaseo.app](https://ai-stats.phaseo.app/prices/openai?utm_source=chatgpt.com)) | $0.0859 |
| | **Total** | 1,647,572 | | **$0.3679** |
| **Subtotal (Research + Critique)** | | 3,517,772 | | **$1.5532** |
| **Estimated Final Report (o4-mini)**| Prompt | 2,000 | $1.10 ([ai-stats.phaseo.app](https://ai-stats.phaseo.app/prices/openai?utm_source=chatgpt.com)) | $0.0022 |
| | Output | 10,000 | $4.40 ([ai-stats.phaseo.app](https://ai-stats.phaseo.app/prices/openai?utm_source=chatgpt.com)) | $0.0440 |
| | **Stage Total** | 12,000 | | **$0.0462** |
| **Grand Total (All Stages)** | | 3,529,772 | | **$1.5994** |
At this point, the ./results folder should have the final report in Markdown, plus the intermediate critique and research outputs in text. In addition, the raw events are saved for each Agent in case models emit new event types that are not yet accounted for.
After the full run completes, the intermediate text results for the research and critique could also be used to restart later stages in case they have failed or need prompt adjustments.
For instance, the run below starts the critique stage where the research stage left off:
$ python agentic_research.py -q "$(echo $QUERY)" -cv
That’s it.
Happy deep researching!
This code is available on GitHub, so feel free to clone and star the repo!