TAI #138: OpenAI’s o3-Mini and Deep Research: A New Era of Reasoning Powered Agents?

Author(s): Towards AI Editorial Team

Originally published on Towards AI.

What happened this week in AI by Louie

We realize we have been alternating between OpenAI- and DeepSeek-focused discussions recently, but with good reason, given a run of very impressive model and product releases. OpenAI’s o3-mini and Deep Research are two more powerful new products, but they were not alone in this week’s releases. The open-source space also saw notable models, with Qwen and Mistral unveiling high-performance models focused on context length and efficiency, respectively. We were also impressed with Ai2’s Tülu 3 405B reasoning model, adapted via Reinforcement Learning with Verifiable Rewards. Beyond its model releases, OpenAI also made headlines for its funding ambitions, with reports indicating early discussions to raise $40 billion at a $300 billion valuation in a round led by SoftBank.

o3-mini replaces its predecessor, o1-mini, as the default reasoning model in ChatGPT and the API, delivering faster response times and improved accuracy on STEM tasks such as coding, mathematics, and science. With the addition of features like function calling, structured outputs, and developer messages, this release addresses common LLM developer requests while offering flexible reasoning effort settings — low, medium, and high — to balance speed and precision according to task complexity. It shows clear improvements as the reasoning effort increases. On AIME competition math, at low effort it achieves 60.0% accuracy, similar to o1-mini (63.6%). At medium effort, it reaches 79.6%, closing in on o1 (which scores 83.3% even with 64 samples). At high effort, o3-mini surpasses both o1 and o1-mini, achieving 87.3% accuracy, the highest in the comparison. o3-mini also offers a significant cost advantage over its predecessors: it is priced at $1.10 per million input tokens and $4.40 per million output tokens, making it 63% cheaper than o1-mini and 93% cheaper than the full o1 model. It remains more expensive, however, than DeepSeek R1 ($0.55 input, $2.19 output).
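For developers, the reasoning effort is exposed as a single API parameter. The snippet below is a minimal sketch (ours, not OpenAI’s) of calling o3-mini with the OpenAI Python SDK and estimating the cost of one call at the quoted prices; parameter names reflect OpenAI’s published API at the time of writing, so check the current docs before relying on them.

```python
# Minimal sketch: o3-mini via the OpenAI Python SDK with an explicit reasoning effort.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="o3-mini",
    reasoning_effort="high",  # "low" | "medium" | "high"
    messages=[
        {"role": "developer", "content": "You are a careful competition-math solver."},
        {"role": "user", "content": "How many integers between 1 and 1000 are divisible by 3 or 5?"},
    ],
)
print(response.choices[0].message.content)

# Rough cost of this call at $1.10 / $4.40 per million input / output tokens.
usage = response.usage
cost = usage.prompt_tokens * 1.10e-6 + usage.completion_tokens * 4.40e-6
print(f"Approximate cost: ${cost:.4f}")
```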

Away from the API, ChatGPT Plus, Team, and Pro users will now benefit from increased message limits vs. o1-mini. Plus and Team users enjoy a cap of 150 o3-mini messages per day — a threefold increase over the previous 50 messages — while Pro users continue to have unlimited access for a $200 monthly fee. Additionally, free plan users are now able to experience o3-mini by selecting “Reason” in the message composer or regenerating a response, marking the first time a reasoning model is available beyond paid tiers.

In parallel, OpenAI unveiled Deep Research, an agent designed to autonomously conduct multi-step analyses by synthesizing information from a wide range of online sources. The tool is built on a version of the upcoming o3 model that has undergone further reinforcement learning, specifically on research tasks. This marks the first time the full potential of the o3 model is available to the public. Professionals in fields such as finance, science, and engineering can now leverage Deep Research to generate comprehensive reports that mirror the work of a capable research analyst (albeit still with potential hallucinations and the risk of relying on unreliable web sources!). The system employs web browsing, Python tool integrations, and multi-step reasoning to produce outputs that are fully documented with clear citations and a transparent outline of its analytic process. Deep Research is currently offered exclusively to Pro users, who are allocated up to 100 research tasks per month, with each task taking anywhere from 5 to 30 minutes depending on complexity. In the future, the Plus tier is planned to get 10 tasks per month, and the free tier “a very small number.”

Why should you care?

o3-mini and Deep Research are significant new tools for developers and technical professionals and somewhat close OpenAI’s reasoning model cost gap to DeepSeek. o3-mini offers reasoning performance comparable to o1 while improving response speed, cost, and efficiency. Its adaptive reasoning effort scaling makes it an invaluable tool across a range of applications, from competitive coding to complex software problems. Looking ahead, we are excited about reinforcement fine-tuning options for o3-mini, which should allow LLM Developers to further enhance its technical performance and adaptability for more use cases in many fields. We think this is just the beginning of a new era of much more powerful LLM Agents powered by reasoning models.

Meanwhile, Deep Research redefines AI-driven web search, research, and analysis, extending reasoning capabilities to tasks that typically require extensive manual effort. While OpenAI was not the first to release a “Deep Research” product (we also found Gemini’s earlier Deep Research tool very useful), it is by far the most powerful we have tested. The agent also marks the first public exposure to the full o3 model and is a promising signal for the new LLM reinforcement learning paradigm extending to more domains and multi-step agents. In typical exuberant fashion, Sam Altman claimed, “My very approximate vibe is that it can do a single-digit percentage of all economically valuable tasks in the world, which is a wild milestone.” While we wouldn’t count on this just yet, we think this class of reasoning-powered agents is likely to take LLM adoption and economic impact to the next level.

Louie Peters — Towards AI Co-founder and CEO

Hottest News

1. OpenAI Launches o3-Mini, Its Latest ‘Reasoning’ Model

OpenAI has launched o3-mini, first previewed in December and the newest in the company’s o family of reasoning models. o3-mini is fine-tuned for STEM problems, specifically programming, math, and science. OpenAI claims the model is largely on par with the o1 family (o1 and o1-mini) in terms of capabilities but runs faster and costs less.

2. DeepSeek-AI Releases Janus-Pro 7B: An Open-Source Multimodal AI

DeepSeek has released Janus-Pro, a refined version of the Janus framework, to overcome the limitations of earlier models. DeepSeek claims this new set of multimodal AI models can outperform OpenAI’s DALL-E 3. The models achieve strong results across diverse tasks by addressing critical challenges through architectural innovation, optimized training, and data enhancement.

3. OpenAI Unveils a New ChatGPT Agent for ‘Deep Research’

OpenAI has launched deep research, a new capability in ChatGPT that independently conducts multi-step research on the internet. Deep research is built for people who do intensive knowledge work in areas like finance, science, policy, and engineering and need thorough, precise, and reliable research. The feature will be available to Pro users, with a monthly limit of up to 100 queries.

4. Mistral AI Releases the Mistral-Small-24B-Instruct-2501

Mistral AI introduces Mistral Small 3, a latency-optimized 24B-parameter model released under the Apache 2.0 license. According to Mistral co-founder Guillaume Lample, the model was trained on 8 trillion tokens, compared to roughly 15 trillion for comparable models. It achieves 81% accuracy on standard benchmarks while processing 150 tokens per second.

5. DeepSeek’s AI Restricted by ‘Hundreds’ of Companies in Days

Companies and government agencies worldwide are moving to restrict their employees’ access to the tools recently released by the Chinese AI startup DeepSeek. “Hundreds” of companies, particularly those associated with governments, have worked to block access to DeepSeek due to concerns about potential data leaks to the Chinese government and what they view as weak privacy safeguards, said Nadir Izrael, chief technology officer of the cyber firm Armis Inc. We note that because DeepSeek openly released its model weights, it is also possible to use its models via 100% US-based LLM providers.

6. Qwen AI Releases Qwen2.5-7B-Instruct-1M and Qwen2.5-14B-Instruct-1M

Qwen AI has introduced two new models, Qwen2.5-7B-Instruct-1M and Qwen2.5-14B-Instruct-1M, designed to support context lengths of up to 1 million tokens. Alongside the models, Qwen open-sourced an inference framework optimized for handling long contexts. This advancement enables developers and researchers to work with much larger inputs in a single pass, offering a practical solution for applications that demand extended context processing.
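As a rough illustration of what the long-context variants enable, here is a hedged sketch of loading one of them with Hugging Face transformers and asking a question over a long document. The repo id, input file, and chat-template usage are assumptions on our part, and Qwen’s own optimized inference stack is the recommended path for contexts approaching 1M tokens.

```python
# Hedged sketch: long-document Q&A with Qwen2.5-7B-Instruct-1M (repo id assumed).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-7B-Instruct-1M"  # assumed Hugging Face repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

with open("long_report.txt") as f:  # hypothetical long input document
    document = f.read()

messages = [{"role": "user",
             "content": f"{document}\n\nSummarize the key findings above in five bullet points."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```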

7. OpenAI in Talks With SoftBank For Funding at $300 Billion Valuation

OpenAI is in early discussions to raise $40 billion in new funding at a valuation of around $300 billion (+73% from October valuation), with SoftBank expected to lead the round by investing $15 billion to $25 billion. This follows SoftBank’s recent $1.5 billion purchase of OpenAI shares and its commitment of $19 billion to Stargate, a data center joint venture with OpenAI. The Japanese conglomerate has allocated $40 billion in total for both investments. OpenAI has previously raised around $20 billion in outside capital.

Five 5-minute reads/videos to keep you learning

1. Open-R1: A Fully Open Reproduction of DeepSeek-R1

Open-R1 is an initiative to reconstruct DeepSeek-R1’s data and training pipeline, validate its claims, and push the boundaries of open reasoning models. This blog post looks at the key ingredients behind DeepSeek-R1, which parts to replicate, and how to contribute to the Open-R1 project.

2. How To Become a Generative AI Engineer in 2025?

This guide walks you through the steps, skills, and strategies for becoming a generative AI engineer. It shares details such as which industries hire GenAI engineers, salary expectations, key skills, etc.

3. Achieving General Intelligence (AGI) and Super Intelligence (ASI): Pathways, Uncertainties, and Ethical Concerns

This extremely in-depth blog post dives into the research tracks of Artificial Super Intelligence (ASI), highlights the challenges, and debates the ethics of a world where machines might outsmart their makers.

4. OpenAI’s Deep Research vs DeepSeek R1

OpenAI’s Deep Research and DeepSeek R1 push the boundaries of AI-powered knowledge synthesis. This blog highlights their performance benchmarks, use cases, limitations, and accessibility.

5. Building End-to-End Machine Learning Projects: From Data to Deployment

This article walks you through building end-to-end machine learning projects, from finding the right problem to data collection, building the model, and deployment.

Repositories & Tools

  1. Unsloth makes finetuning LLMs faster and uses 70% less memory, with no degradation in accuracy.
  2. Oumi streamlines the lifecycle of foundation models — from data preparation and training to evaluation and deployment.
  3. AdalFlow is a library to build and auto-optimize LLM applications.
  4. Unsupervised People’s Speech is a dataset of more than one million hours of audio from diverse speakers.

Top Papers of The Week

1. Janus-Pro: Unified Multimodal Understanding and Generation with Data and Model Scaling

This paper introduces Janus-Pro-7B, an open-source multimodal model for understanding and image generation. An advanced version of Janus, Janus-Pro incorporates an optimized training strategy, expanded training data, and scaling to larger model sizes. It achieves significant advancements in multimodal understanding and text-to-image instruction-following capabilities.

2. Thoughts Are All Over the Place: On the Underthinking of o1-Like LLMs

Researchers have identified a phenomenon called underthinking in o1-like LLMs, where frequent switching between reasoning paths leads to decreased performance. By introducing a thought-switching penalty, they improve reasoning depth without fine-tuning. This approach enhances accuracy on challenging mathematical problems, offering a solution to reasoning inefficiencies in these models.
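As an illustration of the general idea (not the authors’ exact method), a decoding-time penalty of this kind can be prototyped as a transformers LogitsProcessor that down-weights tokens which typically open a new line of reasoning; the marker tokens, penalty strength, and window below are hypothetical.

```python
# Illustrative sketch of a thought-switching penalty applied during decoding.
from transformers import LogitsProcessor

class ThoughtSwitchPenalty(LogitsProcessor):
    def __init__(self, switch_token_ids, penalty=3.0, max_len=600):
        self.switch_token_ids = switch_token_ids  # e.g. ids for "Alternatively", "Instead"
        self.penalty = penalty                    # subtracted from those tokens' logits
        self.max_len = max_len                    # only discourage switching early in generation

    def __call__(self, input_ids, scores):
        if input_ids.shape[-1] < self.max_len:
            scores[:, self.switch_token_ids] -= self.penalty
        return scores

# Usage (hypothetical): pass an instance via `logits_processor` to `model.generate`.
```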

3. SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training

Supervised fine-tuning (SFT) and reinforcement learning (RL) offer distinct post-training benefits for foundation models. RL excels at generalizing to unseen variations in text and visuals, while SFT enhances output stability for effective RL training. This study utilizes environments like GeneralPoints and V-IRL to reveal RL’s superior visual recognition and generalization capacities in complex, multimodal tasks.

4. s1: Simple Test-Time Scaling

Test-time scaling in language modeling uses extra compute at inference to improve reasoning performance. Researchers developed a method called budget forcing and applied it to the Qwen2.5-32B-Instruct model. After training on a curated dataset, the resulting model outperformed OpenAI’s o1-preview on competition math questions by up to 27%. The model, data, and code are freely available online.
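The core trick, as we understand it, is to intervene on the length of the model’s thinking trace at decode time: force it to stop when a compute budget is hit, or suppress the end-of-thinking delimiter and append a token like “Wait” so it keeps reasoning. The sketch below is a rough, hypothetical rendering of that control flow; the delimiter and word-based token counting are simplifications for illustration.

```python
# Rough sketch of budget forcing: cap or extend a model's thinking trace at decode time.
# `generate_chunk` stands in for any call that continues the thinking trace.
def budget_forced_think(generate_chunk, min_tokens, max_tokens, end="</think>"):
    trace = ""
    while len(trace.split()) < max_tokens:
        chunk = generate_chunk(trace)
        if end in chunk and len(trace.split()) < min_tokens:
            # Model tried to stop thinking too early: strip the delimiter and
            # append "Wait," so it continues reasoning instead of answering.
            trace += chunk.replace(end, "") + " Wait,"
            continue
        trace += chunk
        if end in chunk:
            return trace      # model ended its thinking within budget
    return trace + end        # budget exhausted: force the end of thinking
```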

5. GuardReasoner: Towards Reasoning-Based LLM Safeguards

This paper introduces GuardReasoner, a reasoning safeguard that enhances LLM safety. Utilizing the GuardReasonerTrain dataset, reasoning SFT, and hard sample DPO, this model achieves superior performance on 13 benchmarks. In the F1 score, GuardReasoner 8B surpasses GPT-4o+CoT by 5.74% and Llama Guard 3 8B by 20.84%.

6. Chain of Agents: Large Language Models Collaborating on Long-Context Tasks

This paper proposes Chain-of-Agents (CoA), a framework that uses multi-agent collaboration through natural language and mitigates long context focus issues by assigning each agent a short context. CoA consists of multiple worker agents who sequentially communicate to handle different segmented portions of the text, followed by a manager agent who synthesizes these contributions into a coherent final output.
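The pattern is easy to sketch. Below is a minimal, hypothetical rendering of the worker-then-manager flow, where `llm` stands in for any chat-completion call; the prompts and chunking are ours, not the paper’s.

```python
# Minimal sketch of the Chain-of-Agents pattern: sequential workers pass notes
# forward over chunks of a long document; a manager writes the final answer.
def chain_of_agents(llm, document, question, chunk_size=4000):
    chunks = [document[i:i + chunk_size] for i in range(0, len(document), chunk_size)]
    notes = ""
    for chunk in chunks:  # worker agents, run in sequence over short contexts
        notes = llm(
            f"Previous notes:\n{notes}\n\nNew text:\n{chunk}\n\n"
            f"Update the notes with anything relevant to: {question}"
        )
    # manager agent synthesizes the accumulated notes into the final answer
    return llm(f"Notes:\n{notes}\n\nUsing only these notes, answer: {question}")
```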

Quick Links

1. Ai2 releases Tülu 3 405B, the first open-weight model to successfully apply a fully open post-training recipe at a 405-billion-parameter scale. The model introduces a novel reinforcement learning approach known as Reinforcement Learning with Verifiable Rewards (RLVR), which ensures rewards are based on verifiable outcomes rather than subjective feedback (see the toy sketch after these quick links).

2. NVIDIA AI has released the Eagle 2 series of vision-language models. Unlike most releases, which provide only trained weights, Eagle 2 details its data collection, filtering, augmentation, and selection processes. This initiative aims to equip the open-source community with the tools to develop competitive VLMs without relying on proprietary datasets.
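To make the RLVR idea mentioned above concrete, here is a toy, hypothetical reward function in the spirit of verifiable rewards: the policy gets credit only when its final answer can be checked programmatically against a known-correct target. Ai2’s actual reward functions and answer extraction differ; this is only the general shape.

```python
# Toy verifiable reward: 1.0 only if the extracted final answer matches the gold answer.
import re

def verifiable_reward(model_output: str, gold_answer: str) -> float:
    match = re.search(r"answer\s+is\s+([-\d.]+)", model_output.lower())
    if match is None:
        return 0.0  # no parsable answer, no reward
    return 1.0 if match.group(1) == gold_answer else 0.0

print(verifiable_reward("... so the answer is 42", "42"))   # 1.0
print(verifiable_reward("I'm not sure about this one.", "42"))  # 0.0
```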

Who’s Hiring in AI

AI Digital Human Development Intern — 2025 @NVIDIA (Hong Kong, Remote)

Machine Learning Engineer — Gen AI & LLM @Databricks (Remote)

Software Engineer Intern — AI validation @INTEL (Bangalore, India)

AI Product Manager @Concentrix (Bangalore, India)

ML Research Engineer @Accretive Technology Group (US/Remote)

Junior AI Engineer @Capco (Orlando, FL, USA)

Interested in sharing a job opportunity here? Contact [email protected].

Think a friend would enjoy this too? Share the newsletter and let them join the conversation.

Join over 80,000 data leaders and subscribers on the AI newsletter and keep up to date with the latest developments in AI, from research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor.

Published via Towards AI
