TAI #136: DeepSeek-R1 Challenges OpenAI-o1 With ~30x Cheaper Open-Source Reasoning Model
Author(s): Towards AI Editorial Team
Originally published on Towards AI.
What happened this week in AI by Louie
This week, the LLM race was blown wide open with DeepSeek's open-source release of R1. Its performance is close to o1's on most benchmarks. Built on top of DeepSeek's V3 model, R1's API output token prices are roughly 30x lower than o1's. It's available under the MIT license, supporting commercial use and modifications. DeepSeek also disclosed many of its methods and experiments in its paper, in stark contrast to the secrecy surrounding reasoning techniques at AI labs in the U.S.
R1 wasn't the only huge LLM release from China this week. Two new LLM competitors hit the ground running with very strong models. MiniMax-01, a 456B-parameter Mixture-of-Experts model, challenges Google's Gemini models for SoTA long-context capabilities, offering a 4-million-token input context thanks to its new hybrid Lightning Attention architecture. Kimi k1.5, on the other hand, is another new reasoning model that challenges o1 on multimodal capabilities.
DeepSeek's release included three different models/model families:
DeepSeek-R1-Zero was an experiment that applied reinforcement learning (RL) directly to a base language model (V3) without any prior supervised fine-tuning. In essence, they attempted to teach the model to reason purely through trial and error, providing it with rewards for correct answers and well-formatted responses. This is somewhat analogous to how AlphaZero mastered games like Go and chess, learning solely through self-play and a reward signal based on winning or losing. The results were very impressive on many benchmarks; however, the model fell short in some areas, and its output was often messy and hard to read.
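For intuition, here is a minimal sketch of the kind of rule-based reward described in the paper: one term checks that the extracted answer matches a known ground truth, and another checks that the response follows the required `<think>`/`<answer>` template. The helper names and the equal weighting are our own illustrative assumptions, not DeepSeek's actual code.

```python
import re

def format_reward(response: str) -> float:
    # Reward responses that wrap their reasoning in <think>...</think>
    # and their final result in <answer>...</answer>, as the R1-Zero
    # training template requires.
    pattern = r"<think>.+?</think>\s*<answer>.+?</answer>"
    return 1.0 if re.search(pattern, response, re.DOTALL) else 0.0

def accuracy_reward(response: str, ground_truth: str) -> float:
    # For verifiable tasks (math with a final answer, code with unit
    # tests), the reward is simply whether the extracted answer matches.
    match = re.search(r"<answer>(.*?)</answer>", response, re.DOTALL)
    answer = match.group(1).strip() if match else ""
    return 1.0 if answer == ground_truth.strip() else 0.0

def total_reward(response: str, ground_truth: str) -> float:
    # Illustrative equal weighting; the paper does not publish exact weights.
    return accuracy_reward(response, ground_truth) + format_reward(response)
```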
To address the limitations of R1-Zero and enhance its reasoning abilities further, the DeepSeek team introduced R1, which incorporated a "cold start" of human-like reasoning data before applying reinforcement learning. This involved creating a small dataset of examples demonstrating desired reasoning patterns and output formats. This was followed by a multi-stage process. First, reasoning-oriented RL was applied, focusing on tasks with clear solutions, like math and coding. Then, they generated a new batch of high-quality data samples for fine-tuning, created by filtering model outputs during the RL phase. Finally, they applied a final round of reinforcement learning, this time focusing on general helpfulness and harmlessness in addition to reasoning.
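Condensed into pseudocode, the recipe looks roughly like the sketch below. The stage functions (`sft`, `grpo_rl`, `rejection_sample`) and reward callables are placeholders for the stages described in the paper, not a runnable training API.

```python
def train_r1(v3_base, cold_start_data, reasoning_tasks, general_sft_data, general_prefs):
    """Pseudocode outline of the multi-stage R1 recipe described above."""
    # Stage 1: "cold start" supervised fine-tuning on a small set of
    # curated, human-readable long chain-of-thought examples.
    model = sft(v3_base, cold_start_data)

    # Stage 2: reasoning-oriented RL on tasks with verifiable answers
    # (math, coding), driven by rule-based accuracy/format rewards.
    model = grpo_rl(model, reasoning_tasks, reward=rule_based_reward)

    # Stage 3: filter/rejection-sample the RL checkpoint's outputs to build
    # a larger, higher-quality SFT dataset, then fine-tune on it again
    # (the paper restarts this stage from the V3 base model).
    sft_data = rejection_sample(model, reasoning_tasks) + general_sft_data
    model = sft(v3_base, sft_data)

    # Stage 4: a final RL round that also optimizes for general
    # helpfulness and harmlessness, not just reasoning accuracy.
    return grpo_rl(model, reasoning_tasks + general_prefs, reward=combined_reward)
```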
Across key benchmarks like AIME 2024, Codeforces, GPQA Diamond, and MATH-500, DeepSeek-R1 consistently performs on par with OpenAI's o1 (79.8 vs. 79.2, 96.3 vs. 96.6, 71.5 vs. 75.7, and 97.3 vs. 96.4, respectively). It also achieves very similar performance on the SWE-bench Verified coding benchmark (49.2 vs. 48.9).
The final piece of DeepSeek's work involved distilling the advanced reasoning capabilities of R1 into smaller, cheaper, dense models (the Llama and Qwen series). Using the larger R1 model as a "teacher," they fine-tuned several smaller models (ranging from 1.5B to 70B parameters) on the high-quality data curated from the R1 training process. The smaller distilled models significantly outperformed other models of similar sizes and even rivaled much larger models on reasoning benchmarks. DeepSeek-R1 outputs distilled into the tiny Qwen 1.5B model even beat GPT-4o on some math and code benchmarks!
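As a hedged illustration of what this style of distillation looks like in practice, the sketch below fine-tunes a small open Qwen model on (prompt, R1-generated reasoning trace) pairs using Hugging Face transformers. The model ID, data file, and hyperparameters are illustrative assumptions on our part; DeepSeek's exact data mix and settings are described in their paper.

```python
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

student_name = "Qwen/Qwen2.5-1.5B"  # small dense "student" (illustrative choice)
tokenizer = AutoTokenizer.from_pretrained(student_name)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(student_name)

# Hypothetical JSONL file of {"prompt": ..., "response": ...} pairs, where
# "response" is a long chain-of-thought answer sampled from the R1 teacher.
dataset = load_dataset("json", data_files="r1_traces.jsonl", split="train")

def tokenize(example):
    # Simplified: train on prompt + response jointly (no prompt masking).
    text = example["prompt"] + "\n" + example["response"] + tokenizer.eos_token
    return tokenizer(text, truncation=True, max_length=4096)

tokenized = dataset.map(tokenize, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="r1-distilled-qwen-1.5b",
                           per_device_train_batch_size=1,
                           num_train_epochs=2, bf16=True),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```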
Why should you care?
DeepSeek-R1's release is significant for several reasons. First, its open-source nature and competitive performance at a fraction of the cost of o1 democratize access to advanced reasoning capabilities. The API costs of DeepSeek-R1 per million tokens are currently $0.14 for cached inputs, $0.55 for non-cached inputs, and $2.19 for outputs. In contrast, the API costs for o1 are $7.50, $15, and $60, respectively: roughly a 30x difference in cost! Moreover, the openly released model weights create huge opportunities for adapting and fine-tuning these models for different domains and industries. The open release of its training methods also provides a blueprint for many others to follow. One surprise from the paper was that simpler techniques for enabling reasoning abilities worked better than some more complex options. We think there is a huge area for exploring and experimenting with these techniques now that scaled reinforcement learning for LLMs has been unlocked!
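The "~30x" figure is simply the ratio of the published per-million-token prices; a quick back-of-the-envelope check:

```python
# Per-million-token API prices quoted above (USD).
r1 = {"cached input": 0.14, "input": 0.55, "output": 2.19}
o1 = {"cached input": 7.50, "input": 15.00, "output": 60.00}

for kind in r1:
    print(f"{kind}: o1 is {o1[kind] / r1[kind]:.1f}x the price of R1")
# cached input: 53.6x, input: 27.3x, output: 27.4x
```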
The success of distilling large reasoning models into much smaller non-reasoning models also suggests we will get another wave of rapid improvement and cost reduction across the LLM spectrum.
The fact that a Chinese company is leading this charge also adds a geopolitical dimension, particularly given that DeepSeek has managed to achieve this despite GPU export restrictions and a far smaller budget than Western AI labs.
- Louie Peters, Towards AI Co-founder and CEO
Introducing Our Brand New 8-hour Generative AI Primer Course
A programming language-agnostic 1-day LLM Bootcamp designed for developers.
95% of developers I meet are only scratching the surface of what LLMs can do. When working with LLMs, you are CONSTANTLY making decisions such as open-source vs. closed-source, how to fit LLMs into your use case, whether no-code solutions are good enough for your workflow, how much weight to give the limitations of LLMs, and so on. And the biggest gap we see on top of all this is whether you are using LLMs to their full capacity, even with chat interfaces like ChatGPT or APIs for models like Gemini. The question is: are you?
This certification course is specifically designed to cut through the noise, help you ask the right questions, and show you exactly how to find answers. LLMs are moving so fast, with updates being released almost every day; what you need is an intuitive "framework," and just like LLMs, you need enough "context" to know what developments are relevant to you and your use case so you can make the most out of this transformative technology.
In just 8 hours, through lessons, videos, exercises, quizzes, and hands-on projects, you'll:
- Dive deep into the "psyche" of LLMs: how they work, how to make them work better, and how to train them for tasks you hate doing.
- Work with leading AI models and integrate them into your workflows seamlessly.
- Build your own no-code/low-code prototype that brings your ideas to life.
You'll finish before you even realize it, and by tomorrow, you'll already be AI-proofed. Secure your spot now!
Hottest News
1. OpenAI Released Scheduled Tasks in ChatGPT
OpenAI has introduced scheduled tasks in ChatGPT for Plus, Pro, and Team plans. These allow automated prompts and notifications on the Web, iOS, Android, and MacOS. Users can assign tasks like daily updates or reminders and receive notifications via push or email. Windows support will follow in Q1. Currently, a limit of 10 active tasks is enforced.
2. Chinese AI Company MiniMax Releases New Models
Chinese AI company MiniMax, an Alibaba- and Tencent-backed startup, debuted three new models. MiniMax-Text-01 is a text-only model, while MiniMax-VL-01 can understand images and text. T2A-01-HD, meanwhile, generates audio, specifically speech. MiniMax claims that MiniMax-Text-01 performs better than models such as Gemini 2.0 Flash and that MiniMax-VL-01 rivals Claude 3.5 Sonnet.
3. Kimi Launches New SOTA Multimodal Model
Moonshot AI, the Beijing-based company behind Kimi, introduced the new Kimi k1.5 multimodal reasoning model. Updates include long-context extension, improved policy optimization, and multimodality. Its report shows that its SOTA short-CoT performance outperforms GPT-4o and Claude 3.5 Sonnet on AIME, MATH-500, and LiveCodeBench by a large margin.
4. Alibaba Slashes Prices on LLMs by Up to 85% As Chinaβs AI Rivalry Heats Up
Alibaba Cloud announced an 85% price reduction on its Qwen-VL visual language model. The move demonstrates how competition among Chinaβs technology giants to win more business for their nascent artificial intelligence products is intensifying.
5. Google Is Forming a New Team To Build AI That Can Simulate the Physical World
Google is forming a new team led by Tim Brooks under DeepMind to build AI models for simulating the physical world, collaborating with the Gemini, Veo, and Genie teams on "world models." These models will support video generation, multimodal data, and interactive environments.
6. Mistral Signs Deal With AFP To Offer Up-to-Date Answers in Le Chat
Mistral has announced a content deal with newswire Agence France-Presse (AFP) to improve the accuracy of answers in Le Chat, Mistral's chatbot. Le Chat will be able to tap into AFP's stories (around 2,300 per day in six languages) and query AFP's entire archive dating back to 1983.
7. President Trump Repeals Bidenβs AI Executive Order
President Donald Trump revoked a 2023 executive order signed by former President Joe Biden that sought to reduce the potential risks AI poses to consumers, workers, and national security. During his campaign, Trump promised policies to "support AI development rooted in free speech and human flourishing."
Five 5-minute reads/videos to keep you learning
1. Retrieval-augmented generation (RAG) and cache-augmented generation (CAG) are two methodologies for generating more context-aware responses from LLMs. This article provides an extensive, step-by-step guide to both approaches, dives into their workflows, compares their advantages and drawbacks, and offers an implementation guide for CAG.
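To make the contrast concrete, here is a minimal sketch of the CAG idea with Hugging Face transformers (recent versions): encode the whole knowledge base once, keep the resulting KV cache, and reuse it for every query instead of retrieving chunks per request. The model choice and prompt format are illustrative assumptions, and production use needs care around cache copying and context limits.

```python
import copy
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-1.5B-Instruct"   # any long-context chat model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

# Pre-compute the KV cache for the whole knowledge base exactly once.
knowledge = "...all the documents you want the model to answer from..."
prefix_ids = tokenizer(knowledge, return_tensors="pt").input_ids
with torch.no_grad():
    prefix_cache = model(prefix_ids, use_cache=True).past_key_values

def answer(question: str) -> str:
    # Reuse a copy of the cached prefix so only the new question tokens
    # need to be processed; generate() mutates the cache it is given.
    cache = copy.deepcopy(prefix_cache)
    q_ids = tokenizer("\nQuestion: " + question + "\nAnswer:",
                      return_tensors="pt").input_ids
    ids = torch.cat([prefix_ids, q_ids], dim=-1)
    out = model.generate(ids, past_key_values=cache, max_new_tokens=200)
    return tokenizer.decode(out[0, ids.shape[-1]:], skip_special_tokens=True)
```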
2. Why AI Language Models Choke On Too Much Text
GPUs revolutionized AI by enabling massive parallel processing, leading to transformer models scaling rapidly. Despite advancements, transformers remain inefficient with long contexts due to quadratic compute costs. This article discusses why this happens and shares some approaches to solving this problem.
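A quick back-of-the-envelope illustration of that quadratic cost: each attention layer scores every token against every other token, so the score matrix grows with the square of the context length.

```python
# Self-attention builds an n x n score matrix per head per layer.
def pairwise_scores(n_tokens: int) -> int:
    return n_tokens ** 2

for n in (8_000, 32_000, 128_000):
    print(f"{n:>7} tokens -> {pairwise_scores(n):.2e} pairwise scores")
# Going from 8k to 128k tokens is 16x the length but 256x the attention work.
```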
3. Simplifying Alignment: From RLHF To Direct Preference Optimization (DPO)
This article explores how Direct Preference Optimization (DPO) simplifies aligning large language models with human preferences compared to Reinforcement Learning from Human Feedback (RLHF). It breaks down the math and highlights why DPO might be the smarter, easier way forward.
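For reference, the core DPO objective the article builds up to fits in a few lines of PyTorch; the sequence log-probabilities below are placeholder tensors rather than outputs from a real model.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Standard DPO objective: -log sigmoid(beta * (policy margin - reference margin)).

    Each argument is the summed log-probability of a full response
    (chosen = preferred, rejected = dispreferred) under the policy or the
    frozen reference model; beta controls how far the policy may drift
    from the reference.
    """
    chosen_ratio = policy_chosen_logps - ref_chosen_logps
    rejected_ratio = policy_rejected_logps - ref_rejected_logps
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()

# Toy usage with made-up log-probabilities for a batch of two preference pairs.
loss = dpo_loss(torch.tensor([-12.0, -9.5]), torch.tensor([-15.0, -11.0]),
                torch.tensor([-13.0, -10.0]), torch.tensor([-14.0, -10.5]))
```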
4. Mastering Data Scaling: The Only Guide Youβll Ever Need (Straight From My Journey)
Data scaling is a crucial preprocessing step that prepares datasets for machine learning models and helps them perform well. This article discusses why scaling is important, its main types, and how and when to apply it.
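As a small, runnable example of the two most common approaches, standardization and min-max normalization, with scikit-learn; the toy arrays are ours:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X_train = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 400.0]])
X_test = np.array([[1.5, 250.0]])

std = StandardScaler().fit(X_train)      # fit on training data only,
minmax = MinMaxScaler().fit(X_train)     # then transform both splits,
                                         # to avoid test-set leakage
print(std.transform(X_test))             # zero mean, unit variance features
print(minmax.transform(X_test))          # features squeezed into [0, 1]
```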
5. Takes On βAlignment Faking in Large Language Modelsβ
Researchers revealed that Claude 3 Opus fakes alignment with training objectives to avoid behavioral modification, a phenomenon labeled "alignment faking." This author shares their take on the results.
Repositories & Tools
- The micro diffusion repository demonstrates the training of large-scale diffusion models from scratch on a minimal budget.
- LocalAI is a free, open-source alternative to OpenAI, Claude, and others.
- Maxun lets you train a robot in 2 minutes and scrape the web on auto-pilot.
- Agentless is an agent-free approach to automatically solving software development problems.
- CopilotKit provides React UI and infrastructure for AI Copilots, in-app AI agents, AI chatbots, and more.
Top Papers of The Week
1. LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs
LlamaV-o1 redefines step-by-step visual reasoning in large language models by introducing a benchmark with eight challenge categories and a metric for granular evaluation. The multimodal model, trained through multi-step curriculum learning, surpasses existing models like LLaVA-CoT by 3.8% in performance across six benchmarks and runs five times faster during inference.
2. KaLM-Embedding: Superior Training Data Brings A Stronger Embedding Model
Researchers developed KaLM-Embedding, a multilingual embedding model using high-quality, diverse training data. Techniques like persona-based synthetic data, ranking consistency filtering, and semi-homogeneous task batch sampling enhance its performance. The model excels in multilingual embedding tasks, outperforming others of similar size on the MTEB benchmark.
3. Titans: Learning to Memorize at Test Time
This paper introduces Titans, a new family of architectures based on a new neural long-term memory module. The module learns to memorize historical context and helps attention attend to the current context while utilizing long-past information. Experimental results show that Titans are more effective than Transformers and recent modern linear recurrent models.
4. Transformer²: Self-adaptive LLMs
This paper introduces Transformer², a framework that adapts LLMs to unseen tasks in real time by selectively adjusting only the singular components of their weight matrices. During inference, Transformer² employs a dispatch system to identify the task's properties, and then task-specific "expert" vectors, trained using reinforcement learning, are dynamically mixed to obtain targeted behavior for the incoming prompt. It outperforms approaches such as LoRA with fewer parameters.
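As a rough illustration of the "adjust only the singular components" idea (not the authors' code), one can decompose a weight matrix with SVD and rescale its singular values with a task-specific "expert" vector; the shapes and the toy expert vector below are our own assumptions.

```python
import torch

def svd_adapt(W: torch.Tensor, z: torch.Tensor) -> torch.Tensor:
    """Rescale only the singular values of W by an 'expert' vector z.

    W: (out, in) weight matrix; z: (min(out, in),) per-singular-value scales.
    This mirrors the spirit of singular-value fine-tuning, though the
    paper's exact parameterization may differ.
    """
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    return U @ torch.diag(S * z) @ Vh

W = torch.randn(256, 512)
z = torch.ones(256)                       # identity expert: W reconstructed
assert torch.allclose(svd_adapt(W, z), W, atol=1e-3)
z_task = 1.0 + 0.05 * torch.randn(256)    # hypothetical task-specific expert
W_adapted = svd_adapt(W, z_task)
```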
Quick Links
1. Six charts about AI revenue. OpenAI captures approximately 62.5% of consumer AI spending. xAI's revenue jumped from $5M to $100M, while OpenAI's soared from $200M to $5B. Sapphire Ventures reports 28 AI-native companies exceeding $25M in ARR, predicting substantial growth for AI-native startups in the coming year.
2. DeepSeek-R1 achieves performance comparable to OpenAI's o1 system across mathematics, coding, and general reasoning tasks, cementing its place as a leading competitor. DeepSeek has open-sourced DeepSeek-R1-Zero and DeepSeek-R1, along with six smaller distilled models.
Whoβs Hiring in AI
Applied AI Engineer, Applied Science @Mistral AI (Paris, France)
Cambridge Internship in ML Model Optimization @Microsoft Corporation (Cambridge, United Kingdom)
Machine Learning Software Engineering Undergraduate Intern @INTEL (Santa Clara, CA, USA)
Tech Consulting AI LLM Developer Manager @Accenture (Multiple Locations)
Full-Stack Developer (React + Python + Azure) @Solvd (Remote)
AI/ML Supervisor @Ford Motor Company (Dearborn, MI, USA)
GenAI/Machine Learning Technical Project Manager @Deloitte (Multiple US Locations)
Interested in sharing a job opportunity here? Contact [email protected].
Think a friend would enjoy this too? Share the newsletter and let them join the conversation.
Join over 80,000 data leaders and subscribers on the AI newsletter and keep up to date with the latest developments in AI, from research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor.
Published via Towards AI