
TAI #171: How is AI Actually Being Used? Frontier Ambitions Meet Real-World Adoption Data

Author(s): Towards AI Editorial Team

Originally published on Towards AI.


What happened this week in AI by Louie

This week, AI models continued to push the frontiers of capability, with both OpenAI and DeepMind achieving gold-medal-level results at the 2025 ICPC World Finals coding contest. The scale of capital investment and ambition was also clear, with Nvidia announcing a letter of intent to invest up to $100 billion in OpenAI, alongside a 10 GW GPU purchase agreement. Yet, at the same time as these limits were being pushed, two landmark studies from OpenAI/NBER and Anthropic gave a detailed, data-driven look at how AI is actually being used by hundreds of millions of people today.

In a demonstration of algorithmic reasoning, both OpenAI and Google’s Gemini Deep Think models delivered performances equivalent to a gold medal at the ICPC, the “coding Olympics.” OpenAI’s system solved all 12 complex problems within the five-hour limit, outperforming every human team, while Google’s entry solved 10. These results, achieved under the same constraints as human competitors, show the maturation of AI in complex, multi-step logical tasks that were until recently the exclusive domain of elite human experts.

The industry’s ambition was further underscored by OpenAI’s new 10GW GPU purchase agreement with Nvidia. The scale of this deal is significant: 10 GW is equivalent to the entire U.S. data center fleet’s consumption in 2018 and is enough to power roughly 8 million homes. This aligns with an infrastructure footprint of 4–5 million next-generation GPUs, representing $200–300 billion in hardware costs and a total capital expenditure of around $500 billion when factoring in memory, power, cooling, other infrastructure, and facilities.
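To make those figures concrete, here is a rough back-of-envelope sketch in Python. The per-home and per-GPU power draws and the per-GPU cost are our own illustrative assumptions, not numbers from the OpenAI or Nvidia announcements.

```python
# Back-of-envelope check on the 10 GW figure (illustrative assumptions only).
total_power_gw = 10                      # announced GPU purchase agreement
avg_home_kw = 1.25                       # assumed average US household draw
gpu_all_in_kw = 2.2                      # assumed per-GPU power incl. cooling/networking overhead
gpu_unit_cost_usd = 55_000               # assumed all-in hardware cost per next-gen GPU

homes_powered = total_power_gw * 1e6 / avg_home_kw      # GW -> kW, then divide per home
gpu_count = total_power_gw * 1e6 / gpu_all_in_kw
hardware_cost_usd = gpu_count * gpu_unit_cost_usd

print(f"Homes powered: ~{homes_powered / 1e6:.1f} million")          # ~8 million
print(f"GPUs implied:  ~{gpu_count / 1e6:.1f} million")              # ~4.5 million
print(f"Hardware cost: ~${hardware_cost_usd / 1e9:.0f} billion")     # ~$250 billion
```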

While the frontier pushes toward superintelligence-scale compute, the new usage studies provide a crucial reality check. The OpenAI/NBER paper, covering 700 million weekly ChatGPT users sending 2.5 billion messages daily, found a dramatic shift toward personal applications. Non-work-related messages have surged from 53% to 73% of all traffic in the past year. The most common use cases are not coding or complex analysis but “Practical Guidance” at 28%, “Seeking Information” at 21%, and “Writing” at 28% of all conversations. Coding represents a surprisingly small 4.2% of consumer usage, with Anthropic’s models and API-based usage still the more popular channels for coding work.

Source: NBER and OpenAI. Breakdown of granular conversation topics, calculated from a sample of approximately 1.1 million conversations from May 15, 2024, through June 26, 2025.

Anthropic’s Economic Index, which tracks Claude usage, paints a complementary but distinct picture. It finds that API customers — primarily businesses and developers — focus heavily on computer and mathematical tasks (44% of traffic). These enterprise users also lean heavily into automation, with 77% of API conversations being directive, a stark contrast to consumer chat, where the split between automation and collaborative augmentation is nearly even. While directive automation is rising on consumer chat (from 27% to 39% in nine months), higher-use countries paradoxically tend to be more collaborative, suggesting mature users find more value in advisory patterns over simple one-shot completions.

Together, the studies reveal a bifurcation in how AI is being used. For consumers, it is increasingly an “Advisor,” a tool for decision support. In fact, “Asking” for information or advice now constitutes 52% of ChatGPT use and receives the highest user satisfaction ratings. For enterprise and API users, AI is more of an “Agent,” a tool for task automation. Writing is the common thread, but the nature of the task differs. On ChatGPT, writing is the top work-related activity (40%), with two-thirds of these requests involving editing or summarizing user-provided text, rather than generating it from scratch. Across all work-related use, about 81% of messages are associated with two broad work activities: 1) obtaining, documenting, and interpreting information; and 2) making decisions, giving advice, solving problems, and thinking creatively.

Why should you care?

The current AI moment is defined by a massive disconnect. On one side, you have a market fueled by roughly $10 trillion in AI market capitalization and $500 billion in annual AI data center capital investment. On the other, you have a user base where, outside of coding, real productivity gains are driven by a small minority of power users. Is this a bubble, or is there enough real value being created to justify the investment? As a quick rule of thumb, if you have neither a paid AI plan nor $30 a month of API spend, you are nowhere near getting the most out of these models, and that describes the vast majority of today’s 800 million weekly users.

The bet from big tech CFOs is that the rest of the world will catch up. The bull case is easy to see: if 5.5 billion internet users each gain an average of just $1,000 per year in value from AI, the economic justification is easily there, and OpenAI’s $200 billion 2030 revenue forecast starts to look plausible. But this outcome is far from certain. The entire structure could come crashing down if many more professionals are not soon persuaded to use these models effectively in their work.

This transition hinges on two things. First, people need to be taught how to use these tools properly, building an intuition for where they add value beyond simple queries. Second, companies need to get much better at building custom “AI for X” workflows and agents. Most enterprise AI projects still fail due to foundational errors in team structure, model selection, and system design.

The immediate opportunity lies in bridging this competency gap. The companies and individuals who can translate the raw potential of an AI “Advisor” and “Agent” into reliable, integrated workflows will capture the immense value that is currently being left on the table.

Louie Peters — Towards AI Co-founder and CEO

Hottest News

1. Gemini 2.5 Deep Think Achieved Gold-Medal–Level Performance at ICPC World Finals

Gemini 2.5 Deep Think reached gold-medal–level performance at the 2025 International Collegiate Programming Contest (ICPC) World Finals, the most prestigious university-level algorithmic competition. Gemini solved eight problems in the first 45 minutes and two more within three hours, using advanced data structures and algorithms. With 10 problems solved in 677 minutes, Gemini would have placed second overall compared to the competing university teams.

2. xAI Launches Grok 4 Fast: Cost-Efficient Multimodal Reasoning Model

xAI introduced Grok-4-Fast, a cost-optimized successor to Grok-4 that merges “reasoning” and “non-reasoning” behaviors into a single set of weights controllable via system prompts. The model has a 2M-token context window, excels at reasoning and coding, and scored 60 on the Artificial Analysis Intelligence Index. It outperforms its larger siblings on LiveCodeBench while, at $0.20 per million input tokens, being roughly 25 times cheaper than competitors such as Gemini 2.5 Pro.

3. Alibaba Qwen Team Just Released FP8 Builds of Qwen3-Next-80B-A3B

Alibaba’s Qwen team has just released FP8-quantized checkpoints for its new Qwen3-Next-80B-A3B models in two post-training variants, Instruct and Thinking, aimed at high-throughput inference with ultra-long context and MoE efficiency. The FP8 Instruct version reproduces Qwen’s BF16 benchmark results, matching Qwen3-235B-A22B-Instruct-2507 on knowledge, reasoning, and coding tasks, and outperforming it on long-context workloads of up to 256k tokens.

4. Alibaba Releases Tongyi DeepResearch: A 30B-Parameter Open-Source Agentic LLM

Alibaba’s Tongyi Lab has released Tongyi-DeepResearch-30B-A3B, an open-source agentic LLM specialized for long-horizon, tool-augmented information-seeking. The model employs a mixture-of-experts (MoE) design with ~30.5 billion total parameters and ~3–3.3 billion active parameters per token, enabling high throughput while preserving strong reasoning performance. Techniques such as IterResearch restructure the context each “round,” retaining only essential artifacts to mitigate context bloat and error propagation, while a ReAct baseline demonstrates that the behaviors are learned rather than prompt-engineered.

5. Detecting and Reducing Scheming in AI Models

OpenAI shared new research addressing “scheming,” where models act one way on the surface while pursuing hidden goals. The paper compares this to a stockbroker breaking the law to maximize profit. Researchers concluded that most observed failures involve simple deception, such as pretending to complete tasks. While these failures are generally low-stakes, the work outlines methods to better detect and mitigate deceptive patterns in AI systems.

6. IBM Released Granite Docling

IBM has released Granite-Docling-258M, an open-source (Apache-2.0) vision-language model designed specifically for end-to-end document conversion. The model targets layout-faithful extraction — tables, code, equations, lists, captions, and reading order — emitting a structured, machine-readable representation rather than lossy Markdown. IBM replaced the earlier backbone with a Granite 165M language model and upgraded the vision encoder to SigLIP2 (base, patch16-512) while retaining the Idefics3-style connector (pixel-shuffle projector). The resulting model has 258M parameters and shows consistent accuracy gains across layout analysis, full-page OCR, code, equations, and tables.
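For context, document conversion with IBM's companion open-source Docling toolkit typically looks like the sketch below. This is a generic Docling example, not an official Granite-Docling recipe: whether the Granite-Docling VLM is used depends on how the pipeline is configured (see the model card), and the input filename is a placeholder.

```python
# Minimal document-conversion sketch with the open-source Docling toolkit.
# NOTE: generic pipeline; running the Granite-Docling VLM backend requires
# extra pipeline configuration described in the model card.
from docling.document_converter import DocumentConverter

converter = DocumentConverter()
result = converter.convert("annual_report.pdf")   # placeholder input file

# Export the structured document (layout, tables, reading order) to Markdown.
print(result.document.export_to_markdown())
```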

Five 5-minute reads/videos to keep you learning

1. Qwen2.5-VL: A Hands-On Code Walkthrough

This technical guide walks through the Qwen2.5-VL multimodal model, showcasing improvements such as a window attention mechanism in its Vision Transformer (ViT) and dynamic video frame sampling. The architecture includes three core components: a ‘process_vision_info’ module for preprocessing, a ViT encoder for feature extraction, and a Qwen2.5 LM Decoder with 3D M-RoPE for joint visual–text processing. A step-by-step code example covers model loading, data handling, prompt construction, and inference, making it a practical resource for developers.
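As a flavor of what the walkthrough covers, a minimal inference sketch using the Hugging Face Transformers integration and the qwen_vl_utils helper package might look like this; the model ID and image URL are placeholders, and the walkthrough itself remains the reference for exact versions and options.

```python
# Minimal Qwen2.5-VL inference sketch (assumes a recent transformers release and qwen-vl-utils installed).
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info

model_id = "Qwen/Qwen2.5-VL-7B-Instruct"  # placeholder; pick the checkpoint from the walkthrough
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

messages = [{"role": "user", "content": [
    {"type": "image", "image": "https://example.com/demo.jpg"},  # placeholder image
    {"type": "text", "text": "Describe this image."},
]}]

# Build the chat prompt and extract image/video tensors for the ViT encoder.
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(text=[text], images=image_inputs, videos=video_inputs,
                   padding=True, return_tensors="pt").to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=128)
answer = processor.batch_decode(output_ids[:, inputs.input_ids.shape[1]:],
                                skip_special_tokens=True)[0]
print(answer)
```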

2. Anthropic Economic Index Report: Uneven Geographic and Enterprise AI Adoption

Anthropic expanded its Economic Index with new dimensions: geographic trends in Claude.ai usage and enterprise-level API adoption. The report highlights how Claude usage has evolved over time, how adoption varies by region, and how enterprises are applying frontier AI systems to real-world business challenges.

3. Review of Multimodal Technologies: ViT Series (ViT, Pix2Struct, FlexiViT, NaViT)

This review traces the evolution of Vision Transformer (ViT) models beyond fixed image resolutions and patch sizes. It covers Pix2Struct, which preserves original aspect ratios; FlexiViT, which adapts to varying patch sizes; and NaViT, which applies a “Patch n’ Pack” technique for native-resolution processing. Together, these innovations broaden the applicability and efficiency of ViTs for diverse visual understanding tasks.

4. Evolution of Transformers Pt2: Sequence Modelling (Transformers)

This article explains why the Transformer architecture excels at sequence modeling, emphasizing the role of self-attention in weighing token relevance. Early layers capture syntactic relationships, while deeper layers capture semantic context. Key elements — encoder-decoder design, positional encoding, and parallel training — help overcome the long-range dependency issues of RNNs. The piece also notes challenges such as sequential inference and error accumulation.
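To make the self-attention point concrete, here is a minimal scaled dot-product attention function in plain NumPy. It is a textbook illustration of the mechanism the article describes, not code from the article itself.

```python
# Scaled dot-product self-attention: each token weighs every other token's relevance.
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """X: (seq_len, d_model); Wq/Wk/Wv: (d_model, d_k) projection matrices."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])           # pairwise token relevance
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over the sequence
    return weights @ V                                # context-mixed representations

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 16))                          # 5 tokens, d_model = 16
Wq, Wk, Wv = (rng.normal(size=(16, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)            # (5, 8)
```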

5. Measuring Uplift Without Randomised Control — a Quick and Practical Guide

For cases where randomized controlled trials aren’t feasible, this guide outlines practical alternatives for measuring intervention impact. It focuses on the Difference-in-Differences (DiD) technique, showing how to implement it as a regression model with clustered standard errors and fixed effects. The article also explores a Bayesian variant for incorporating prior knowledge, and situates ANOVA and ANCOVA as special cases within this broader regression framework.
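As a quick illustration of the DiD-as-regression idea, a minimal sketch with statsmodels might look like the following. The DataFrame columns (outcome, treated, post, unit) are hypothetical names for your own panel data, not the guide's dataset.

```python
# Difference-in-Differences as an interaction regression with clustered standard errors.
import pandas as pd
import statsmodels.formula.api as smf

# df is assumed to be a panel with one row per unit-period:
#   outcome: the measured metric
#   treated: 1 if the unit ever receives the intervention
#   post:    1 for periods after the intervention starts
#   unit:    identifier used to cluster standard errors
df = pd.read_csv("panel.csv")  # placeholder file

model = smf.ols("outcome ~ treated * post", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["unit"]}
)
# The 'treated:post' coefficient is the DiD estimate of uplift.
print(model.summary().tables[1])
```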

Repositories & Tools

1. Qwen3-ASR-Toolkit is an advanced, high-performance Python command-line toolkit for using the Qwen-ASR API.

Top Papers of The Week

1. Scaling Laws for Differentially Private Language Models

This work derives compute–privacy–utility scaling laws for training LLMs under differential privacy. It shows that DP-optimal setups favor smaller models with very large batch sizes, and that simply adding compute provides little benefit without a larger privacy budget or more data. The findings provide guidelines for allocating resources efficiently under strict privacy constraints.
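The paper is about scaling laws rather than code, but for readers who have not seen differentially private training in practice, a generic DP-SGD setup with the Opacus library looks roughly like this. The toy model, synthetic data, and hyperparameters are placeholders, not the paper's configuration.

```python
# Generic DP-SGD training setup with Opacus (illustrative hyperparameters only).
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
from opacus import PrivacyEngine

model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 2))  # toy model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
data = TensorDataset(torch.randn(4096, 128), torch.randint(0, 2, (4096,)))
loader = DataLoader(data, batch_size=1024)  # DP-optimal setups favor very large batches

privacy_engine = PrivacyEngine()
model, optimizer, loader = privacy_engine.make_private(
    module=model,
    optimizer=optimizer,
    data_loader=loader,
    noise_multiplier=1.0,   # more noise = stronger privacy, lower utility
    max_grad_norm=1.0,      # per-sample gradient clipping bound
)

criterion = nn.CrossEntropyLoss()
for x, y in loader:
    optimizer.zero_grad()
    criterion(model(x), y).backward()
    optimizer.step()
print(f"epsilon spent: {privacy_engine.get_epsilon(delta=1e-5):.2f}")
```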

2. WebWeaver: Structuring Web-Scale Evidence with Dynamic Outlines for Open-Ended Deep Research

WebWeaver introduces a dual-agent framework for open-ended deep research. A planner agent dynamically refines outlines linked to an evidence memory bank, while a writer agent retrieves and compiles evidence section by section. This structured approach integrates evidence acquisition with outline optimization, producing more coherent research outputs.

3. The Illusion of Diminishing Returns: Measuring Long Horizon Execution in LLMs

The paper shows that small gains in single-step accuracy compound into large — and even faster-than-exponential — improvements in the task length models can execute, and identifies “self-conditioning” (models amplifying their own past mistakes) as a key failure mode in long-horizon execution. Thinking models and test-time sequential compute mitigate self-conditioning and dramatically extend single-turn execution length, with frontier reasoning models outperforming non-thinking counterparts by large margins.
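A simple way to see the compounding effect, under an independence assumption much cruder than the paper's analysis: if each step succeeds with probability p, an n-step task finishes without error with probability p^n, so the task length sustainable at, say, 50% success grows rapidly as p approaches 1.

```python
# Horizon length H such that p**H = 0.5, under a naive independent-steps assumption.
import math

for p in (0.90, 0.95, 0.99, 0.995, 0.999):
    horizon = math.log(0.5) / math.log(p)
    print(f"per-step accuracy {p:.3f} -> ~{horizon:,.0f} steps at 50% task success")
```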

4. Evolving Language Models without Labels: Majority Drives Selection, Novelty Promotes Variation

Researchers introduced EVolution-Oriented and Label-free Reinforcement Learning (EVOL-RL) to enhance large language models’ self-improvement. EVOL-RL pairs stability from majority voting with variation from novelty-aware rewards, preventing entropy collapse and boosting performance. It significantly increases pass rates, notably improving pass@1 from 4.6% to 16.4% on label-free datasets, demonstrating superior generalization across domains.

5. Scaling Agents via Continual Pre-training

The authors propose Agentic Continual Pre-training (Agentic CPT) to create agentic foundation models. Their model, AgentFounder-30B, performs exceptionally on ten benchmarks, including achieving 39.9% on BrowseComp-en, 43.3% on BrowseComp-zh, and 31.5% Pass@1 on HLE, while maintaining strong tool-use capabilities in complex problem-solving.

Quick Links

1. Google AI introduces Agent Payments Protocol (AP2), an open, vendor-neutral specification for executing payments initiated by AI agents with cryptographic, auditable proof of user intent. AP2 extends existing open protocols, Agent2Agent (A2A) and Model Context Protocol (MCP), to define how agents, merchants, and payment processors exchange verifiable evidence across the “intent → cart → payment” pipeline.

2. OpenAI and NVIDIA announce a strategic partnership to deploy 10 gigawatts of NVIDIA systems, which translates to millions of GPUs that can help power OpenAI’s new models. As part of the deal, NVIDIA “intends to invest up to $100 billion in OpenAI progressively as each gigawatt is deployed.”

3. Mistral AI updates its Magistral Small/Medium 1.2 models with multimodality, adding vision alongside stronger math and coding capabilities. Benchmarks show a 15% improvement on AIME and LiveCodeBench, with better tool use and more natural responses. The models now compete with larger systems on the Artificial Analysis Index and are available on Hugging Face and via API.

4. Researchers release Trading-R1, a 4B parameter model trained on 100K financial cases to generate investment theses and trading strategies. Backtests on major tickers show stronger risk-adjusted returns. The system combines distillation, reinforcement learning, and structured evidence-based reasoning, serving as a decision-support tool for financial research.

5. New report from Epoch AI zooms in on scaling and what it unlocks for scientific R&D. It forecasts that by 2030, training clusters could cost hundreds of billions of dollars, but compute scaling is unlikely to be “hitting a wall.” The report highlights the growing role of synthetic and multimodal data to mitigate bottlenecks, and projects that while power demands will rise significantly, they should remain manageable in principle.

Who’s Hiring in AI

Research Engineer — Multimodal Companion Agent @Google DeepMind (Tokyo, Japan)

Automation & AI Lead @Cognizant (Remote)

AI Foundations — Research Scientist — Research Internship: 2026 @IBM (Cambridge, MA, USA)

Legal Content AI Engineer @RELX INC (Remote/UK)

Intern, Machine Learning Engineer @Autodesk (Multiple US Locations)

Interested in sharing a job opportunity here? Contact sponsors@towardsai.net.

Think a friend would enjoy this too? Share the newsletter and let them join the conversation.

Join over 80,000 subscribers and data leaders on the AI newsletter to keep up to date with the latest developments in AI, from research to projects and ideas. If you are building an AI startup, an AI-related product, or a service, we invite you to consider becoming a sponsor.

Published via Towards AI



Note: Content contains the views of the contributing authors and not Towards AI.