AI Can Write Poetry, But Can’t Pour Coffee: Inside the Next Great Tech Revolution

Last Updated on November 25, 2025 by Editorial Team

Author(s): cai zhang

Originally published on Towards AI.


This article summarizes the views of AI researcher Fei-Fei Li, as set out in her first Substack post, “From Words to Worlds: Spatial Intelligence is AI’s Next Frontier.”

Introduction: The Genius in the Room Has a Blind Spot

In 1950, Alan Turing posed a question that would launch a relentless scientific quest: “Can machines think?” Today, with large language models (LLMs) that generate fluent text, write complex code, and create photorealistic images in seconds, we seem closer than ever to an answer. Modern AI feels magical, a machine that has mastered the abstract knowledge of the internet.

But this genius has a profound blind spot. It can write an eloquent essay on the physics of pouring coffee, yet it can’t perform the simple physical task itself. It can describe how to park a car but lacks the intuitive understanding to do it. This is the central paradox of today’s AI.

The case for what comes next is made by Fei-Fei Li, one of the architects of the modern AI era. As a key figure behind ImageNet (the dataset that helped ignite the current deep learning boom) and a former director of Stanford’s AI Lab, she has long pursued what she calls her “North Star”: endowing machines with visual and spatial understanding. In a recent essay, she argues that the missing piece is “spatial intelligence,” the next great frontier and the key to unlocking AI’s true potential. This article explores the most impactful takeaways from her vision for AI’s future.

1. Today’s AI is a “Wordsmith in the Dark”

While LLMs have mastered the world of text, they lack a grounded understanding of physical reality. They have immense knowledge but no real-world experience, leaving them disconnected from the world they seek to understand.

Fei-Fei Li captures this limitation perfectly:

“Yet they remain wordsmiths in the dark; eloquent but inexperienced, knowledgeable but ungrounded.”

This isn’t just a philosophical problem; it appears in practical tests. Despite their sophistication, state-of-the-art models perform poorly on basic spatial tasks. They struggle with:

Estimating the distance, orientation, and size of objects.
“Mentally” rotating objects by regenerating them from new angles.
Navigating mazes or recognizing shortcuts.
Predicting the outcomes of basic physics.

The contrast is stark. These systems demonstrate superhuman ability in language, yet their spatial reasoning is sub-human. The gap highlights a fundamental difference in how they perceive reality. As Li explains, our view of the world is holistic — “not just what we’re looking at, but how everything relates spatially, what it means, and why it matters.” For AI, that holistic understanding is still missing.

2. Intelligence Began with Sensation, Not Language

We often equate intelligence with language and abstract thought. But according to Li’s analysis, its evolutionary roots are far more fundamental. Intelligence didn’t begin with words, but with sensation — “a glimmer of light or the feeling of texture.” This created a bridge between perception and survival, forming the “core loop driving the evolution of intelligence.”

This foundational capability is spatial intelligence, which Li calls the “scaffolding upon which our cognition is built.” It’s the intuitive fluency we use to park a car, catch tossed keys, or navigate a crowded sidewalk. It’s at work when children spend their pre-verbal years learning through play, or when firefighters navigate a collapsing building through “body language and a shared professional instinct for which there’s no linguistic substitute.”

This form of intelligence has driven civilization-defining breakthroughs that were impossible through text alone:

Calculating Earth’s Circumference: The Greek scholar Eratosthenes, working in Alexandria, compared the angle of the sun’s shadow there with its absence at Syene at the same moment to calculate the size of our planet, a feat of geometric and physical reasoning.
Discovering DNA’s Structure: Watson and Crick didn’t just write equations; they physically built 3D models, manipulating metal plates and wires until the double helix structure “clicked into place” spatially.
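The arithmetic behind Eratosthenes’ estimate is short enough to write out. The sketch below uses the figures traditionally attributed to him (through the later writer Cleomedes): at noon on the summer solstice the sun stood roughly overhead at Syene, while at Alexandria a vertical stick cast a shadow at about 7.2 degrees, one-fiftieth of a circle, and the two cities were taken to be about 5,000 stadia apart. The function name is illustrative, and the result is left in stadia because the exact length of the stadion he used is uncertain.

```python
# Worked version of Eratosthenes' estimate, using the traditionally reported
# figures: a 7.2-degree shadow angle at Alexandria and ~5,000 stadia to Syene.


def circumference_from_shadow(shadow_angle_deg: float, arc_distance: float) -> float:
    """Scale the arc between two cities up to a full circle.

    Assuming the sun's rays are parallel, the shadow angle measured at one city
    equals the angle at Earth's centre between the two cities, so the arc
    between them is shadow_angle_deg / 360 of the full circumference.
    """
    fraction_of_circle = shadow_angle_deg / 360.0
    return arc_distance / fraction_of_circle


estimate = circumference_from_shadow(shadow_angle_deg=7.2, arc_distance=5000)
print(f"Estimated circumference: {estimate:,.0f} stadia")
```

Because 7.2 degrees is one-fiftieth of 360, the 5,000-stadia arc is multiplied by 50, giving 250,000 stadia.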

The synthesis is clear: just as Watson and Crick needed to physically manipulate models to see the structure of DNA, today’s AI needs a virtual “physicality” to grasp concepts that text alone cannot convey. To reach the next level, it must learn to think in this foundational, spatial way.

3. The Solution Isn’t a Better Language Model, It’s a “World Model”

The path forward isn’t simply a bigger language model. Li argues it requires a new and even more ambitious type of AI, the “world model,” which her company World Labs was founded to build.

A world model is defined by three essential capabilities:

Generative: It must be able to generate endlessly varied 3D worlds that are not only visually diverse but also physically and geometrically consistent. This is about more than just making pretty pictures; as Li notes, a world model’s “understanding of the present must be tied coherently to its past.”
Multimodal: It must process and understand inputs beyond text, including images, videos, gestures, and actions. This allows humans and other agents to communicate with the model about its world in rich, diverse ways, mirroring how we interact with our own.
Interactive: It must be able to predict the “next state” of the world based on a given action. This forms the basis for planning and understanding cause and effect, allowing the model to reason about what will happen if an object is moved or a force is applied.
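To make the “interactive” capability concrete, here is a deliberately tiny Python sketch. A world model is treated as a function that takes the current state and an action and predicts the next state; a planner then “imagines” the outcome of several candidate actions and picks the one whose predicted result best matches a goal, here tossing keys so they land near a target. Everything in it (`State`, `step`, `rollout_landing_x`, `plan_toss`, and the hard-coded projectile physics) is a hypothetical illustration, not World Labs’ design: a real world model would learn its transition function from data, over rich 3D scenes, rather than hard-code Newton’s laws for a single point.

```python
from dataclasses import dataclass

GRAVITY = 9.81  # m/s^2; this toy world ignores air resistance and spin

Action = tuple[float, float]  # (change in horizontal velocity, change in vertical velocity)


@dataclass(frozen=True)
class State:
    x: float   # horizontal position (m)
    y: float   # height above the ground (m)
    vx: float  # horizontal velocity (m/s)
    vy: float  # vertical velocity (m/s)


def step(state: State, action: Action, dt: float = 0.01) -> State:
    """Hypothetical transition function: predict the next state given an action.

    The action is applied as an instantaneous change in velocity, then position
    and velocity are advanced by one explicit-Euler time step.
    """
    vx = state.vx + action[0]
    vy = state.vy + action[1]
    return State(x=state.x + vx * dt, y=state.y + vy * dt, vx=vx, vy=vy - GRAVITY * dt)


def rollout_landing_x(start: State, toss: Action, dt: float = 0.01, max_steps: int = 10_000) -> float:
    """Imagine the consequences of a toss: apply it once, then let physics run."""
    state = step(start, toss, dt)
    for _ in range(max_steps):
        if state.y <= 0.0:  # the object has reached the ground
            break
        state = step(state, (0.0, 0.0), dt)
    return state.x


def plan_toss(start: State, target_x: float, candidates: list[Action]) -> Action:
    """Pick the candidate action whose predicted landing point is closest to the target."""
    return min(candidates, key=lambda toss: abs(rollout_landing_x(start, toss) - target_x))


hand = State(x=0.0, y=1.0, vx=0.0, vy=0.0)  # keys held still, 1 m above the ground
options = [(vx, 3.0) for vx in (1.0, 2.0, 3.0, 4.0)]
best = plan_toss(hand, target_x=2.0, candidates=options)
print("Chosen toss:", best, "predicted landing x:", round(rollout_landing_x(hand, best), 2))
```

With these numbers the planner chooses the 2 m/s horizontal toss, whose predicted landing point (roughly 1.7 m) is closer to the 2 m target than any alternative. Changing `target_x` or the candidate list changes the decision without any retraining, which is the appeal of planning with a predictive model of cause and effect.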

This represents an enormous technical challenge. The dimensionality of representing a dynamic, physical world is “vastly more complex” than representing one-dimensional language. It requires a new “universal task function” beyond the “next-token prediction” that powers LLMs, and it must overcome the fact that training data for robotics is “scarce” compared to the internet’s ocean of text.
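A back-of-the-envelope count shows why the prediction target grows so quickly once we move beyond text. The sketch below compares how many raw values a model must output for a single prediction step in three hypothetical settings: the next token of text, the next frame of a small 256×256 RGB video, and the next state of a coarse 128×128×128 3D occupancy grid. The sizes are assumptions picked for illustration, not figures from Li’s essay.

```python
from math import prod

# Shapes of a single prediction target; all sizes are illustrative assumptions.
prediction_targets = {
    "next token (text)": (1,),                                # one token ID per step
    "next frame (256x256 RGB video)": (256, 256, 3),          # height x width x colour channels
    "next 3D state (128^3 occupancy grid)": (128, 128, 128),  # one value per voxel
}

for name, shape in prediction_targets.items():
    print(f"{name:<38} {prod(shape):>12,} values per step")
```

The count deliberately ignores that each token is chosen from a vocabulary of tens of thousands of entries, and that real systems compress frames and scenes into learned latent representations. Even so, the jump from one value to millions per step hints at why Li treats the dimensionality of worlds as a distinct challenge.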

4. Expect Creative Superpowers Before Robot Butlers

The applications of spatial intelligence won’t arrive all at once. Li outlines a phased rollout: creative tools are emerging “now,” robotics represents a “mid-term horizon,” and transformative scientific applications will take longer.

This phased approach is itself an important insight. The first mainstream impact of this next AI wave won’t be replacing physical labor, but rather supercharging human imagination. Li’s company, World Labs, is already demonstrating this with its platform “Marble,” which allows creators like filmmakers and architects to “conjure entire worlds without the constraints of budget or geography,” rapidly building and exploring 3D environments.

Robotics is a harder problem because it requires closing the “gap between simulation and reality.” Robots must translate digital understanding into precise physical action. World models will be critical here, serving as engines to generate the massive amounts of synthetic data needed to train robots to navigate the complexities of the real world.
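One common recipe for making synthetic robot data transfer better across the simulation-to-reality gap is domain randomization: rather than training on a single simulated setting, you generate many episodes with physical parameters drawn at random, so that learned behaviour does not overfit one particular world. The Python sketch below illustrates that swapped-in technique; it is not a method described in Li’s essay. The toy “simulator” (`simulate_push`, a block pushed across a surface with friction), the other function names, and every parameter range are hypothetical stand-ins for what a world model or physics engine would provide.

```python
import random
from dataclasses import dataclass

GRAVITY = 9.81  # m/s^2


@dataclass(frozen=True)
class SimParams:
    friction: float     # friction coefficient between block and surface (assumed range)
    object_mass: float  # block mass in kg (assumed range)


def sample_params(rng: random.Random) -> SimParams:
    """Domain randomization: draw a fresh set of physical parameters per episode."""
    return SimParams(friction=rng.uniform(0.2, 1.0), object_mass=rng.uniform(0.05, 2.0))


def simulate_push(params: SimParams, force: float, duration: float) -> float:
    """Toy stand-in for a simulator: distance a resting block slides under a constant push.

    Net acceleration is the applied force per unit mass minus friction (one
    coefficient stands in for both static and kinetic friction); if friction
    wins, the block does not move.
    """
    accel = force / params.object_mass - params.friction * GRAVITY
    return 0.5 * max(accel, 0.0) * duration**2


def make_dataset(n_episodes: int, seed: int = 0) -> list[tuple[SimParams, float, float]]:
    """Generate (parameters, applied force, resulting distance) training examples."""
    rng = random.Random(seed)  # seeded so the synthetic dataset is reproducible
    episodes = []
    for _ in range(n_episodes):
        params = sample_params(rng)
        force = rng.uniform(0.5, 10.0)  # newtons; hypothetical range of robot pushes
        episodes.append((params, force, simulate_push(params, force, duration=1.0)))
    return episodes


data = make_dataset(1_000)
moved = sum(1 for _, _, distance in data if distance > 0)
print(f"{len(data)} synthetic episodes; the block moved in {moved} of them")
```

The intuition is that a model trained on enough varied episodes comes to treat the real world as just one more sample from the range it has already seen, rather than as an unfamiliar setting.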

5. The Ultimate Goal: AI That Augments, Not Replaces

Underpinning this entire pursuit is a guiding philosophy about AI’s purpose. Li states her motivation, forged over a 25-year career, in personal and unambiguous terms:

“As one of the scientists who helped usher in the era of modern AI, my motivation has always been clear: AI must augment human capability, not replace it.”

This “human-centric” approach envisions AI as a collaborative partner. It’s not a generic ideal, but a tangible vision: a lab robot that might “handle instruments so the scientist can focus on tasks needing dexterity or reasoning”; an ambient monitoring system that helps a caregiver without replacing the human connection; or a tool that enables a teacher to create immersive educational worlds. In these scenarios, the AI extends our reach and “respect[s] the agency and dignity of people.”

In a world filled with extreme narratives of techno-utopia and apocalypse, this vision offers a pragmatic and hopeful path forward — one where technology serves to make us more capable, creative, and connected.

Conclusion: Beyond Words, a New World

The last decade saw AI master the abstract world of language, a monumental achievement. But as Li’s work makes clear, the journey toward answering Turing’s question has just entered its next, more grounded phase. The great challenge is no longer just mastering words, but embracing the spatial intelligence that underpins our own cognition.

This new frontier is not about creating a machine that thinks for us, but one that helps us perceive, create, and interact with the world in richer, more powerful ways. Roughly half a billion years after nature unleashed the first glimmers of perception in ancient life, Li believes we are the generation privileged enough to endow machines with this same capability. This quest is her North Star.

As machines begin to understand the world as we do, what new worlds — real or imagined — will we choose to build with them?

If You Wish To Support Me

Follow me and clap 50 times for this story
Leave a comment telling me your thoughts
Highlight your favourite part of the story

Thanks for your support — every like means a lot and keeps me motivated! 💪💖


Published via Towards AI
