AI Can Write Poetry, But Can’t Pour Coffee: Inside the Next Great Tech Revolution

Last Updated on November 25, 2025 by Editorial Team

Author(s): cai zhang

Originally published on Towards AI.

This article summarizes the views of Fei-Fei Li, an AI expert, from her first Substack post, titled “From Words to Worlds: Spatial Intelligence is AI’s Next Frontier.”

Introduction: The Genius in the Room Has a Blind Spot

In 1950, Alan Turing posed a question that would launch a relentless scientific quest: “Can machines think?” Today, with large language models (LLMs) that generate fluent text, write complex code, and create photorealistic images in seconds, we seem closer than ever to an answer. Modern AI feels magical, a machine that has mastered the abstract knowledge of the internet.

But this genius has a profound blind spot. It can write an eloquent essay on the physics of pouring coffee, yet it can’t perform the simple physical task itself. It can describe how to park a car but lacks the intuitive understanding to do it. This is the central paradox of today’s AI.

One vision for resolving it comes from Fei-Fei Li, one of the architects of the modern AI era. As a key figure behind ImageNet, the dataset that helped ignite the current deep learning boom, and a former director of Stanford’s AI Lab, she has long pursued what she calls her “North Star”: endowing machines with visual and spatial understanding. Now, in a recent essay, she argues that the missing piece is “spatial intelligence,” the next great frontier that holds the key to unlocking AI’s true potential. This article explores the most impactful takeaways from her vision for AI’s future.

1. Today’s AI is a “Wordsmith in the Dark”

While LLMs have mastered the world of text, they lack a grounded understanding of physical reality. They have immense knowledge but no real-world experience, leaving them disconnected from the world they seek to understand.

Fei-Fei Li captures this limitation perfectly:

“Yet they remain wordsmiths in the dark; eloquent but inexperienced, knowledgeable but ungrounded.”

This isn’t just a philosophical problem; it appears in practical tests. Despite their sophistication, state-of-the-art models perform poorly on basic spatial tasks. They struggle with:

  • Estimating the distance, orientation, and size of objects.
  • “Mentally” rotating objects by regenerating them from new angles.
  • Navigating mazes or recognizing shortcuts.
  • Predicting the outcomes of basic physics.

The contrast is stark. These systems demonstrate superhuman ability in language, yet their spatial reasoning is sub-human. The gap highlights a fundamental difference in how they perceive reality. As Li explains, our view of the world is holistic — “not just what we’re looking at, but how everything relates spatially, what it means, and why it matters.” For AI, that holistic understanding is still missing.

2. Intelligence Began with Sensation, Not Language

We often equate intelligence with language and abstract thought. But according to Li’s analysis, its evolutionary roots are far more fundamental. Intelligence didn’t begin with words, but with sensation — “a glimmer of light or the feeling of texture.” This created a bridge between perception and survival, forming the “core loop driving the evolution of intelligence.”

This foundational capability is spatial intelligence, which Li calls the “scaffolding upon which our cognition is built.” It’s the intuitive fluency we use to park a car, catch tossed keys, or navigate a crowded sidewalk. It’s at play when children spend their pre-verbal years learning through play, or when firefighters navigate a collapsing building through “body language and a shared professional instinct for which there’s no linguistic substitute.”

This form of intelligence has driven civilization-defining breakthroughs that were impossible through text alone:

  • Calculating Earth’s Circumference: In ancient Greece, Eratosthenes used the spatial relationship between shadows and the sun’s angle to calculate the size of our planet — a feat of geometric and physical reasoning.
  • Discovering DNA’s Structure: Watson and Crick didn’t just write equations; they physically built 3D models, manipulating metal plates and wires until the double helix structure “clicked into place” spatially.
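
The geometry behind the Eratosthenes example can be worked through in a few lines. The sketch below is illustrative only: the 7.2 degree shadow angle and the 5,000 stadia distance between Alexandria and Syene are the figures traditionally attributed to him, not numbers taken from Li’s essay, and the function names are hypothetical.

```python
import math


def shadow_angle_deg(stick_height: float, shadow_length: float) -> float:
    """Angle of the sun's rays from vertical, from a stick and its shadow."""
    return math.degrees(math.atan2(shadow_length, stick_height))


def circumference_from_shadow(angle_deg: float, distance: float) -> float:
    """Scale the arc between two cities up to a full 360 degree circle.

    If the sun is directly overhead in one city and `angle_deg` off vertical
    in the other, the cities are `angle_deg` apart along the Earth's surface.
    """
    if not 0 < angle_deg <= 360:
        raise ValueError("angle_deg must be in the range (0, 360]")
    return distance * 360.0 / angle_deg


# Traditionally cited inputs (assumptions, not from the article).
ANGLE_DEG = 7.2          # shadow angle at Alexandria at noon
DISTANCE_STADIA = 5000   # distance from Alexandria to Syene

circumference = circumference_from_shadow(ANGLE_DEG, DISTANCE_STADIA)
print(f"Estimated circumference: about {circumference:,.0f} stadia")
```

With these inputs the estimate comes out at roughly 250,000 stadia. The point is that the whole calculation rests on a spatial relationship (an angle and an arc), not on anything that could be read off from text.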

The synthesis is clear: just as Watson and Crick needed to physically manipulate models to see the structure of DNA, today’s AI needs a virtual “physicality” to grasp concepts that text alone cannot convey. To reach the next level, it must learn to think in this foundational, spatial way.

3. The Solution Isn’t a Better Language Model, It’s a “World Model”

The path forward isn’t simply a bigger language model. Li argues it requires a new, more ambitious class of AI systems called “world models,” which her company, World Labs, was founded to build.

A world model is defined by three essential capabilities:

  1. Generative: It must be able to generate endlessly varied 3D worlds that are not only visually diverse but also physically and geometrically consistent. This is about more than just making pretty pictures; as Li notes, a world model’s “understanding of the present must be tied coherently to its past.”
  2. Multimodal: It must process and understand inputs beyond text, including images, videos, gestures, and actions. This allows humans and other agents to communicate with the model about its world in rich, diverse ways, mirroring how we interact with our own.
  3. Interactive: It must be able to predict the “next state” of the world based on a given action. This forms the basis for planning and understanding cause and effect, allowing the model to reason about what will happen if an object is moved or a force is applied.

Building such a model is an enormous technical challenge. Representing a dynamic, physical world is “vastly more complex” than representing one-dimensional language. It requires a new “universal task function” beyond the “next-token prediction” that powers LLMs, and it must overcome the fact that training data for robotics is “scarce” compared to the internet’s ocean of text.
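
To make the “interactive” capability concrete, here is a minimal sketch of what predicting the “next state” from an action can look like. It is not World Labs’ method: every name here (`State`, `Action`, `ToyWorldModel`, `rollout`) is hypothetical, and a few lines of hand-written physics stand in for what would, in a real world model, be a learned predictor over far richer 3D state.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class State:
    height: float    # metres above the floor
    velocity: float  # metres per second, positive is upward


@dataclass(frozen=True)
class Action:
    force: float     # upward force in newtons applied for one time step


class ToyWorldModel:
    """Hand-written physics standing in for a learned world model."""

    def __init__(self, mass: float = 1.0, gravity: float = 9.81, dt: float = 0.01):
        self.mass = mass
        self.gravity = gravity
        self.dt = dt

    def predict_next_state(self, state: State, action: Action) -> State:
        """Predict the state one time step after applying `action`."""
        acceleration = action.force / self.mass - self.gravity
        velocity = state.velocity + acceleration * self.dt
        height = state.height + velocity * self.dt
        if height <= 0.0:
            return State(height=0.0, velocity=0.0)  # object rests on the floor
        return State(height=height, velocity=velocity)


def rollout(model: ToyWorldModel, start: State, actions: Sequence[Action]) -> List[State]:
    """Chain predictions to imagine a whole trajectory before acting."""
    states = [start]
    for action in actions:
        states.append(model.predict_next_state(states[-1], action))
    return states


model = ToyWorldModel()
# "What happens if I let go of an object held one metre up?"
trajectory = rollout(model, State(height=1.0, velocity=0.0), [Action(force=0.0)] * 100)
print(trajectory[-1])
```

Chaining predictions this way is what lets a system ask “what happens if I let go?” before acting, which is the basis for planning and for reasoning about cause and effect.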

4. Expect Creative Superpowers Before Robot Butlers

The applications of spatial intelligence won’t arrive all at once. Li outlines a phased rollout: creative tools are emerging “now,” robotics represents a “mid-term horizon,” and transformative scientific applications will take longer.

This phased approach is itself an important insight. The first mainstream impact of this next AI wave won’t be replacing physical labor, but rather supercharging human imagination. Li’s company, World Labs, is already demonstrating this with its platform “Marble,” which allows creators like filmmakers and architects to “conjure entire worlds without the constraints of budget or geography,” rapidly building and exploring 3D environments.

Robotics is a harder problem because it requires closing the “gap between simulation and reality.” Robots must translate digital understanding into precise physical action. World models will be critical here, serving as engines to generate the massive amounts of synthetic data needed to train robots to navigate the complexities of the real world.

5. The Ultimate Goal: AI That Augments, Not Replaces

Underpinning this entire pursuit is a guiding philosophy about AI’s purpose. Li makes her motivation, forged over a 25-year career, intensely personal and clear:

“As one of the scientists who helped usher in the era of modern AI, my motivation has always been clear: AI must augment human capability, not replace it.”

This “human-centric” approach envisions AI as a collaborative partner. It’s not a generic ideal, but a tangible vision: a lab robot that might “handle instruments so the scientist can focus on tasks needing dexterity or reasoning”; an ambient monitoring system that helps a caregiver without replacing the human connection; or a tool that enables a teacher to create immersive educational worlds. In these scenarios, the AI extends our reach and “respect[s] the agency and dignity of people.”

In a world filled with extreme narratives of techno-utopia and apocalypse, this vision offers a pragmatic and hopeful path forward — one where technology serves to make us more capable, creative, and connected.

Conclusion: Beyond Words, a New World

The last decade saw AI master the abstract world of language, a monumental achievement. But as Li’s work makes clear, the journey toward answering Turing’s question has just entered its next, more grounded phase. The great challenge is no longer just mastering words, but embracing the spatial intelligence that underpins our own cognition.

This new frontier is not about creating a machine that thinks for us, but one that helps us perceive, create, and interact with the world in richer, more powerful ways. Almost half a billion years after nature unleashed the first glimmers of perception in ancient life, Li believes we are the generation privileged enough to endow machines with this same capability. This quest is her North Star.

As machines begin to understand the world as we do, what new worlds — real or imagined — will we choose to build with them?

Published via Towards AI


Note: Content contains the views of the contributing authors and not Towards AI.