Fast vs. Slow: How (and When) to Make Models Think
Last Updated on October 28, 2025 by Editorial Team
Author(s): Towards AI Editorial Team
Originally published on Towards AI.
Our AI team attended COLM 2025 this year. In this piece, François Huppé-Marcoux, one of our AI engineers, shares the “aha” moment that reshaped his view of reasoning in LLMs.
We talk about “reasoning” as if it’s a universal boost: add more steps, get a better answer. But reasoning isn’t one thing. Sometimes it’s the quiet, automatic pattern-matching that lets you spot your friend in a crowd. Sometimes it’s deliberate, step-by-step analysis — checking assumptions, chaining facts, auditing edge cases. Both are reasoning. The mistake is treating only the slow, verbal kind as “real.”
Everyday life makes the split obvious. You don’t narrate how to catch a tossed set of keys; your body just does it. You do narrate your taxes. In one case, forcing a running commentary would make you worse; in the other, not writing the steps down would be reckless. The skill isn’t “always think more.” It’s knowing when thinking out loud helps and when it injects noise.
Large language models tempt us to ask for “step-by-step” on everything. That sounds prudent, but it can turn into cargo-cult analysis: extra words standing in for better judgment. If a task is mostly matching or recognition, demanding a verbal chain of thought can distract the system (and us) from the signal we actually need. For multi-constraint problems, skipping structure is how you get confident nonsense.
I learned this the hard way, mid-lecture, with a room full of people humming their answers while I was still building a mental proof.
I was at Tom Griffiths’ COLM 2025 talk on neuroscience and LLMs. He argued that reasoning is not always beneficial for LLMs, which surprised me: it’s generally assumed that reasoning enhances a model’s capabilities by letting it generate intermediate thoughts before the final response.
But during the talk, he ran an experiment that changed my mind and the way I prompt LLMs. He showed strings from a made-up language with hidden rules about which letters could appear before or after others. Then, he flashed two candidate words and asked which belonged.
I began analyzing the problem by considering the tools available to solve it. I remembered a compiler class where we built a tiny programming language from scratch, and the concept of a deterministic finite automaton (DFA) — a way to define which sequences of characters are valid in a language. For example, in Python, you can’t start a variable name with a number. We use automata to analyze syntax and throw compilation errors such as “missing a semicolon” (in languages that use them) or “wrong indentation.”
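To make that DFA intuition concrete, here’s a minimal sketch (mine, not from the talk; the sample strings and candidate words are invented) that treats the hidden rules as a set of allowed letter-to-letter transitions and checks a candidate word against them:

```python
# Minimal sketch: model the made-up language as allowed letter-to-letter
# transitions (a tiny DFA over bigrams). Samples and candidates are invented.

def learn_transitions(samples):
    """Collect which letter may follow which, from the sample strings."""
    allowed = set()
    for word in samples:
        for a, b in zip(word, word[1:]):
            allowed.add((a, b))
    return allowed

def belongs(word, allowed):
    """A word 'belongs' if every adjacent letter pair was seen in the samples."""
    return all((a, b) in allowed for a, b in zip(word, word[1:]))

samples = ["vxv", "xvxv", "vxxv"]   # stand-in for the sample text on the slide
allowed = learn_transitions(samples)
print(belongs("xvx", allowed))      # True: both 'xv' and 'vx' appear in the samples
print(belongs("vvx", allowed))      # False: 'vv' never appears
```

That’s essentially what I was doing in my head, one letter pair at a time, and it’s exactly the slow, explicit procedure the rest of the room skipped.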
So I started checking the first word, seeing whether the letter sequences it contained appeared in the sample text. Midway through, I heard the room go “Hmm!” I had entirely missed that the presenter had asked everyone to hum if they thought the first word was correct. I was surprised that people answered so quickly. Was I just slow? Or had I misunderstood the exercise?
A bit later, I realized that no one had followed the same mental process as I had. People answered without being sure, just by looking and guessing. The room hummed for option one almost instantly, and they were right. I got there later, with certainty and an explanation, but I was slow.

The lesson wasn’t that rigor is bad; it’s that deliberation has a cost, and sometimes fast pattern recognition is the winning strategy.
Griffiths pushed this further with a vision-language example. He found that VLMs perform better on certain tasks without explicit “thinking” or reasoning. In one experiment, he showed multiple faces to a vision-language model and asked it to identify which one matched a real input image.

He found that both the model and humans perform better when they’re not engaging in verbal reasoning.
In psychology and neuroscience, there’s the well-known concept of System 1 and System 2 from Thinking, Fast and Slow by Daniel Kahneman. Fast thinking (System 1) is effortless and always active — recognizing a face, answering 2+2=4, or making practiced movements in sports. There’s little conscious deliberation, even though the brain is actively responding. Slow thinking (System 2) is when we need to mentally process information, such as solving “37*13” or my letter-by-letter analysis to find which sequences violated the language.
Over the years, this concept has been applied to AI. A model can perform better at image matching when “thinking” is turned off: as with humans, asking it to describe facial characteristics in words (e.g., green eyes, brown hair, small nose) can add noise compared with simply comparing the images.
Let’s do one last experiment so that the mechanics feel obvious: on the left, “find a green dot”; on the right, “find the green L.” The first is effectively an OR search, i.e., any green thing will do. The second is an AND search, meaning it must be both L and green. As the number of items increases, accuracy remains high for OR but degrades sharply for AND. In plain language: the more conditions you stack, the more a model benefits from (and sometimes requires) deliberate “thinking” and the more fragile it becomes if you don’t provide it.

With the left task, you can feel how quickly it’s done; there’s no need to think about each circle. The right task isn’t as easy: you have to check the letters one by one to spot the green “L”. Griffiths distinguishes two types of search here:
Disjunctive search (left image): Searching with an “OR” condition. Here, the single condition is “find green circles.” This is a broader search where any one condition can hold.
Conjunctive search (right image): Searching with an “AND” condition. Here, we search for “L” and green. This is a narrower search where all conditions must be met (see the tiny code sketch below).
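As a toy illustration (the items below are invented, not from the paper), the two searches are just a one-condition filter versus a two-condition filter:

```python
# Invented items: (shape, color) pairs standing in for the displays in the talk.
items = [("circle", "red"), ("circle", "blue"), ("T", "green"), ("L", "green")]

# Disjunctive (OR-style) search: a single feature is enough -- "anything green".
disjunctive_hits = [it for it in items if it[1] == "green"]

# Conjunctive (AND) search: all conditions must hold -- "an L that is also green".
conjunctive_hits = [it for it in items if it[0] == "L" and it[1] == "green"]

print(disjunctive_hits)  # [('T', 'green'), ('L', 'green')]
print(conjunctive_hits)  # [('L', 'green')]
```

The AND search has to track both features for every item, which is where the cost of stacking conditions comes from.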
He finds that for disjunctive search, VLM accuracy does not degrade as we increase the number of objects, while conjunctive search accuracy degrades significantly with more objects. He writes:
This pattern suggests interference arising from the simultaneous processing of multiple items.
It’s not important to remember “conjunctive” vs. “disjunctive”; what matters is the pattern: the more instructions or steps you add to a prompt, the more the model needs thinking tokens. This is pretty intuitive; no surprise here.
So what does this mean for everyday prompting?
First, thinking isn’t free. Asking for chain-of-thought (or using a “reasoning mode”) consumes tokens, time, and often the model’s intuitive edge. Second, not all tasks need it. Simple classification, direct retrieval, and pure matching often perform best with minimal scaffolding: “answer in one line,” “choose A or B,” “return the ID,” “pick the closest example.” Save step-by-step for problems that truly have multiple constraints or multi-step dependencies: data transformations, multi-hop reasoning, policy checks, tool choreography, and long instructions with delicate edge cases.
A practical rule I’ve adopted since that talk:
- If the task is mostly about recognition, retrieval, or matching, prefer fast mode: concise prompts, no forced explanations, crisp output constraints.
- If the task is to compose, transform, or verify across several conditions, switch on deliberate reasoning, either by asking the model to plan (briefly) before answering or by structuring the workflow into explicit steps; a minimal sketch follows below.
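Here’s a minimal sketch of that rule in code, assuming a hypothetical llm(prompt, reasoning=...) callable; the flag name and the crude constraint-count heuristic are illustrative, not any specific provider’s API:

```python
# Sketch only: `llm` is assumed to be a callable you supply, e.g. a thin wrapper
# around your provider's SDK with a boolean toggle for its thinking/reasoning mode.

def ask(task: str, n_constraints: int, llm) -> str:
    if n_constraints <= 1:
        # Fast mode: recognition, retrieval, matching -- terse prompt, no forced steps.
        prompt = f"{task}\nAnswer in one line. No explanation."
        return llm(prompt, reasoning=False)
    # Deliberate mode: several conditions to satisfy -- ask for a brief plan first.
    prompt = (
        f"{task}\n"
        "List the constraints you must satisfy (one line each), "
        "then give the final answer."
    )
    return llm(prompt, reasoning=True)
```

The point isn’t the heuristic (count constraints however you like); it’s that the reasoning toggle becomes an explicit, per-task decision instead of a global default.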
The trade-off is the same as in that lecture hall. Skipping deliberation can be faster and often right, but you lose certainty and justification. Adding deliberation buys you traceable logic and better reliability under complexity, at the cost of latency (and sometimes creativity). Get intentional about when you pay that cost.
A cleaner mental checklist (without overdoing it)
- Is success a pattern match? Use fast mode.
- Are there multiple “ANDs” to satisfy? Use deliberate mode.
- Do I need a rationale for auditability? Ask for a short, structured justification — after the answer, not before.
- Am I adding instructions just to feel safer? Remove them and re-test.
I still like my DFA instincts. They’re great when correctness matters. But that day, the humming crowd had the right intuition: sometimes, the best prompt is the shortest one, and the best “reasoning” is not reasoning at all.
Key Takeaway
Next time you turn on thinking mode in your favorite LLM application and forget it’s still on, remember: depending on the task, the model may perform better in non-thinking mode. We humans are already good at fast thinking; we do it naturally, without effort. Normally, the LLM earns its keep on tasks that require deliberate reasoning, though it can also be valuable to ask it for intuitive ideas you might not have considered.
P.S. If you got the general idea of this post, that’s enough — you don’t need to overthink it!
Reference: https://arxiv.org/pdf/2503.13401
Published via Towards AI
Note: Content contains the views of the contributing authors and not Towards AI.