Fast vs. Slow: How (and When) to Make Models Think
Last Updated on October 28, 2025 by Editorial Team
Author(s): Towards AI Editorial Team
Originally published on Towards AI.
Our AI team attended COLM 2025 this year. In this piece, François Huppé-Marcoux, one of our AI engineers, shares the “aha” moment that reshaped his view of reasoning in LLMs.
We talk about “reasoning” as if it’s a universal boost: add more steps, get a better answer. But reasoning isn’t one thing. Sometimes it’s the quiet, automatic pattern-matching that lets you spot your friend in a crowd. Sometimes it’s deliberate, step-by-step analysis — checking assumptions, chaining facts, auditing edge cases. Both are reasoning. The mistake is treating only the slow, verbal kind as “real.”
Everyday life makes the split obvious. You don’t narrate how to catch a tossed set of keys; your body just does it. You do narrate your taxes. In one case, forcing a running commentary would make you worse; in the other, not writing the steps down would be reckless. The skill isn’t “always think more.” It’s knowing when thinking out loud helps and when it injects noise.
Large language models tempt us to ask for “step-by-step” on everything. That sounds prudent, but it can turn into cargo-cult analysis: extra words standing in for better judgment. If a task is mostly matching or recognition, demanding a verbal chain of thought can distract the system (and us) from the signal we actually need. For multi-constraint problems, skipping structure is how you get confident nonsense.
I learned this the hard way, mid-lecture, with a room full of people humming their answers while I was still building a mental proof.
I was at Tom Griffiths’ COLM 2025 talk on neuroscience and LLMs. He argued that reasoning is not always beneficial for LLMs, which surprised me: it’s generally assumed that reasoning enhances a model’s capabilities by letting it generate intermediate thoughts before the final response.
But during the talk, he ran an experiment that changed my mind and the way I prompt LLMs. He showed strings from a made-up language with hidden rules about which letters could appear before or after others. Then, he flashed two candidate words and asked which belonged.
I began analyzing the problem by considering the tools available to solve it. I remembered a compiler class where we built a tiny programming language from scratch, and the concept of a deterministic finite automaton (DFA) — a way to define which sequences of characters are valid in a language. For example, in Python, you can’t start a variable name with a number. We use automata to analyze syntax and throw compilation errors such as “missing a semicolon” (in languages that use them) or “wrong indentation.”
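To make that DFA intuition concrete, here’s a minimal sketch (mine, not from the talk; the sample strings and candidate words are invented) that treats the hidden rules as a set of allowed letter-to-letter transitions and checks a candidate word against them:

```python
# Minimal sketch: model the made-up language as allowed letter-to-letter
# transitions (a tiny DFA over bigrams). Samples and candidates are invented.

def learn_transitions(samples):
    """Collect which letter may follow which, from the sample strings."""
    allowed = set()
    for word in samples:
        for a, b in zip(word, word[1:]):
            allowed.add((a, b))
    return allowed

def belongs(word, allowed):
    """A word 'belongs' if every adjacent letter pair was seen in the samples."""
    return all((a, b) in allowed for a, b in zip(word, word[1:]))

samples = ["vxv", "xvxv", "vxxv"]   # stand-in for the sample text on the slide
allowed = learn_transitions(samples)
print(belongs("xvx", allowed))      # True: both 'xv' and 'vx' appear in the samples
print(belongs("vvx", allowed))      # False: 'vv' never appears
```

That’s essentially what I was doing in my head, one letter pair at a time, and it’s exactly the slow, explicit procedure the rest of the room skipped.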
So I started checking the first word, seeing whether the letter sequences it contained appeared in the sample text. Midway through, I heard the room go “Hmm!” I had entirely missed that the presenter had asked everyone to hum if they thought the first word was correct. I was surprised that people answered so quickly. Was I just slow? Or had I misunderstood the exercise?
A bit later, I realized that no one had followed the same mental process as I had. People answered without being sure, just by looking and guessing. The room hummed for option one almost instantly, and they were right. I got there later, with certainty and an explanation, but I was slow.

The lesson wasn’t that rigor is bad; it’s that deliberation has a cost, and sometimes fast pattern recognition is the winning strategy.
Griffiths pushed this further with a vision-language example. He found that VLMs perform better on certain tasks without explicit “thinking” or reasoning. In one experiment, he showed multiple faces to a vision-language model and asked it to identify which one matched a real input image.

He found that both the model and humans perform better when they’re not engaging in verbal reasoning.
In psychology and neuroscience, there’s the well-known concept of System 1 and System 2 from Thinking, Fast and Slow by Daniel Kahneman. Fast thinking (System 1) is effortless and always active — recognizing a face, answering 2+2=4, or making practiced movements in sports. There’s little conscious deliberation, even though the brain is actively responding. Slow thinking (System 2) is when we need to mentally process information, such as solving “37*13” or my letter-by-letter analysis to find which sequences violated the language.
Over the years, this concept has been applied to AI. A model can perform better at image matching when “thinking” is turned off: as with humans, asking it to describe facial characteristics in words (e.g., green eyes, brown hair, small nose) can add noise compared with simply comparing the images.
Let’s do one last experiment so that the mechanics feel obvious: on the left, “find a green dot”; on the right, “find the green L.” The first is effectively an OR search, i.e., any green thing will do. The second is an AND search, meaning it must be both L and green. As the number of items increases, accuracy remains high for OR but degrades sharply for AND. In plain language: the more conditions you stack, the more a model benefits from (and sometimes requires) deliberate “thinking” and the more fragile it becomes if you don’t provide it.

With the left task, you can feel how quickly it’s done; there’s no need to think about each circle. The right task isn’t as easy: you have to check the letters one by one to spot the green “L”. Griffiths distinguishes two types of search here:
Disjunctive search (left image): Searching with an “OR” condition. Here, the single condition is “find green circles.” This is a broader search where any one condition can hold.
Conjunctive search (right image): Searching with an “AND” condition. Here, we search for “L” and green. This is a narrower search where all conditions must be met (see the tiny code sketch below).
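As a toy illustration (the items below are invented, not from the paper), the two searches are just a one-condition filter versus a two-condition filter:

```python
# Invented items: (shape, color) pairs standing in for the displays in the talk.
items = [("circle", "red"), ("circle", "blue"), ("T", "green"), ("L", "green")]

# Disjunctive (OR-style) search: a single feature is enough -- "anything green".
disjunctive_hits = [it for it in items if it[1] == "green"]

# Conjunctive (AND) search: all conditions must hold -- "an L that is also green".
conjunctive_hits = [it for it in items if it[0] == "L" and it[1] == "green"]

print(disjunctive_hits)  # [('T', 'green'), ('L', 'green')]
print(conjunctive_hits)  # [('L', 'green')]
```

The AND search has to track both features for every item, which is where the cost of stacking conditions comes from.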
He finds that for disjunctive search, VLM accuracy does not degrade as we increase the number of objects, while conjunctive search accuracy degrades significantly with more objects. He writes:
This pattern suggests interference arising from the simultaneous processing of multiple items.
It’s not important to remember “conjunctive” vs. “disjunctive”; what matters is the pattern: the more instructions or steps you add to a prompt, the more the model needs thinking tokens. This is pretty intuitive; no surprise here.
So what does this mean for everyday prompting?
First, thinking isn’t free. Asking for chain-of-thought (or using a “reasoning mode”) consumes tokens, time, and often the model’s intuitive edge. Second, not all tasks need it. Simple classification, direct retrieval, and pure matching often perform best with minimal scaffolding: “answer in one line,” “choose A or B,” “return the ID,” “pick the closest example.” Save step-by-step for problems that truly have multiple constraints or multi-step dependencies: data transformations, multi-hop reasoning, policy checks, tool choreography, and long instructions with delicate edge cases.
A practical rule I’ve adopted since that talk:
- If the task is mostly about recognition, retrieval, or matching, prefer fast mode: concise prompts, no forced explanations, crisp output constraints.
- If the task is to compose, transform, or verify across several conditions, switch on deliberate reasoning, either by asking the model to plan (briefly) before answering or by structuring the workflow into explicit steps; a minimal sketch follows below.
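Here’s a minimal sketch of that rule in code, assuming a hypothetical llm(prompt, reasoning=...) callable; the flag name and the crude constraint-count heuristic are illustrative, not any specific provider’s API:

```python
# Sketch only: `llm` is assumed to be a callable you supply, e.g. a thin wrapper
# around your provider's SDK with a boolean toggle for its thinking/reasoning mode.

def ask(task: str, n_constraints: int, llm) -> str:
    if n_constraints <= 1:
        # Fast mode: recognition, retrieval, matching -- terse prompt, no forced steps.
        prompt = f"{task}\nAnswer in one line. No explanation."
        return llm(prompt, reasoning=False)
    # Deliberate mode: several conditions to satisfy -- ask for a brief plan first.
    prompt = (
        f"{task}\n"
        "List the constraints you must satisfy (one line each), "
        "then give the final answer."
    )
    return llm(prompt, reasoning=True)
```

The point isn’t the heuristic (count constraints however you like); it’s that the reasoning toggle becomes an explicit, per-task decision instead of a global default.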
The trade-off is the same as in that lecture hall. Skipping deliberation can be faster and often right, but you lose certainty and justification. Adding deliberation buys you traceable logic and better reliability under complexity, at the cost of latency (and sometimes creativity). Get intentional about when you pay that cost.
A cleaner mental checklist (without overdoing it)
- Is success a pattern match? Use fast mode.
- Are there multiple “ANDs” to satisfy? Use deliberate mode.
- Do I need a rationale for auditability? Ask for a short, structured justification — after the answer, not before.
- Am I adding instructions just to feel safer? Remove them and re-test.
I still like my DFA instincts. They’re great when correctness matters. But that day, the humming crowd had the right intuition: sometimes, the best prompt is the shortest one, and the best “reasoning” is not reasoning at all.
Key Takeaway
Next time you turn on thinking mode in your favorite LLM application and forget it’s still on, remember: depending on the task, the model may perform better in non-thinking mode. We humans are already good at fast thinking; we do it naturally, without effort. Normally, the LLM earns its keep on tasks that require deliberate reasoning, though it can also be valuable to ask it for intuitive ideas you might not have considered.
P.S. If you got the general idea of this post, that’s enough — you don’t need to overthink it!
Reference: https://arxiv.org/pdf/2503.13401
Published via Towards AI
Note: Content contains the views of the contributing authors and not Towards AI.