
From Zero-Shot to BoT: A Practical Overview of LLM Reasoning Frameworks

Last Updated on September 4, 2025 by Editorial Team

Author(s): Tiyasa Mukherjee

Originally published on Towards AI.

This article walks through the evolution of reasoning methods for large language models — from simple prompting (Zero-Shot, CoT) to advanced frameworks (ToT, GoT, BoT). It focuses on concepts, comparisons, and practical applicability, making it a guide for practitioners to understand when and how to apply each method effectively.


Introduction

LLMs can do more than just generate fluent text — with the right guidance, they can reason through problems step by step. But the big question for developers remains: how much reasoning structure is enough to get reliable answers without overloading cost and latency?

What started with simple zero-shot prompting (“just answer directly”) has quickly evolved into richer reasoning frameworks like Chain of Thought (CoT), Tree of Thoughts (ToT), and Graph of Thoughts (GoT). Each step in this evolution reflects the same underlying struggle: balancing accuracy, efficiency, and scalability when solving complex tasks with LLMs.

This article takes a practical, builder-first look at these methods — showing you where they shine, where they fail, and how to experiment with them.

Historical Background

The progression of reasoning frameworks in LLMs reflects a steady push from minimal guidance to structured, reusable reasoning:

  • Zero-Shot & Few-Shot Prompting (pre-2022): The earliest prompting strategies simply provided a problem (zero-shot) or a few input–output examples (few-shot). These methods worked well for surface-level tasks but often failed on multi-step reasoning because the model wasn’t explicitly guided on how to “think.”
  • Chain of Thought (CoT), 2022: Introduced in "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models" (Google). By prompting models to "think step by step", CoT significantly boosted performance on math, logic, and commonsense reasoning tasks. Limitation: one linear chain meant a single wrong step could ruin the entire solution.
  • Tree of Thoughts (ToT), 2023: Proposed in "Tree of Thoughts: Deliberate Problem Solving with Large Language Models". To address CoT's brittleness, ToT allowed branching reasoning: the model could explore multiple candidate paths, evaluated via search strategies (BFS, DFS, beam search). This improved robustness but increased cost and latency.
  • Graph of Thoughts (GoT), 2023: Introduced in "Graph of Thoughts: Solving Elaborate Problems with Large Language Models". GoT generalized ToT by organizing reasoning into graphs. Sub-thoughts could be shared across paths, avoiding duplicate work and supporting modular reasoning. The trade-off: orchestration complexity and risk of runaway expansion.
  • Buffer of Thoughts (BoT), 2024: Discussed in “Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models”. BoT introduced a memory buffer of reusable reasoning fragments. Instead of generating full trees or graphs every time, the system could recall past reasoning patterns, balancing efficiency with accuracy.

Reasoning Frameworks in Practice

Zero-Shot Prompting

In zero-shot, you simply ask the model the question directly — no examples, no reasoning instructions. It relies entirely on the model’s pre-training.

ZERO_SHOT_PROMPT = """
Answer the following question directly and concisely.

Question: Which planet is known as the Red Planet?
Answer:
"""

One-Shot Prompting

One-shot prompting provides the model with a single example before asking it to solve a new task. The example serves as a template that guides the model’s tone, format, or reasoning style. This is especially useful when the task requires a specific structure that the model might not produce reliably in zero-shot mode, such as email replies, summaries, or structured outputs.

ONE_SHOT_PROMPT = """
You are a helpful customer support assistant. Follow the style shown in the example.

Example:
Q: A customer writes: "My order is late. Can you check?"
A: Dear Customer,
I’m sorry to hear about the delay. I’ve checked and your order is on the way. You can track it here: <tracking-link>. Thank you for your patience.

Now handle this:
Q: {question}
A:
"""

Few-Shot Prompting

Few-shot prompting extends the idea of one-shot by providing the model with multiple examples of how to solve similar problems before asking it to generate a new answer. These examples create a mini “training set” inside the prompt, allowing the model to better capture the style, structure, and reasoning pattern needed.

This is one of the most widely used prompting techniques because it strikes a balance between flexibility and reliability: the model learns the expected format and reasoning style without needing fine-tuning.

FEW_SHOT_PROMPT = """
You are a helpful assistant that summarizes customer reviews in exactly one short sentence.
Capture both the positive and negative aspects when present.

Example 1:
Review: "The food was amazing, but the service was slow."
Summary: Great food, but slow service.

Example 2:
Review: "The phone has excellent battery life, but the camera is disappointing."
Summary: Strong battery, weak camera.

Example 3:
Review: "The hotel was clean and well-located, but the staff were unfriendly."
Summary: Clean, convenient hotel with unfriendly staff.

Now summarize this new review:
Review: {review}
"""

Chain of Thought (CoT)

The key idea of CoT is simple: instead of asking the model for just an answer, we guide it to reason step by step. This dramatically reduces errors in tasks like math word problems, logical reasoning, and common-sense inference.

Figure: Standard Prompting vs. Chain-of-Thought Prompting. The standard approach gives the wrong apple count (27), while CoT breaks the problem into steps (23 − 20 + 6) and arrives at the correct answer (9). [Source: https://arxiv.org/abs/2201.11903]

There are several variations of CoT, including the following:

  • Zero-Shot CoT: Add “Let’s think step by step” to the prompt to trigger reasoning.
  • Few-Shot CoT: Provide worked-out examples before the actual question to guide reasoning (a minimal sketch of both variants follows this list).
Figure: Zero-Shot-CoT improves accuracy over Few-Shot and Zero-Shot by explicitly reasoning step by step, correctly solving the "blue golf balls" problem (answer = 4), while the other methods give the wrong result (answer = 8). [Source: https://arxiv.org/abs/2205.11916]
  • LM-Guided CoT: Uses knowledge distillation: a large LM generates both the answer and the reasoning (rationale), which are distilled into a smaller LM. The small LM learns to mimic the reasoning steps and generates rationales itself. These rationales are then evaluated by the large LM, whose reinforcement signals help fine-tune the small LM's reasoning further. This makes CoT more resource-efficient without heavy reliance on large models.
Figure: rationale distillation and refinement. A large LM generates reasoning and answers, a smaller distilled LM learns to refine the rationale, and the improved reasoning leads the large LM to the correct prediction. [Source: https://arxiv.org/pdf/2404.03414.pdf]
  • Multimodal CoT: Extends standard CoT by combining text and visual inputs in a two-stage reasoning process: (1) rationale generation, where the model processes both the prompt and the image to produce a step-by-step explanation (rationale); and (2) answer inference, where, building on that rationale, the model produces the final answer. This grounds the model's reasoning in real visual content rather than textual paraphrase alone.
Figure: the model is given two objects, a cracker and fries, along with the question "Which property do these two objects have in common?" and options (A) soft or (B) salty. The reasoning process evaluates each object's properties: fries are salty and soft, while the cracker is salty but not soft. By comparing, the model concludes that the common property is salty. [Source: https://arxiv.org/abs/2302.00923]
  • Automatic CoT: Automatic CoT (Auto-CoT) eliminates the need for hand-crafted CoT examples by automatically generating diverse, high-quality demonstrations from the model itself in a two-step process: (1) question clustering, which partitions the questions of a given dataset into a few clusters; and (2) demonstration sampling, which selects a representative question from each cluster and generates its reasoning chain using Zero-Shot-CoT with simple heuristics.
Figure: Auto-CoT clusters sample questions and auto-generates step-by-step reasoning demos (e.g., counting songs, cooking potatoes, or caging puppies) to guide in-context problem solving without manual examples. [Source: https://arxiv.org/abs/2210.03493]
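
In code, Zero-Shot CoT is just the trigger phrase appended to the question, while Few-Shot CoT prepends a worked example. A minimal sketch in the same style as the prompts above (the worked example is the tennis-ball problem from the original CoT paper; the arithmetic question reuses the apple problem from the figure above):

COT_ZERO_SHOT_PROMPT = """
Q: A cafeteria had 23 apples. They used 20 for lunch and bought 6 more.
How many apples do they have now?
A: Let's think step by step.
"""

COT_FEW_SHOT_PROMPT = """
Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls.
Each can has 3 tennis balls. How many tennis balls does he have now?
A: Roger started with 5 balls. 2 cans of 3 balls each is 6 balls.
5 + 6 = 11. The answer is 11.

Q: {question}
A:
"""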

Tree of Thoughts (ToT)

ToT elevates Chain-of-Thought by letting the model explore multiple reasoning paths — not just a single sequence. It organizes “thoughts” (partial steps or subgoals) into a tree structure and uses search strategies like BFS or DFS to navigate through potential solutions.

Figure: progression of reasoning methods. IO prompting maps input directly to output; CoT adds step-by-step reasoning; CoT-SC samples multiple paths and picks the most consistent answer; ToT explores reasoning as a branching tree to evaluate and refine partial solutions. [Source: https://arxiv.org/abs/2305.10601]

In the Game of 24 example, the model is given the numbers 4, 9, 10, 13 and asked to combine them with arithmetic operations to reach 24. Instead of following a single step-by-step chain, the Tree-of-Thoughts (ToT) approach lets the model explore multiple possible moves in parallel. For instance, it might try 4 + 9 = 13 or 10 - 4 = 6 as candidate next steps. Each partial result is then evaluated: if a path is clearly impossible (e.g., leading to numbers far from 24), it is discarded; if it looks promising, it is kept for further exploration. The process continues like a branching search tree—expanding “maybe” candidates and pruning dead ends—until a correct reasoning path is found. This allows the model to backtrack and test alternatives, rather than committing early to a single chain that might lead to the wrong answer.

Figure: how the Tree of Thoughts (ToT) framework works. Given an input, the model first generates possible next steps (thought generation) using a propose prompt, then, through a value prompt, evaluates whether those steps are useful (thought evaluation). This iterative propose-and-validate loop helps the model explore reasoning paths more effectively. [Source: https://arxiv.org/abs/2305.10601]
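
To make the propose/evaluate loop concrete, here is a heavily simplified breadth-first ToT skeleton. Everything in it is an assumption for illustration: the caller-supplied llm function, the prompt wording, and the fixed beam width are placeholders, not the paper's implementation:

from typing import Callable, List

def tree_of_thoughts(
    llm: Callable[[str], str],  # caller-supplied completion function
    problem: str,
    depth: int = 3,       # how many reasoning steps to expand
    branching: int = 3,   # candidate thoughts proposed per state
    beam: int = 2,        # most promising states kept per round
) -> List[str]:
    """BFS over partial solutions: propose, score, prune, repeat."""
    states = [""]  # each state is the reasoning-so-far as text
    for _ in range(depth):
        candidates = []
        for state in states:
            for _ in range(branching):
                # Propose prompt: ask for one next step given progress so far.
                step = llm(
                    f"Problem: {problem}\nSteps so far:\n{state}"
                    "\nPropose the next step:"
                )
                candidates.append(state + step + "\n")

        def score(state: str) -> float:
            # Value prompt: ask the model to rate the partial path.
            reply = llm(
                f"Problem: {problem}\nPartial solution:\n{state}"
                "\nRate 1-10 how promising this is. Reply with a number only:"
            )
            try:
                return float(reply.strip())
            except ValueError:
                return 0.0  # unparseable rating counts as a dead end

        candidates.sort(key=score, reverse=True)
        states = candidates[:beam]  # prune everything but the best paths
    return states

For the Game of 24 example above, each "step" would be one arithmetic move (e.g., "4 + 9 = 13"), and the value prompt plays the role of discarding paths that cannot reach 24.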

Graph of Thoughts (GoT)

GoT advances reasoning by treating “thoughts” as nodes in a graph with edges as logical links. Unlike CoT’s linear chain or ToT’s branching tree, GoT enables non-linear connections — ideas can merge, loop back, and be reused.

The framework processes inputs as graphs: nodes encode concepts, edges capture dependencies, and embeddings represent both content and context. Through cross-attention and gated fusion, GoT highlights key relationships while filtering noise, before a transformer decoder generates the final output.

Figure: Graph of Thoughts (GoT) extends Tree of Thoughts by enabling flexible graph-based reasoning, allowing backtracking, refining, discarding, and aggregating thoughts, which makes LLM reasoning more adaptive and human-like. [Source: https://arxiv.org/pdf/2308.09687]

Rationale: By mimicking how humans think — through interconnected ideas rather than straight lines — GoT allows LLMs to perform richer, multi-step reasoning with greater efficiency and transparency.
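
As a structural sketch only (the Thought class and the sorting example are invented for illustration; this is not the paper's API), a thought graph can be modeled as a DAG in which an aggregation node merges several parent thoughts, something neither a chain nor a tree can express:

from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Thought:
    text: str
    parents: List[str] = field(default_factory=list)  # edges from earlier thoughts

graph: Dict[str, Thought] = {}
graph["a"] = Thought("Sort the first half of the list.")
graph["b"] = Thought("Sort the second half of the list.")
# Aggregation: one thought reuses and merges two earlier ones.
graph["merge"] = Thought("Merge the two sorted halves.", parents=["a", "b"])

def lineage(node: str) -> List[str]:
    """Collect every thought the given node builds on (its reasoning context)."""
    seen: List[str] = []
    for p in graph[node].parents:
        seen += lineage(p) + [graph[p].text]
    return seen

print(lineage("merge"))  # both sub-thoughts feed the aggregated step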

Buffer of Thoughts (BoT)

BoT introduces the idea of a memory buffer for reasoning. Instead of re-expanding every step like in CoT, ToT, or GoT, the model maintains a temporary storage of distilled “thoughts” (intermediate reasoning patterns). These can be reused across tasks, much like a working memory.

  • Efficiency: Saves cost and tokens by avoiding repeated exploration.
  • Accuracy: Builds on past reasoning “templates,” improving consistency.
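
A toy sketch of the buffer idea (the template store, its keys, and the retrieval rule are assumptions for illustration, far simpler than the paper's meta-buffer):

# Distilled reasoning templates, keyed by problem type.
thought_buffer = {
    "unit_conversion": "Identify the source unit, the target unit, and the "
                       "conversion factor; multiply and state the result.",
    "percentage": "Express the percentage as a decimal, multiply by the "
                  "base quantity, and report the result.",
}

def solve_with_buffer(llm, problem: str, problem_type: str) -> str:
    # Retrieve a stored reasoning template instead of exploring from scratch.
    template = thought_buffer.get(problem_type, "Think step by step.")
    return llm(f"Follow this reasoning template: {template}\n"
               f"Problem: {problem}\nAnswer:")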

Practical Applicability & Usage of Reasoning Methods

  • Zero-Shot Prompting: Best for straightforward tasks where the model can answer directly without prior examples (e.g., fact-based Q&A, classification, or simple instructions). Useful when you want quick results without training data.
  • Few-Shot Prompting: Effective when the model needs guidance on format or reasoning style. By showing a few examples, you can steer it to produce structured outputs (e.g., sentiment analysis, text classification, or summarization in a specific style).
  • Chain of Thought (CoT): Ideal for tasks that require step-by-step reasoning such as arithmetic problems, logical puzzles, or decision-making processes. CoT helps the model “show its work.”
  • Tree of Thoughts (ToT): Suitable for creative or open-ended problem solving, where exploring multiple reasoning paths matters (e.g., brainstorming, strategic planning, game solving).
  • Graph of Thoughts (GoT): Valuable in complex, interconnected reasoning tasks such as knowledge integration, multi-step scientific problem solving, or planning with feedback loops. It allows reusing and refining ideas across different reasoning paths.
  • Buffer of Thoughts (BoT): Useful for maintaining reasoning state over longer contexts, preventing the model from “forgetting” prior steps. Ideal for tasks like long-form reasoning, multi-turn dialogue, or extended problem solving.

Comparison of Methods (Quick View)

Method     | Structure           | Strengths                            | Trade-offs
-----------|---------------------|--------------------------------------|--------------------------------
Zero-Shot  | Direct answer       | Fast, cheap, no examples needed      | Weak on multi-step reasoning
Few-Shot   | In-context examples | Controls format and reasoning style  | Longer prompts, curated examples
CoT        | Linear chain        | Step-by-step transparency            | One wrong step derails the chain
ToT        | Branching tree      | Explores and prunes alternatives     | Higher cost and latency
GoT        | Graph of thoughts   | Merges and reuses sub-thoughts       | Orchestration complexity
BoT        | Memory buffer       | Reuses distilled reasoning templates | Depends on relevant stored templates

Conclusion

Reasoning-oriented prompting methods have rapidly evolved from simple step-by-step chains (CoT) to advanced structures like trees (ToT), graphs (GoT), and buffers (BoT). Each method offers unique strengths — whether it’s the simplicity and interpretability of CoT, the automation efficiency of Auto-CoT, the exploration depth of ToT, the non-linear flexibility of GoT, or the memory efficiency of BoT.

The key insight is that no single method dominates across all tasks. Instead, the choice depends on the problem’s complexity and structure:

  • Linear tasks benefit from CoT/Auto-CoT.
  • Multi-path decision making suits ToT.
  • Knowledge-dense and interconnected problems align with GoT.
  • Repetitive structured reasoning thrives with BoT.

Ultimately, these methods illustrate a broader trend: prompting is no longer about “what the model knows”, but about “how the model thinks”. By selecting and combining the right reasoning strategies, practitioners can unlock higher accuracy, interpretability, and real-world usability of LLMs across domains such as education, business intelligence, governance, and scientific research.

I’d love to hear your perspectives — which reasoning method have you found most practical in your work?

💬 Share your thoughts in the comments
🤝 Connect with me on LinkedIn to continue the conversation
📌 Follow me here on Medium for more deep dives into LLMs, reasoning, and agentic AI


Published via Towards AI

