Sometimes Basic Beats Agentic
Last Updated on September 4, 2025 by Editorial Team
Author(s): Ravindu Somawansa
Originally published on Towards AI.
Why “boring preprocessing” made our onboarding bot laser‑precise

Why this matters now
Everyone’s chasing AI agents, multimodal everything, and “let the system figure it out.”
But sometimes? That complexity caves in on itself.
Especially when you’re just trying to guide users through SAP tool onboarding with screenshots and step-by-step walkthroughs 🫥.
The context first
We were asked to create an internal chatbot to help new users navigate some of the SAP tools in our stack. These users go through initial training, and later, when they have questions, they can turn to this chatbot.
Of course, the “clients” first tried Google’s NotebookLM to see if they could do it themselves. But precision was low, so the task came to us.
Our data? The onboarding bundle — 100+ files — including:
🧑‍💻 SAP tools PowerPoint decks with walkthroughs
📸 Multiple screenshots per user action
🔍 Global SAP context (what it is, what it does)
🧭 High-level processes (workflows, modules)
Why NotebookLM failed

Using a fully automated tool like NotebookLM is often a good choice, and 80% of the time, it’s enough.
NotebookLM is incredible for many use cases — but not all. And we had one of those edge cases:
- Multiple screenshots were linked together to form a transaction. Answering a question meant retrieving all related transactions and screenshots.
- Transactions were basic actions, but there were higher-level flows — called processes — that combined multiple transactions.
- The screenshots were static, often outdated, with arrows and legends. Classic LLM-based OCR struggled to interpret them.
What we tried first
We took the LLM-smart route.
Describe each PowerPoint using an LLM, give the bot context, and feed the generated text into a RAG (retrieval-augmented generation) system. With a clever prompt, we hoped the LLM would work its magic.
It kinda worked.
But also — kinda didn’t.
Answers were vague or incomplete. Sometimes wrong. The bot missed key context, failed to connect steps, and couldn’t figure out which screenshot matched which action.
Why?
Because our content wasn’t a clean narrative or flat Q&A. Each action spanned multiple screenshots, mixed with global descriptions, overviews, workflows, and tiny image-driven tasks. No clear structure. No hierarchy.
The AI guessed. Poorly. 🫠
So we went boring

Instead of going deeper into “smart” territory — agentic flows, planning steps, tool-chaining — we backed up.
Simpler is often better.
We went full preprocessing:
- 👉 Extract the list of transactions and processes (Gemini 2.5 Pro)
- 👉 For each, generate a structured chunk (Gemini 2.5 Pro)
- 👉 Load them directly into the KB — clean and clear
No agents. No multimodal embeddings. No guessing. Just plain, well-done preprocessing.
Fast, old-school (relatively), and surprisingly powerful.
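The three preprocessing steps above boil down to two tiny record types and one rendering function. Here is a minimal sketch of that shape; the dataclass names, fields, and sample content are illustrative, not taken from the original script:

```python
from dataclasses import dataclass

@dataclass
class Transaction:
    """One basic user action, extracted from a PowerPoint deck."""
    name: str
    steps: list        # ordered step descriptions
    screenshots: list  # screenshot file names tied to the steps

@dataclass
class Process:
    """A higher-level flow that combines several transactions."""
    name: str
    transaction_names: list

def to_chunk(item) -> str:
    """Render one transaction or process as a single self-contained KB chunk."""
    if isinstance(item, Transaction):
        steps = "\n".join(f"{i + 1}. {s}" for i, s in enumerate(item.steps))
        shots = ", ".join(item.screenshots)
        return f"TRANSACTION: {item.name}\nSTEPS:\n{steps}\nSCREENSHOTS: {shots}"
    refs = "\n".join(f"- {t}" for t in item.transaction_names)
    return f"PROCESS: {item.name}\nCOMBINES TRANSACTIONS:\n{refs}"

order = Transaction(
    name="Create a sales order",
    steps=["Open the sales module", "Fill in the order fields", "Save"],
    screenshots=["step_1.png", "step_2.png", "step_3.png"],
)
late = Process(
    name="What to do when X is late delivering",
    transaction_names=["Create a sales order", "Edit pricing fields"],
)

print(to_chunk(order))
print(to_chunk(late))
```

Because each chunk carries the full transaction (or the full list of transactions in a process), retrieval never has to stitch scattered fragments back together.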
Step-by-step: how we actually built it
Stage 1: Finding the simple yet powerful idea
We combed through the docs and the client’s gold standard, discovering a small number of possible actions and higher-level processes.
If you don’t know what a gold standard is, check out this post ASAP.
Some examples:
- “Create a sales order” (action)
- “Edit pricing fields” (action)
- “Print the inventory” (action)
- “What to do when X is late delivering” (process)
Each came with multiple screenshots, annotations, maybe even a flow chart.
We tested Gemini (via Google Workspace) by uploading some files and asking questions. We could retrieve all the info for a single transaction, but not for a process or multiple transactions together.
So we made a list of transactions, and for each, created a chunk that fully described it — small enough to fit under our embedding model’s 2,000-token limit (GCP text-embedding-005).
Then we listed all the processes and described each, referencing the relevant transactions.
TADAAAAA ✨✨✨✨.
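Keeping every chunk under that 2,000-token budget can be pre-checked cheaply before embedding. A rough sketch below; the ~4 characters-per-token ratio is a common English-text heuristic, not the exact text-embedding-005 tokenizer, so treat it as a screening filter only:

```python
def within_token_budget(chunk: str, max_tokens: int = 2000,
                        chars_per_token: float = 4.0) -> bool:
    """Cheap pre-check that a chunk fits the embedding model's token limit.

    Uses a ~4 chars/token heuristic; borderline chunks should still be
    verified against the model's real tokenizer before indexing.
    """
    return len(chunk) / chars_per_token <= max_tokens

print(within_token_budget("TRANSACTION: Create a sales order\n1. Open..."))
print(within_token_budget("x" * 9000))  # ~2250 tokens, over budget
```

Chunks that fail the check get split or summarized before they ever reach the knowledge base.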
Stage 2: Finding an efficient way to implement it
Did we build some complex code with multi-level OCR analysis using powerful LLMs? Nope. We wanted something faster.
Enter the Gemini API.
We discovered that Gemini has an API (one we had honestly never used before) where you can upload an entire file — a PDF, for example — and use it directly in LLM processing.
Here’s how simple it is:
import os

import google.generativeai as genai

# Authenticate with the API key from the environment
genai.configure(api_key=os.getenv("GOOGLE_API_KEY"))

model = genai.GenerativeModel("gemini-2.5-pro-XXX")

# Upload the whole PDF once; pdf_path is defined elsewhere
uploaded = genai.upload_file(path=pdf_path, display_name="display_name")

# Pass the uploaded file alongside the prompt, like any other content part
response = model.generate_content(
    [
        uploaded,
        prompt,
    ]
)
For the price of those few lines, you can process files up to 50 MB. Images, tables, and graphs are all handled — you can query them directly. Pretty powerful.
Then we generated our chunks and pushed them into our knowledge base.
The whole chunk-generation script was under 50 lines — and it outperformed NotebookLM.
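That script is essentially a loop over the decks. Here is a sketch of its shape, with the LLM call injected as a callable so the orchestration can be exercised offline; `CHUNK_PROMPT`, the output layout, and the stub below are illustrative assumptions, not the original code:

```python
from pathlib import Path

# Illustrative prompt; the real one asked for a structured description
CHUNK_PROMPT = (
    "Describe every transaction in this deck as a numbered list of steps, "
    "naming the screenshot that illustrates each step."
)

def generate_chunks(pdf_paths, describe, out_dir="chunks"):
    """Run `describe(pdf_path, prompt)` on each deck, persist one chunk per file.

    In production, `describe` wraps the Gemini upload + generate_content
    snippet above; any (path, prompt) -> str callable works here.
    """
    out = Path(out_dir)
    out.mkdir(exist_ok=True)
    written = []
    for pdf in pdf_paths:
        text = describe(pdf, CHUNK_PROMPT)
        target = out / (Path(pdf).stem + ".txt")
        target.write_text(text, encoding="utf-8")
        written.append(target)
    return written

# Offline stub standing in for the Gemini call
files = generate_chunks(
    ["create_sales_order.pdf"],
    lambda path, prompt: f"TRANSACTION chunk for {path}",
)
print(files[0].read_text(encoding="utf-8"))
```

Swapping the stub for the real Gemini call is the only production-specific part, which is how the whole thing stays under 50 lines.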
Simple. Fast. Reliable 💪💪💪.
One Does Not Simply… ignore preprocessing 🧙‍♂️

I get it.
Building fancy agentic systems feels satisfying. Agents, tool calls, deep research, multimodal everything.
But “simple + reliable” beats “fancy + fuzzy” every time — especially when real users need exact guidance.
We didn’t need a magic pipeline.
We needed a map: question → correct transactions/processes → answer.
That’s exactly what preprocessing gave us.
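With clean chunks, that map can be almost trivially simple. Below is a toy keyword-overlap version of the question→chunk step; the real system used embedding similarity (text-embedding-005), so this is just a stand-in to show the shape:

```python
import re

def route(question: str, chunks: dict) -> str:
    """Return the name of the chunk sharing the most words with the question.

    Toy stand-in for the embedding-based retrieval the real system used.
    """
    q = set(re.findall(r"\w+", question.lower()))
    return max(
        chunks,
        key=lambda name: len(q & set(re.findall(r"\w+", chunks[name].lower()))),
    )

chunks = {
    "Create a sales order": "TRANSACTION: create a sales order steps ...",
    "Print the inventory": "TRANSACTION: print the inventory steps ...",
}
print(route("How do I print the inventory?", chunks))  # → Print the inventory
```

Because every chunk fully describes one transaction or process, whichever chunk wins the match already contains the complete answer.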
When agents are worth it
I’m not anti-agent. I use them all the time. But you have to pick your battles.
Go agentic when:
- 🚀 Users ask open-ended, research-heavy questions
- 📚 Data is loose narrative or knowledge-dense
- 🔄 You need multi-hop processes that conditionally use data sources or APIs
Stick with structured when:
- ✅ Your content is procedural
- 📸 Users need ultra-reliable, instant answers
- 🧑‍🏫 Users need clear, step-by-step guidance
Conclusion
Don’t underestimate the basics.
Sometimes your smartest move… is not to be clever.
👉 Building a support bot with visual or procedural data? Start by analyzing the data and mapping transactions. Structure your chunks.
And if the fancy path calls to you — try it. But always benchmark against the boring baseline. Because → Sometimes Basic Beats Agentic 😎😎😎.
👉 If you enjoyed this article and want to read more about AI, MCP, and Multi-Agent systems, follow me here on Medium or connect with me directly on LinkedIn!
Published via Towards AI
Note: Article content contains the views of the contributing authors and not Towards AI.