
Sometimes Basic Beats Agentic
Last Updated on September 4, 2025 by Editorial Team
Author(s): Ravindu Somawansa
Originally published on Towards AI.
Why “boring preprocessing” made our onboarding bot laser‑precise

Why this matters now
Everyone’s chasing AI agents, multimodal everything, and “let the system figure it out.”
But sometimes? That complexity caves in on itself.
Especially when you’re just trying to guide users through SAP tool onboarding with screenshots and step-by-step walkthroughs 🫥.
The context first
We were asked to create an internal chatbot to help new users navigate some of the SAP tools in our stack. These users go through initial training, and later, when they have questions, they can turn to this chatbot.
Of course, the “clients” first tried Google’s NotebookLM to see if they could do it themselves. But precision was low, so the task came to us.
Our data? The onboarding bundle — 100+ files — including:
🧑‍💻 SAP tools PowerPoint decks with walkthroughs
📸 Multiple screenshots per user action
🔍 Global SAP context (what it is, what it does)
🧭 High-level processes (workflows, modules)
Why NotebookLM failed

A fully automated tool like NotebookLM is often a good choice; 80% of the time, it's enough.
NotebookLM is incredible for many use cases, but not all. And we had one of those edge cases:
- Multiple screenshots were linked together to form a transaction. Answering a question meant retrieving all related transactions and screenshots.
- Transactions were basic actions, but there were higher-level flows — called processes — that combined multiple transactions.
- The screenshots were static, often outdated, with arrows and legends. Classic LLM-based OCR struggled to interpret them.
What we tried first
We took the LLM-smart route.
Describe each PowerPoint using an LLM, give the bot context, and feed the generated text into a RAG (retrieval-augmented generation) system. With a clever prompt, we hoped the LLM would work its magic.
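For context, the first pass looked roughly like this (a minimal sketch; describe_deck is a hypothetical stand-in for the LLM call, not our production code):
from pathlib import Path

def describe_deck(pptx_path: Path) -> str:
    # Placeholder for an LLM call like: "Describe this deck in detail."
    return f"LLM-generated description of {pptx_path.name}"

# One free-form blob per deck goes into the RAG index: no notion of
# transactions, processes, or which screenshot belongs to which step.
index: list[dict] = []
for deck in sorted(Path("onboarding_bundle").glob("*.pptx")):
    index.append({"source": deck.name, "text": describe_deck(deck)})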
It kinda worked.
But also — kinda didn’t.
Answers were vague or incomplete. Sometimes wrong. The bot missed key context, failed to connect steps, and couldn’t figure out which screenshot matched which action.
Why?
Because our content wasn’t a clean narrative or flat Q&A. Each action spanned multiple screenshots, mixed with global descriptions, overviews, workflows, and tiny image-driven tasks. No clear structure. No hierarchy.
The AI guessed. Poorly. 🫠
So we went boring

Instead of going deeper into “smart” territory — agentic flows, planning steps, tool-chaining — we backed up.
Simpler is often better.
We went full preprocessing:
- 👉 Extract the list of transactions and processes (Gemini 2.5 Pro)
- 👉 For each, generate a structured chunk (Gemini 2.5 Pro)
- 👉 Load them directly into the KB — clean and clear
No agents. No multimodal embeddings. No guessing. Just plain, well-done preprocessing.
Fast, old-school (relatively), and surprisingly powerful.
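In code terms, the whole pipeline collapses to three tiny steps. A minimal sketch, where extract_items and generate_chunk are hypothetical stand-ins fleshed out in the stages below:
def extract_items(files: list[str]) -> list[str]:
    # Stage 1: ask Gemini 2.5 Pro for the full list of transactions and processes.
    return ["Create a sales order", "Edit pricing fields"]  # illustrative output

def generate_chunk(item: str, files: list[str]) -> dict:
    # Stage 2: ask Gemini 2.5 Pro to fully describe one item as a single chunk.
    return {"title": item, "text": "step-by-step description goes here"}

def build_kb(files: list[str]) -> list[dict]:
    # Load the chunks straight into the knowledge base: no agents, no guessing.
    return [generate_chunk(item, files) for item in extract_items(files)]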
Step-by-step: how we actually built it
Stage 1: Finding the simple yet powerful idea
We combed through the docs and the client’s gold standard, discovering a small number of possible actions and higher-level processes.
If you don't know what a gold standard is, check out this post ASAP.
Some examples:
- “Create a sales order” (action)
- “Edit pricing fields” (action)
- “Print the inventory” (action)
- “What to do when X is late delivering” (process)
Each came with multiple screenshots, annotations, maybe even a flow chart.
We tested Gemini (via Google Workspace) by uploading some files and asking questions. We could retrieve all the info for a single transaction, but not for a process or multiple transactions together.
So we made a list of transactions and, for each, created a chunk that fully described it — small enough to fit under our embedding model's 2,000-token limit (GCP text-embedding-005).
Then we listed all the processes and described each, referencing the relevant transactions.
TADAAAAA ✨✨✨✨.
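To make that concrete, here is roughly what the two chunk shapes looked like (field names and contents are illustrative, not our exact schema):
transaction_chunk = {
    "type": "transaction",
    "title": "Create a sales order",
    # Full, ordered description of every step and its screenshot,
    # kept under the 2,000-token embedding limit.
    "text": "Step 1: open the sales order screen (screenshot 1). Step 2: ...",
}

process_chunk = {
    "type": "process",
    "title": "What to do when X is late delivering",
    # A process chunk references the transactions it is built from.
    "transactions": ["Create a sales order", "Edit pricing fields"],
    "text": "First run 'Create a sales order', then 'Edit pricing fields', ...",
}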
Stage 2: Finding an efficient way to implement it
Did we build some complex code with multi-level OCR analysis using powerful LLMs? Nope. We wanted something faster.
Enter the Gemini API.
We discovered that Gemini has an API (one we had honestly never used before) where you can upload an entire file (a PDF, for example) and use it directly in LLM processing.
Here's how simple it is (the path and prompt below are illustrative):
import os
import google.generativeai as genai

# Authenticate once with an API key from the environment.
genai.configure(api_key=os.getenv("GOOGLE_API_KEY"))
model = genai.GenerativeModel("gemini-2.5-pro")

pdf_path = "sap_onboarding_deck.pdf"  # illustrative path
prompt = "List every transaction described in this deck."  # illustrative prompt

# Upload the whole file once, then pass it alongside the prompt.
uploaded = genai.upload_file(path=pdf_path, display_name="display_name")
response = model.generate_content([uploaded, prompt])
print(response.text)
For the price of those few lines, you can process files up to 50 MB. Images, tables, and graphs are all handled — you can query them directly. Pretty powerful.
Then we generated our chunks and pushed them into our knowledge base.
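Condensed, the chunk-generation script looked something like this (the transaction list and prompt template are illustrative; the upload and generate calls are the same ones shown above):
import os
import google.generativeai as genai

genai.configure(api_key=os.getenv("GOOGLE_API_KEY"))
model = genai.GenerativeModel("gemini-2.5-pro")

# Illustrative inputs; in reality these came out of Stage 1.
transactions = ["Create a sales order", "Edit pricing fields", "Print the inventory"]
uploaded = genai.upload_file(path="sap_onboarding_deck.pdf", display_name="SAP deck")

chunks = []
for name in transactions:
    prompt = (
        f"Using the attached deck, fully describe the transaction '{name}': "
        "every step in order, with the screenshot each step refers to. "
        "Keep it under 2,000 tokens."
    )
    response = model.generate_content([uploaded, prompt])
    chunks.append({"title": name, "text": response.text})
# 'chunks' then goes straight into the knowledge base.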
The whole chunk-generation script was under 50 lines — and it outperformed NotebookLM.
Simple. Fast. Reliable 💪💪💪.
One Does Not Simply… ignore preprocessing 🧙‍♂️

I get it.
Building fancy agentic systems feels satisfying. Agents, tool calls, deep research, multimodal everything.
But “simple + reliable” beats “fancy + fuzzy” every time — especially when real users need exact guidance.
We didn’t need a magic pipeline.
We needed a map: question → correct transactions/processes → answer.
That’s exactly what preprocessing gave us.
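In code, that map is nothing exotic. A minimal sketch, assuming each chunk carries an "embedding" field and that embed (we used GCP text-embedding-005) and llm are callables supplied by the caller:
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def answer(question, kb, embed, llm, top_k=3):
    # question -> closest transaction/process chunks -> grounded answer
    q = embed(question)
    best = sorted(kb, key=lambda c: cosine(q, c["embedding"]), reverse=True)[:top_k]
    context = "\n\n".join(c["text"] for c in best)
    return llm(f"Answer using only this context:\n{context}\n\nQuestion: {question}")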
When agents are worth it
I’m not anti-agent. I use them all the time. But you have to pick your battles.
Go agentic when:
- 🚀 Users ask open-ended, research-heavy questions
- 📚 Data is loose narrative or knowledge-dense
- 🔄 You need multi-hop processes that conditionally use data sources or APIs
Stick with structured when:
- ✅ Your content is procedural
- 📸 Users need ultra-reliable, instant answers
- 🧑🏫 Users need clear, step-by-step guidance
Conclusion
Don’t underestimate the basics.
Sometimes your smartest move… is not to be clever.
👉 Building a support bot with visual or procedural data? Start by analyzing the data and mapping transactions. Structure your chunks.
And if the fancy path calls to you — try it. But always benchmark against the boring baseline. Because → Sometimes Basic Beats Agentic 😎😎😎.
👉 If you enjoyed this article and want to read more about AI, MCP, and Multi-Agent systems, follow me here on Medium or connect with me directly on LinkedIn!
Published via Towards AI