Sometimes Basic Beats Agentic
Last Updated on September 4, 2025 by Editorial Team
Author(s): Ravindu Somawansa
Originally published on Towards AI.
Why “boring preprocessing” made our onboarding bot laser‑precise

Why this matters now
Everyone’s chasing AI agents, multimodal everything, and “let the system figure it out.”
But sometimes? That complexity caves in on itself.
Especially when you’re just trying to guide users through SAP tool onboarding with screenshots and step-by-step walkthroughs 🫥.
The context first
We were asked to create an internal chatbot to help new users navigate some of the SAP tools in our stack. These users go through initial training, and later, when they have questions, they can turn to this chatbot.
Of course, the “clients” first tried Google’s NotebookLM to see if they could do it themselves. But precision was low, so the task came to us.
Our data? The onboarding bundle — 100+ files — including:
🧑‍💻 SAP tools PowerPoint decks with walkthroughs
📸 Multiple screenshots per user action
🔍 Global SAP context (what it is, what it does)
🧭 High-level processes (workflows, modules)
Why NotebookLM failed

Using a fully automated tool like NotebookLM is often a good choice, and 80% of the time, it’s enough.
NotebookLM is incredible for many use cases — but not all. And we had one of those edge cases:
- Multiple screenshots were linked together to form a transaction. Answering a question meant retrieving all related transactions and screenshots.
- Transactions were basic actions, but there were higher-level flows — called processes — that combined multiple transactions.
- The screenshots were static, often outdated, with arrows and legends. Classic LLM-based OCR struggled to interpret them.
What we tried first
We took the LLM-smart route.
Describe each PowerPoint using an LLM, give the bot context, and feed the generated text into a RAG (retrieval-augmented generation) system. With a clever prompt, we hoped the LLM would work its magic.
It kinda worked.
But also — kinda didn’t.
Answers were vague or incomplete. Sometimes wrong. The bot missed key context, failed to connect steps, and couldn’t figure out which screenshot matched which action.
Why?
Because our content wasn’t a clean narrative or flat Q&A. Each action spanned multiple screenshots, mixed with global descriptions, overviews, workflows, and tiny image-driven tasks. No clear structure. No hierarchy.
The AI guessed. Poorly. 🫠
So we went boring

Instead of going deeper into “smart” territory — agentic flows, planning steps, tool-chaining — we backed up.
Simpler is often better.
We went full preprocessing:
- 👉 Extract the list of transactions and processes (Gemini 2.5 Pro)
- 👉 For each, generate a structured chunk (Gemini 2.5 Pro)
- 👉 Load them directly into the KB — clean and clear
No agents. No multimodal embeddings. No guessing. Just plain, well-done preprocessing.
Fast, old-school (relatively), and surprisingly powerful.
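The three preprocessing steps above boil down to two tiny record types and one rendering function. Here is a minimal sketch of that shape; the dataclass names, fields, and sample content are illustrative, not taken from the original script:

```python
from dataclasses import dataclass

@dataclass
class Transaction:
    """One basic user action, extracted from a PowerPoint deck."""
    name: str
    steps: list        # ordered step descriptions
    screenshots: list  # screenshot file names tied to the steps

@dataclass
class Process:
    """A higher-level flow that combines several transactions."""
    name: str
    transaction_names: list

def to_chunk(item) -> str:
    """Render one transaction or process as a single self-contained KB chunk."""
    if isinstance(item, Transaction):
        steps = "\n".join(f"{i + 1}. {s}" for i, s in enumerate(item.steps))
        shots = ", ".join(item.screenshots)
        return f"TRANSACTION: {item.name}\nSTEPS:\n{steps}\nSCREENSHOTS: {shots}"
    refs = "\n".join(f"- {t}" for t in item.transaction_names)
    return f"PROCESS: {item.name}\nCOMBINES TRANSACTIONS:\n{refs}"

order = Transaction(
    name="Create a sales order",
    steps=["Open the sales module", "Fill in the order fields", "Save"],
    screenshots=["step_1.png", "step_2.png", "step_3.png"],
)
late = Process(
    name="What to do when X is late delivering",
    transaction_names=["Create a sales order", "Edit pricing fields"],
)

print(to_chunk(order))
print(to_chunk(late))
```

Because each chunk carries the full transaction (or the full list of transactions in a process), retrieval never has to stitch scattered fragments back together.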
Step-by-step: how we actually built it
Stage 1: Finding the simple yet powerful idea
We combed through the docs and the client’s gold standard, discovering a small number of possible actions and higher-level processes.
If you don’t know what a gold standard is, check out this post ASAP.
Some examples:
- “Create a sales order” (action)
- “Edit pricing fields” (action)
- “Print the inventory” (action)
- “What to do when X is late delivering” (process)
Each came with multiple screenshots, annotations, maybe even a flow chart.
We tested Gemini (via Google Workspace) by uploading some files and asking questions. We could retrieve all the info for a single transaction, but not for a process or multiple transactions together.
So we made a list of transactions, and for each, created a chunk that fully described it — small enough to fit under our embedding model’s 2,000-token limit (GCP text-embedding-005).
Then we listed all the processes and described each, referencing the relevant transactions.
TADAAAAA ✨✨✨✨.
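Keeping every chunk under that 2,000-token budget can be pre-checked cheaply before embedding. A rough sketch below; the ~4 characters-per-token ratio is a common English-text heuristic, not the exact text-embedding-005 tokenizer, so treat it as a screening filter only:

```python
def within_token_budget(chunk: str, max_tokens: int = 2000,
                        chars_per_token: float = 4.0) -> bool:
    """Cheap pre-check that a chunk fits the embedding model's token limit.

    Uses a ~4 chars/token heuristic; borderline chunks should still be
    verified against the model's real tokenizer before indexing.
    """
    return len(chunk) / chars_per_token <= max_tokens

print(within_token_budget("TRANSACTION: Create a sales order\n1. Open..."))
print(within_token_budget("x" * 9000))  # ~2250 tokens, over budget
```

Chunks that fail the check get split or summarized before they ever reach the knowledge base.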
Stage 2: Finding an efficient way to implement it
Did we build some complex code with multi-level OCR analysis using powerful LLMs? Nope. We wanted something faster.
Enter the Gemini API.
We discovered that Gemini has an API (one we had honestly never used before) where you can upload an entire file — a PDF, for example — and use it directly in LLM processing.
Here’s how simple it is:
import os

import google.generativeai as genai

# Authenticate with the API key from the environment
genai.configure(api_key=os.getenv("GOOGLE_API_KEY"))

model = genai.GenerativeModel("gemini-2.5-pro-XXX")

# Upload the whole PDF once; pdf_path is defined elsewhere
uploaded = genai.upload_file(path=pdf_path, display_name="display_name")

# Pass the uploaded file alongside the prompt, like any other content part
response = model.generate_content(
    [
        uploaded,
        prompt,
    ]
)
For the price of those few lines, you can process files up to 50 MB. Images, tables, and graphs are all handled — you can query them directly. Pretty powerful.
Then we generated our chunks and pushed them into our knowledge base.
The whole chunk-generation script was under 50 lines — and it outperformed NotebookLM.
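That script is essentially a loop over the decks. Here is a sketch of its shape, with the LLM call injected as a callable so the orchestration can be exercised offline; `CHUNK_PROMPT`, the output layout, and the stub below are illustrative assumptions, not the original code:

```python
from pathlib import Path

# Illustrative prompt; the real one asked for a structured description
CHUNK_PROMPT = (
    "Describe every transaction in this deck as a numbered list of steps, "
    "naming the screenshot that illustrates each step."
)

def generate_chunks(pdf_paths, describe, out_dir="chunks"):
    """Run `describe(pdf_path, prompt)` on each deck, persist one chunk per file.

    In production, `describe` wraps the Gemini upload + generate_content
    snippet above; any (path, prompt) -> str callable works here.
    """
    out = Path(out_dir)
    out.mkdir(exist_ok=True)
    written = []
    for pdf in pdf_paths:
        text = describe(pdf, CHUNK_PROMPT)
        target = out / (Path(pdf).stem + ".txt")
        target.write_text(text, encoding="utf-8")
        written.append(target)
    return written

# Offline stub standing in for the Gemini call
files = generate_chunks(
    ["create_sales_order.pdf"],
    lambda path, prompt: f"TRANSACTION chunk for {path}",
)
print(files[0].read_text(encoding="utf-8"))
```

Swapping the stub for the real Gemini call is the only production-specific part, which is how the whole thing stays under 50 lines.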
Simple. Fast. Reliable 💪💪💪.
One Does Not Simply… ignore preprocessing 🧙‍♂️

I get it.
Building fancy agentic systems feels satisfying. Agents, tool calls, deep research, multimodal everything.
But “simple + reliable” beats “fancy + fuzzy” every time — especially when real users need exact guidance.
We didn’t need a magic pipeline.
We needed a map: question → correct transactions/processes → answer.
That’s exactly what preprocessing gave us.
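With clean chunks, that map can be almost trivially simple. Below is a toy keyword-overlap version of the question→chunk step; the real system used embedding similarity (text-embedding-005), so this is just a stand-in to show the shape:

```python
import re

def route(question: str, chunks: dict) -> str:
    """Return the name of the chunk sharing the most words with the question.

    Toy stand-in for the embedding-based retrieval the real system used.
    """
    q = set(re.findall(r"\w+", question.lower()))
    return max(
        chunks,
        key=lambda name: len(q & set(re.findall(r"\w+", chunks[name].lower()))),
    )

chunks = {
    "Create a sales order": "TRANSACTION: create a sales order steps ...",
    "Print the inventory": "TRANSACTION: print the inventory steps ...",
}
print(route("How do I print the inventory?", chunks))  # → Print the inventory
```

Because every chunk fully describes one transaction or process, whichever chunk wins the match already contains the complete answer.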
When agents are worth it
I’m not anti-agent. I use them all the time. But you have to pick your battles.
Go agentic when:
- 🚀 Users ask open-ended, research-heavy questions
- 📚 Data is loose narrative or knowledge-dense
- 🔄 You need multi-hop processes that conditionally use data sources or APIs
Stick with structured when:
- ✅ Your content is procedural
- 📸 Users need ultra-reliable, instant answers
- 🧑‍🏫 Users need clear, step-by-step guidance
Conclusion
Don’t underestimate the basics.
Sometimes your smartest move… is not to be clever.
👉 Building a support bot with visual or procedural data? Start by analyzing the data and mapping transactions. Structure your chunks.
And if the fancy path calls to you — try it. But always benchmark against the boring baseline. Because → Sometimes Basic Beats Agentic 😎😎😎.
👉 If you enjoyed this article and want to read more about AI, MCP, and Multi-Agent systems, follow me here on Medium or connect with me directly on LinkedIn!
Published via Towards AI
Note: Article content contains the views of the contributing authors and not Towards AI.