
Sometimes Basic Beats Agentic

Last Updated on September 4, 2025 by Editorial Team

Author(s): Ravindu Somawansa

Originally published on Towards AI.

Why “boring preprocessing” made our onboarding bot laser‑precise


Why this matters now

Everyone’s chasing AI agents, multimodal everything, and “let the system figure it out.”

But sometimes? That complexity caves in on itself.

Especially when you’re just trying to guide users through SAP tool onboarding with screenshots and step-by-step walkthroughs 🫥.

The context first

We were asked to create an internal chatbot to help new users navigate some of the SAP tools in our stack. These users go through initial training, and later, when they have questions, they can turn to this chatbot.

Of course, the “clients” first tried Google’s NotebookLM to see if they could do it themselves. But precision was low, so the task came to us.

Our data? The onboarding bundle — 100+ files — including:

🧑‍💻 SAP tools PowerPoint decks with walkthroughs
📸 Multiple screenshots per user action
🔍 Global SAP context (what it is, what it does)
🧭 High-level processes (workflows, modules)

Why NotebookLM failed


Using a fully automated tool like NotebookLM is often a good choice, and 80% of the time, it’s enough.

NotebookLM is incredible for many use cases — but not all. And we had one of those edge cases:

  • Multiple screenshots were linked together to form a transaction. Answering a question meant retrieving all related transactions and screenshots.
  • Transactions were basic actions, but there were higher-level flows — called processes — that combined multiple transactions.
  • The screenshots were static, often outdated, with arrows and legends. Classic LLM-based OCR struggled to interpret them.

What we tried first

We took the LLM-smart route.

Describe each PowerPoint using an LLM, give the bot context, and feed the generated text into a RAG (retrieval-augmented generation) system. With a clever prompt, we hoped the LLM would work its magic.
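In outline, that first pipeline looked something like this (a minimal sketch with the LLM and embedding calls stubbed out; the function names and file names are illustrative, not the actual code):

```python
# Sketch of the first, "LLM-smart" attempt: one flat description per deck,
# embedded and dumped into a knowledge base with no structure.

def describe_deck(deck_path: str) -> str:
    """Stub for the LLM call that summarizes a whole PowerPoint."""
    return f"Summary of {deck_path}: steps, screenshots, and context all mixed together."

def embed(text: str) -> list[float]:
    """Stub embedding; real code would call an embedding model."""
    return [float(len(text))]  # placeholder vector

knowledge_base = []
for deck in ["create_sales_order.pptx", "edit_pricing.pptx"]:
    description = describe_deck(deck)
    knowledge_base.append({"source": deck, "text": description, "vector": embed(description)})

# Every deck becomes one undifferentiated blob: related screenshots,
# overviews, and workflows all end up in the same chunk.
print(len(knowledge_base))  # 2
```

The flaw is visible in the data model itself: nothing links a screenshot to its action, or an action to the process it belongs to.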

It kinda worked.

But also — kinda didn’t.

Answers were vague or incomplete. Sometimes wrong. The bot missed key context, failed to connect steps, and couldn’t figure out which screenshot matched which action.

Why?

Because our content wasn’t a clean narrative or flat Q&A. Each action spanned multiple screenshots, mixed with global descriptions, overviews, workflows, and tiny image-driven tasks. No clear structure. No hierarchy.

The AI guessed. Poorly. 🫠

So we went boring


Instead of going deeper into “smart” territory — agentic flows, planning steps, tool-chaining — we backed up.

Simpler is often better.

We went full preprocessing:

  • 👉 Extract the list of transactions and processes (Gemini 2.5 Pro)
  • 👉 For each, generate a structured chunk (Gemini 2.5 Pro)
  • 👉 Load them directly into the KB — clean and clear

No agents. No multimodal embeddings. No guessing. Just plain, well-done preprocessing.

Fast, old-school (relatively), and surprisingly powerful.
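The three steps above can be sketched end to end like this (Gemini calls stubbed out; the chunk format is an assumption for illustration, not the production schema):

```python
# "Boring" preprocessing: extract the structured units first,
# then generate exactly one clean chunk per unit.

def extract_units(docs: list[str]) -> dict:
    """Stub for step 1: the real pipeline asks Gemini 2.5 Pro to list
    all transactions and processes found across the onboarding bundle."""
    return {
        "transactions": ["Create a sales order", "Edit pricing fields"],
        "processes": ["What to do when X is late delivering"],
    }

def generate_chunk(name: str, kind: str) -> dict:
    """Stub for step 2: Gemini 2.5 Pro writes one self-contained description."""
    return {"kind": kind, "name": name, "text": f"{kind} '{name}': step-by-step description."}

docs = ["deck_01.pptx", "deck_02.pptx"]
units = extract_units(docs)

knowledge_base = []  # step 3: load the chunks directly into the KB
for t in units["transactions"]:
    knowledge_base.append(generate_chunk(t, "transaction"))
for p in units["processes"]:
    knowledge_base.append(generate_chunk(p, "process"))

print(len(knowledge_base))  # 3 chunks: one per transaction/process
```

Because each chunk maps one-to-one onto a transaction or process, retrieval no longer has to reassemble scattered fragments at query time.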

Step-by-step: how we actually built it

Stage 1: Finding the simple yet powerful idea

We combed through the docs and the client’s gold standard, discovering a small number of possible actions and higher-level processes.

If you don’t know what a gold standard is, check out this post ASAP.

Some examples:

  • “Create a sales order” (action)
  • “Edit pricing fields” (action)
  • “Print the inventory” (action)
  • “What to do when X is late delivering” (process)

Each came with multiple screenshots, annotations, maybe even a flow chart.

We tested Gemini (via Google Workspace) by uploading some files and asking questions. We could retrieve all the info for a single transaction, but not for a process or multiple transactions together.

So we made a list of transactions, and for each, created a chunk that fully described it — small enough to fit under our embedding model’s 2,000-token limit (GCP text-embedding-005).

Then we listed all the processes and described each, referencing the relevant transactions.

TADAAAAA ✨✨✨✨.
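Concretely, a transaction chunk and a process chunk might look like this. The schema, the SAP details, and the chars-per-token heuristic are all illustrative assumptions; the only hard constraint from the article is that each chunk fits under the 2,000-token limit of text-embedding-005:

```python
transaction_chunk = {
    "id": "txn-create-sales-order",
    "kind": "transaction",
    "title": "Create a sales order",
    "text": (
        "To create a sales order: open the transaction, enter the order type, "
        "fill in the customer and material fields, then save. "
        "Screenshots 1-4 of the sales orders deck illustrate each step."
    ),
}

process_chunk = {
    "id": "proc-late-delivery",
    "kind": "process",
    "title": "What to do when X is late delivering",
    "text": "Check the order status via the related transaction, then escalate to the delivery team.",
    "transactions": ["txn-create-sales-order"],  # links back to the basic actions it combines
}

def approx_tokens(text: str) -> int:
    """Rough heuristic: roughly 4 characters per token for English text."""
    return len(text) // 4

# Each chunk must fit under the embedding model's 2,000-token limit.
for chunk in (transaction_chunk, process_chunk):
    assert approx_tokens(chunk["text"]) < 2000
```

The `transactions` field is what lets a process-level question pull in every related transaction chunk in one retrieval pass.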

Stage 2: Finding an efficient way to implement it

Did we build some complex code with multi-level OCR analysis using powerful LLMs? Nope. We wanted something faster.

Enter the Gemini API.

We discovered that Gemini has an API (which, honestly, we had never used before) where you can upload an entire file (a PDF, for example) and use it directly in LLM processing.

Here’s how simple it is:

import os

import google.generativeai as genai

# Authenticate with the API key from the environment
genai.configure(api_key=os.getenv("GOOGLE_API_KEY"))

model = genai.GenerativeModel("gemini-2.5-pro-XXX")

# pdf_path and prompt are defined elsewhere in the script
uploaded = genai.upload_file(path=pdf_path, display_name="display_name")
response = model.generate_content([uploaded, prompt])

For the price of those few lines, you can process files up to 50 MB. Images, tables, and graphs are all handled — you can query them directly. Pretty powerful.

Then we generated our chunks and pushed them into our knowledge base.

The whole chunk-generation script was under 50 lines — and it outperformed NotebookLM.

Simple. Fast. Reliable 💪💪💪.

One Does Not Simply… ignore preprocessing 🧙‍♂️


I get it.

Building fancy agentic systems feels satisfying. Agents, tool calls, deep research, multimodal everything.

But “simple + reliable” beats “fancy + fuzzy” every time — especially when real users need exact guidance.

We didn’t need a magic pipeline.
We needed a map: question → correct transactions/processes → answer.

That’s exactly what preprocessing gave us.
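That map is then just plain retrieval over the structured chunks. A toy sketch, with keyword overlap standing in for the real embedding similarity (the chunk contents are illustrative):

```python
# Minimal "map": question -> best-matching transaction/process chunk.
knowledge_base = [
    {"title": "Create a sales order",
     "text": "Open the transaction, enter the order type, fill in customer and material, save."},
    {"title": "Edit pricing fields",
     "text": "Select the item, open the conditions tab, adjust the price, save."},
]

def retrieve(question: str) -> dict:
    """Toy keyword-overlap scorer; real code would rank chunks by
    embedding similarity (e.g. text-embedding-005 vectors)."""
    q = set(question.lower().split())
    def score(chunk: dict) -> int:
        words = set((chunk["title"] + " " + chunk["text"]).lower().split())
        return len(q & words)
    return max(knowledge_base, key=score)

print(retrieve("How do I create a sales order?")["title"])  # Create a sales order
```

Because every chunk is self-contained, whichever chunk wins retrieval already carries the full answer; there is nothing left for the model to guess.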

When agents are worth it

I’m not anti-agent. I use them all the time. But you have to pick your battles.

Go agentic when:

  • 🚀 Users ask open-ended, research-heavy questions
  • 📚 Data is loose narrative or knowledge-dense
  • 🔄 You need multi-hop processes that conditionally use data sources or APIs

Stick with structured when:

  • ✅ Your content is procedural
  • 📸 Users need ultra-reliable, instant answers
  • 🧑‍🏫 Users need clear, step-by-step guidance

Conclusion

Don’t underestimate the basics.

Sometimes your smartest move… is not to be clever.

👉 Building a support bot with visual or procedural data? Start by analyzing the data and mapping transactions. Structure your chunks.

And if the fancy path calls to you — try it. But always benchmark against the boring baseline. Because → Sometimes Basic Beats Agentic 😎😎😎.

👉 If you enjoyed this article and want to read more about AI, MCP, and Multi-Agent systems, follow me here on Medium or connect with me directly on LinkedIn!
