AI Agents in Enterprise: A Journey Into AI-Generated Hell

Author(s): ravindu somawansa

Originally published on Towards AI.

When I was first asked to build AI agents at my company, I thought most of my time would be spent designing and coding the best agents possible. Instead, I got never-ending meetings, impossible expectations, and endless debates about workflow and organization.

Two years later, with lots of AI agents and painful experience behind me, here are some learnings for you.

Welcome to my journey through AI-generated hell.

The Illusions of Rapid Products

Using all available frameworks (MCP, A2A, LangChain, LlamaIndex, CrewAI…), AI providers (OpenAI, Anthropic, Mistral, Groq…), and cloud providers (GCP, AWS, Azure…), it is extremely easy to create a Proof of Concept (PoC). If you know what you’re doing, you can literally build a Multi-Agent AI system or a RAG app in 5 minutes and deploy it on the cloud in 10.

But that does not mean you have a product. You just have something to make nice demos.

To have a product, you need:

User Management: You need to integrate with your company’s authentication system (Okta, Azure AD…) to manage users and permissions. There’s no skipping one-button auth.
Security & Monitoring: Your product must follow proper security guidelines (pentests…), handle personal information (PII) correctly, and monitor everything happening. Even if you think you don’t need it, you will.
Scalability & Pricing: The product must handle all potential users, whether hundreds, thousands, or millions. That means lots of load tests. You’ll constantly be asked, “How much will it cost?” until it haunts your dreams.
User Feedback: Your product needs to record all user feedback, because you always need to listen to users’ complaints.
Support: You need to handle new users, clearly specify responsibilities, and set SLAs (Yes, we hate that!).
Onboarding: Ensure your users know how to use your tool. Trust me, they know less than you imagine.

All of this will take significantly longer (10x, 20x, or even 100x) than the PoC. And these are just the basics any product should have.
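The recurring "How much will it cost?" question can at least be framed with back-of-the-envelope arithmetic. The sketch below is only an illustration, not a pricing model: the class and function names are hypothetical, and every number (users, requests per day, token counts, per-token prices) is a made-up placeholder to replace with your provider's actual rates and your own traffic estimates. It also ignores everything except LLM token spend (hosting, vector stores, monitoring, people).

```python
from dataclasses import dataclass


@dataclass
class UsageAssumptions:
    # All values are hypothetical placeholders, not real provider prices.
    users: int
    requests_per_user_per_day: float
    input_tokens_per_request: int
    output_tokens_per_request: int
    usd_per_million_input_tokens: float
    usd_per_million_output_tokens: float


def monthly_llm_cost(a: UsageAssumptions, days: int = 30) -> float:
    """Estimate monthly LLM token spend in USD from the usage assumptions."""
    requests = a.users * a.requests_per_user_per_day * days
    input_cost = (
        requests * a.input_tokens_per_request / 1_000_000
        * a.usd_per_million_input_tokens
    )
    output_cost = (
        requests * a.output_tokens_per_request / 1_000_000
        * a.usd_per_million_output_tokens
    )
    return input_cost + output_cost


if __name__ == "__main__":
    # Same per-request assumptions, three audience sizes.
    for users in (100, 1_000, 10_000):
        a = UsageAssumptions(users, 10, 2_000, 500, 1.0, 4.0)
        print(f"{users:>6} users: ~${monthly_llm_cost(a):,.2f}/month")
```

Keeping the assumptions in one explicit structure makes it easy to show stakeholders which inputs drive the bill, and to rerun the estimate when load tests replace guesses with measurements.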

Now let’s dive into specifics for AI products.

The Shackles of Evaluation

When using public AI tools like ChatGPT, have you ever wondered whether the answers were wrong? No, you just assume they are right. This is normal, but it can be disastrous when naive users ask critical questions, assuming your product perfectly understands the company’s data and business model.

That’s why you need an essential component: evaluation.

Evaluation measures different metrics of the product. Many metrics exist, but ultimately, the goal is measuring answer quality. For accurate evaluation, you need two things:

Gold Standard: A set of questions and answers validated by business owners, covering all use cases. It can essentially be your bot’s specification. More questions mean better evaluation, and this standard should evolve with new documents and requirements.
Evaluation Metrics: These metrics validate your product’s answers. Many exist, depending on usage, such as LLM-as-Judge (using an LLM to assess correctness), answer relevance, faithfulness (ensuring no hallucinated facts), and more. (You can find a good list online.)
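To make the two ingredients concrete, here is a minimal sketch of an evaluation loop in Python. It is an illustration under stated assumptions, not a recommended setup: the gold-standard entries are invented examples, `fake_bot` is a hypothetical stand-in for your real agent, and the metric is a simple token-overlap F1 that I use here in place of heavier metrics such as LLM-as-Judge or faithfulness, which need a model call.

```python
import re
from collections import Counter
from typing import Callable

# Hypothetical gold standard: question/answer pairs validated by business owners.
GOLD_STANDARD = [
    {"question": "What is the refund window?",
     "answer": "Refunds are accepted within 30 days."},
    {"question": "Who approves travel expenses?",
     "answer": "The line manager approves travel expenses."},
]


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def token_f1(predicted: str, expected: str) -> float:
    """Token-overlap F1 between predicted and expected answers (0.0 to 1.0)."""
    pred, gold = tokenize(predicted), tokenize(expected)
    if not pred or not gold:
        return 0.0
    overlap = sum((Counter(pred) & Counter(gold)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


def evaluate(bot: Callable[[str], str], gold_standard: list[dict],
             threshold: float = 0.8) -> dict:
    """Score the bot on every gold question and summarize the results."""
    scores = [token_f1(bot(item["question"]), item["answer"])
              for item in gold_standard]
    passed = sum(score >= threshold for score in scores)
    return {
        "mean_score": sum(scores) / len(scores),
        "pass_rate": passed / len(scores),
        "scores": scores,
    }


def fake_bot(question: str) -> str:
    # Stand-in for the real agent: it only knows one canned answer.
    canned = {"What is the refund window?": "Refunds are accepted within 30 days."}
    return canned.get(question, "I don't know.")


if __name__ == "__main__":
    print(evaluate(fake_bot, GOLD_STANDARD))
```

Because `evaluate` only needs a function from question to answer, the same gold standard can be rerun after every change to prompts, models, or the document corpus, and the metric can be swapped for a stricter one without touching the loop.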

Evaluation itself isn’t difficult, but convincing business owners why it’s necessary and generating the gold standard can be extremely painful (weeks of tears sometimes).

Moreover, business owners must own this evaluation and gold standard, as only they can update it so that the product keeps answering correctly after any change to the data corpus.

The Constant Wrestling with Uncertainty

Imagine a new project to build AI agents (RAG or otherwise). You must analyze needs, propose an architecture, and provide a quote.

But how do you ensure your approach and quotation are accurate? GenAI isn’t mature enough yet, meaning there’s insufficient experience. A use case might seem similar, but differing data, additional languages, or minor use-case differences can break everything. You’ll build something, realize it doesn’t work, and have to rebuild it completely.

This isn’t a problem for engineers, but Project Managers and Product Owners hate uncertainty. They want a clear quotation and timeline (preferably short) and want to stick to it. But AI inherently has tons of uncertainty.

One solution is short experiments (or spikes) early in the project to test and validate uncertainties. At the end of these spikes, you usually know if something works or needs more time.

The Weight of Project Dynamics

To manage AI projects, new roles and processes have emerged.

We discussed evaluation and uncertainty handling, but these must integrate into your project dynamics. This means explaining new processes not just to the team but to business owners and even clients.

Many new roles have appeared, often with “AI” in their titles. Some are temporary or hyped (like “Prompt Engineer”), while others will stay. One critical role is the AI Engineer.

This person implements the AI parts of products. They must understand how LLMs really work (beyond theory), know RAG and agentic systems, and be experienced with various frameworks, AI providers, and cloud providers. They must know AI tools and their limits. You don’t wake up one morning as an AI Engineer; it requires self-learning, hands-on implementation, and constant curiosity since AI evolves quickly.

But you have numerous roles: full-stack developer, DevOps, data engineer, ML engineer, data scientist, and now AI engineer. Determining role boundaries (often more person-dependent than role-dependent) and allocating workloads correctly is essential.

All these new roles and processes highlight the core problem with AI products: the real pain isn’t the technology, but organizational change (as usual).

Conclusion

After two years navigating this AI-generated hell, I can confidently say that building AI agents for enterprise isn’t just technically challenging; it’s a multi-front battle. From rapid PoC illusions and tedious yet crucial evaluations to wrestling with architectural uncertainty and shifting organizational dynamics, nothing is simple.

If you’re starting this journey, prepare yourself. Remember, the real challenge isn’t coding AI agents; it’s handling people: your team’s demands, business owners’ expectations, and client needs.

In short, building enterprise-ready AI products is as much about human challenges as it is about AI.

Welcome to AI-generated hell!

👉 If you enjoyed this article and want to read more about AI, reasoning models, and Multi-Agent systems, follow me here on Medium or connect with me directly on LinkedIn!

Published via Towards AI
