AI Waterfall: How to Spend Less Money on LLMs Using Tiered Intelligence
Author(s): Petros Demetrakopoulos
Originally published on Towards AI.
Let’s face it: Gen AI and LLMs have forever changed the way we develop software and write code. Moreover, recent developments in AI have not only changed the way we write code, they have also shifted where computation happens. It is the second major shift I have experienced in my career, the first being the move from on-premise infrastructure to the cloud that started around the beginning of the 2010s. Now we are witnessing a similar shift, from cloud-based computing to LLM/Gen-AI-driven computation.
However, this shift, and the ability it brings to easily solve increasingly complex problems, comes at a cost. As AI adoption accelerates across organizations, LLM costs are becoming a significant line item in engineering budgets. While models like GPT-4 and Claude are incredibly powerful, they are also expensive, especially at scale. As with any development in the tech sector, the solution is not to avoid or fear new technologies like LLMs altogether, but to embrace them. However, we need to use them strategically and extract as much value from them as we can at a reasonable cost.
The AI Waterfall framework
The “AI Waterfall” is a hierarchical problem-solving strategy in which we attempt to solve tasks using the cheapest, fastest methods first, escalating to more expensive AI models only when simpler approaches fail. In this way we can take advantage of separate tiers of intelligence and pay only for the computational complexity that the problem actually requires. Think of it as a series of gates: each task flows through progressively more sophisticated (and costly) tiers until one of them solves the problem. The key insight and operating principle is that many problems that seem to require advanced AI and costly models can actually be solved with traditional programming techniques, basic machine learning, or lightweight models at a fraction of the cost and latency.
The economic motivation
Before diving into techniques, let’s establish the economic motivation. The bullets below compare the costs of the more advanced LLMs with those of simpler methods.
- GPT-4: ~$0.03 per 1K input tokens, ~$0.06 per 1K output tokens
- Claude Sonnet: ~$0.003 per 1K input tokens, ~$0.015 per 1K output tokens
- Regex/Rule-based systems: Effectively $0 (just minor compute time)
- Classical ML models: A few pennies per thousand predictions
It is easy to understand that when processing thousands or millions of requests, these costs compound dramatically.
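To make this concrete, here is a back-of-envelope calculation. The volumes and token counts are hypothetical assumptions (1M requests per month, ~500 input and ~100 output tokens each); the GPT-4 prices are the list prices quoted above.

```python
# Back-of-envelope cost comparison; volumes are illustrative assumptions.
EMAILS = 1_000_000          # hypothetical monthly request volume
IN_TOK, OUT_TOK = 500, 100  # hypothetical tokens per request

# GPT-4 list prices quoted above: $0.03 / 1K input, $0.06 / 1K output.
gpt4_cost = EMAILS * (IN_TOK / 1000 * 0.03 + OUT_TOK / 1000 * 0.06)

# Waterfall: suppose a free regex tier resolves 70% of requests,
# so only 30% ever reach GPT-4.
waterfall_cost = 0.30 * gpt4_cost

print(f"All GPT-4: ${gpt4_cost:,.0f}/month")
print(f"Waterfall: ${waterfall_cost:,.0f}/month")
```

Under these assumptions, routing everything to GPT-4 costs about $21,000 per month, while the waterfall brings it down to roughly $6,300, purely by letting a zero-cost tier absorb the easy cases.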
Building an AI Waterfall
Let’s see some simple examples of problems that can be solved by complex LLMs but can also be solved very effectively by significantly simpler methods.
Example: Email classification
Suppose we want to classify inbound emails according to the company department they should be routed to. Instead of sending every email body to an expensive LLM, a simple RegEx-based first tier can probably handle 60–80% of the inbound emails, escalating to a costly LLM only the remainder, which are likely the more ambiguous and therefore harder to classify.
Now consider that between the regex rules and the LLM we can insert more “tiers of intelligence”, such as one-shot learning classifiers, simpler ML models, pre-trained embedding models, and so on.
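As one illustration of such a middle tier, the sketch below matches a query against a few labelled example texts by bag-of-words cosine similarity, answering only when the similarity clears a confidence threshold. The example texts and the threshold value are assumptions; a real middle tier would more likely use a pre-trained embedding model or a trained classifier.

```python
import math
from collections import Counter

def bow(text: str) -> Counter:
    """Bag-of-words term counts (a crude stand-in for an embedding)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical labelled examples acting as a cheap few-shot middle tier.
EXAMPLES = {
    "finance": bow("question about my invoice and the payment I made"),
    "tech_support": bow("the app keeps crashing with an error after login"),
}

def middle_tier(body: str, threshold: float = 0.35):
    """Return (department, score), with department None if not confident."""
    scores = {dept: cosine(bow(body), ex) for dept, ex in EXAMPLES.items()}
    best = max(scores, key=scores.get)
    if scores[best] >= threshold:
        return best, scores[best]
    return None, scores[best]  # below threshold: escalate further
```

Anything this tier is unsure about simply falls through to the next, more expensive one.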
Another example is handling a customer query received through a customer support chat. An approach following the “AI Waterfall” philosophy would route the query through successively smarter tiers: an exact-match FAQ lookup, keyword rules, a lightweight classifier, a cheap LLM, and finally a frontier LLM.
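The overall dispatcher for such a pipeline could look like the sketch below. All the tier functions here are illustrative stubs; in a real system they would hit a FAQ index, a trained model, and LLM APIs respectively.

```python
# Illustrative stub tiers; the FAQ entry and canned responses are made up.
FAQ = {"what are your opening hours?": "We are open 9-17, Mon-Fri."}

def faq_lookup(q):      # tier 1: exact-match lookup, free and instant
    return FAQ.get(q.strip().lower())

def keyword_rules(q):   # tier 2: rule-based canned responses
    if "password" in q.lower():
        return "Please reset your password at /reset."
    return None

def ml_classifier(q):   # tier 3: lightweight model (stub: never confident)
    return None

def cheap_llm(q):       # tier 4: cheap LLM (stub: declines / low confidence)
    return None

def expensive_llm(q):   # tier 5: frontier LLM, the last resort
    return "[frontier-LLM answer]"

def handle_query(query: str) -> str:
    """Flow the query through the tiers, stopping at the first answer."""
    for tier in (faq_lookup, keyword_rules, ml_classifier, cheap_llm):
        answer = tier(query)
        if answer is not None:
            return answer
    return expensive_llm(query)
```

Each tier either answers or returns `None` to pass the query downstream, so the expensive model only ever sees queries that nothing cheaper could resolve.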
The AI Waterfall approach also extends to the choice between LLMs themselves. If a task needs some reasoning, but not the most advanced kind, we first pass it through the simplest (and cheapest) LLM, say GPT-3.5, before forwarding it to the more advanced (and expensive) GPT-4.
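A two-model cascade like this can be sketched as follows. The `call_model` stub and its confidence scheme are assumptions for illustration; with real APIs you would derive confidence from token logprobs or a structured self-assessment field, and the threshold would be tuned per task.

```python
def call_model(model: str, task: str) -> tuple[str, float]:
    """Stub returning (answer, confidence). Replace with real API calls.
    Here we pretend the cheap model is unsure about anything
    mentioning 'proof', purely to exercise the escalation path."""
    if model == "cheap":
        return ("cheap-answer", 0.4 if "proof" in task else 0.9)
    return ("frontier-answer", 0.95)

def cascade(task: str, threshold: float = 0.7) -> str:
    """Try the cheap model first; escalate only below the threshold."""
    answer, confidence = call_model("cheap", task)
    if confidence >= threshold:
        return answer  # cheap tier was good enough; frontier never billed
    answer, _ = call_model("frontier", task)  # escalate: pay only when needed
    return answer
```

The frontier model is invoked, and billed, only for the fraction of tasks the cheap model flags as uncertain.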
Build incrementally
A good practice is to measure the actual LLM cost (in dollar terms) or usage (in time, cases, or tokens) before optimizing with the AI Waterfall strategy. Then we can start with the highest-volume, most expensive use cases, implement first-tier solutions (RegEx, custom rules, database lookups, etc.), and measure the reduction in LLM costs. From there, we can add more tiers of increasingly complex and costly solutions and keep tracking the cost reduction. In this way we can incrementally develop robust solutions.
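Even a very small amount of bookkeeping makes this measurable. The sketch below records which tier resolved each request, so you can see cost and escalation rate shift as tiers are added; the tier names and per-request prices are illustrative assumptions.

```python
from collections import Counter

# Hypothetical per-request cost of each tier (illustrative numbers).
TIER_COST = {"regex": 0.0, "ml": 0.0001, "cheap_llm": 0.002, "frontier_llm": 0.02}

class WaterfallMeter:
    """Record which tier resolved each request and total up the spend."""

    def __init__(self):
        self.resolved = Counter()

    def record(self, tier: str):
        self.resolved[tier] += 1

    def total_cost(self) -> float:
        return sum(TIER_COST[t] * n for t, n in self.resolved.items())

    def escalation_rate(self) -> float:
        """Fraction of requests that reached any LLM tier."""
        total = sum(self.resolved.values())
        llm = self.resolved["cheap_llm"] + self.resolved["frontier_llm"]
        return llm / total if total else 0.0

# Example month: 1,000 requests, most absorbed by the cheap tiers.
meter = WaterfallMeter()
meter.resolved.update({"regex": 700, "ml": 200, "cheap_llm": 80, "frontier_llm": 20})
```

With this split, only 10% of requests ever reach an LLM, and the escalation rate is exactly the number to watch as you tune the cheaper tiers.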
The final step is continuous monitoring of the system. This gives us insight into which cases frequently escalate to the more expensive models, so we can fine-tune the cheaper methods (add new rules, further enrich databases, etc.).
Pitfalls to avoid
After working for many months with this architecture, here are some pitfalls that you should avoid.
- Avoid over-engineering early tiers: In practice, this means you should not spend weeks crafting extremely complex RegEx rules or building very large databases to cover almost every case.
- Never ignore edge cases: Edge cases always exist and need to be handled. In most cases, they will be handled better by an expensive LLM, so do not omit that final tier just because it seems costly.
- Avoid using static confidence thresholds: Thresholds always need to be dynamically adjusted based on the needs of each project.
- Avoid premature optimization: Always start with high-impact, high-volume use cases and then move to more rare cases.
Conclusion
The AI Waterfall is a philosophy that ensures advanced AI is used in an intelligent and cost-effective way. It is a framework that helps solution architects and software engineers pay for advanced AI services only when they are truly needed. It is time to go back to fundamentals and recall that effective engineering is not about having access to the most powerful tools, but about building systems that know when to use them. Because at the end of the day, the best AI solution is often the simplest one that works.
If you liked this article and want to keep up to date with new articles and content, follow me on LinkedIn.