AI Waterfall: How to Spend Less Money on LLMs Using Tiered Intelligence
Author(s): Petros Demetrakopoulos
Originally published on Towards AI.
Let’s face it: Gen AI and LLMs have forever changed the way we develop software and write code. Moreover, recent developments in AI have not only changed the way we write code, they have also shifted where computation happens. It is the second major shift I have experienced in my career, the first being the move from on-premise infrastructure to the cloud that started around the beginning of the 2010s. Now we are witnessing a similar shift, from cloud-based computing to LLM/Gen-AI-driven computation.
However, this shift, and the ability it brings to easily solve increasingly complex problems, comes at a cost. As AI adoption accelerates across organizations, LLM costs are becoming a significant line item in engineering budgets. While models like GPT-4 and Claude are incredibly powerful, they are also expensive, especially at scale. As with any development in the tech sector, the solution is not to avoid or fear new technologies like LLMs altogether, but to embrace them. However, we need to use them strategically and extract as much value from them as we can at a reasonable cost.
The AI Waterfall framework
The “AI Waterfall” is a hierarchical problem-solving strategy in which we attempt to solve tasks using the cheapest, fastest methods first, escalating to more expensive AI models only when simpler approaches fail. In this way we can take advantage of separate tiers of intelligence and pay only for the computational complexity that the problem actually requires. Think of it as a series of gates: each task flows through progressively more sophisticated (and costly) tiers until one of them solves the problem. The key insight and operating principle is that many problems that seem to require advanced AI and costly models can actually be solved with traditional programming techniques, basic machine learning, or lightweight models at a fraction of the cost and latency.
The economic motivation
Before diving into techniques, let’s establish the economic motivation. The bullets below compare the costs of the more advanced LLMs with those of simpler methods.
- GPT-4: ~$0.03 per 1K input tokens, ~$0.06 per 1K output tokens
- Claude Sonnet: ~$0.003 per 1K input tokens, ~$0.015 per 1K output tokens
- Regex/Rule-based systems: Effectively $0 (just minor compute time)
- Classical ML models: A few pennies per thousand predictions
It is easy to understand that when processing thousands or millions of requests, these costs compound dramatically.
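To make this concrete, here is a back-of-envelope calculation. The volumes and token counts are hypothetical assumptions (1M requests per month, ~500 input and ~100 output tokens each); the GPT-4 prices are the list prices quoted above.

```python
# Back-of-envelope cost comparison; volumes are illustrative assumptions.
EMAILS = 1_000_000          # hypothetical monthly request volume
IN_TOK, OUT_TOK = 500, 100  # hypothetical tokens per request

# GPT-4 list prices quoted above: $0.03 / 1K input, $0.06 / 1K output.
gpt4_cost = EMAILS * (IN_TOK / 1000 * 0.03 + OUT_TOK / 1000 * 0.06)

# Waterfall: suppose a free regex tier resolves 70% of requests,
# so only 30% ever reach GPT-4.
waterfall_cost = 0.30 * gpt4_cost

print(f"All GPT-4: ${gpt4_cost:,.0f}/month")
print(f"Waterfall: ${waterfall_cost:,.0f}/month")
```

Under these assumptions, routing everything to GPT-4 costs about $21,000 per month, while the waterfall brings it down to roughly $6,300, purely by letting a zero-cost tier absorb the easy cases.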
Building an AI Waterfall
Let’s see some simple examples of problems that can be solved by complex LLMs but can also be solved very effectively by significantly simpler methods.
Example: Email classification
Suppose we want to classify inbound emails according to the company department they should be routed to. Instead of sending every email body to an expensive LLM, a simple RegEx-based first tier can probably handle 60–80% of the inbound emails, escalating to a costly LLM only the remainder, which are likely the more ambiguous and therefore harder to classify.
Now consider that between the regex rules and the LLM we can insert more “tiers of intelligence”, such as one-shot learning classifiers, simpler ML models, pre-trained embedding models, and so on.
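As one illustration of such a middle tier, the sketch below matches a query against a few labelled example texts by bag-of-words cosine similarity, answering only when the similarity clears a confidence threshold. The example texts and the threshold value are assumptions; a real middle tier would more likely use a pre-trained embedding model or a trained classifier.

```python
import math
from collections import Counter

def bow(text: str) -> Counter:
    """Bag-of-words term counts (a crude stand-in for an embedding)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical labelled examples acting as a cheap few-shot middle tier.
EXAMPLES = {
    "finance": bow("question about my invoice and the payment I made"),
    "tech_support": bow("the app keeps crashing with an error after login"),
}

def middle_tier(body: str, threshold: float = 0.35):
    """Return (department, score), with department None if not confident."""
    scores = {dept: cosine(bow(body), ex) for dept, ex in EXAMPLES.items()}
    best = max(scores, key=scores.get)
    if scores[best] >= threshold:
        return best, scores[best]
    return None, scores[best]  # below threshold: escalate further
```

Anything this tier is unsure about simply falls through to the next, more expensive one.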
Another example is handling a customer query received through a customer support chat. An approach following the “AI Waterfall” philosophy would route the query through successively smarter tiers: an exact-match FAQ lookup, keyword rules, a lightweight classifier, a cheap LLM, and finally a frontier LLM.
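The overall dispatcher for such a pipeline could look like the sketch below. All the tier functions here are illustrative stubs; in a real system they would hit a FAQ index, a trained model, and LLM APIs respectively.

```python
# Illustrative stub tiers; the FAQ entry and canned responses are made up.
FAQ = {"what are your opening hours?": "We are open 9-17, Mon-Fri."}

def faq_lookup(q):      # tier 1: exact-match lookup, free and instant
    return FAQ.get(q.strip().lower())

def keyword_rules(q):   # tier 2: rule-based canned responses
    if "password" in q.lower():
        return "Please reset your password at /reset."
    return None

def ml_classifier(q):   # tier 3: lightweight model (stub: never confident)
    return None

def cheap_llm(q):       # tier 4: cheap LLM (stub: declines / low confidence)
    return None

def expensive_llm(q):   # tier 5: frontier LLM, the last resort
    return "[frontier-LLM answer]"

def handle_query(query: str) -> str:
    """Flow the query through the tiers, stopping at the first answer."""
    for tier in (faq_lookup, keyword_rules, ml_classifier, cheap_llm):
        answer = tier(query)
        if answer is not None:
            return answer
    return expensive_llm(query)
```

Each tier either answers or returns `None` to pass the query downstream, so the expensive model only ever sees queries that nothing cheaper could resolve.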
The AI Waterfall approach also extends to the choice between LLMs themselves. If a task needs some reasoning, but not the most advanced kind, we first pass it through the simplest (and cheapest) LLM, say GPT-3.5, before forwarding it to the more advanced (and expensive) GPT-4.
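A two-model cascade like this can be sketched as follows. The `call_model` stub and its confidence scheme are assumptions for illustration; with real APIs you would derive confidence from token logprobs or a structured self-assessment field, and the threshold would be tuned per task.

```python
def call_model(model: str, task: str) -> tuple[str, float]:
    """Stub returning (answer, confidence). Replace with real API calls.
    Here we pretend the cheap model is unsure about anything
    mentioning 'proof', purely to exercise the escalation path."""
    if model == "cheap":
        return ("cheap-answer", 0.4 if "proof" in task else 0.9)
    return ("frontier-answer", 0.95)

def cascade(task: str, threshold: float = 0.7) -> str:
    """Try the cheap model first; escalate only below the threshold."""
    answer, confidence = call_model("cheap", task)
    if confidence >= threshold:
        return answer  # cheap tier was good enough; frontier never billed
    answer, _ = call_model("frontier", task)  # escalate: pay only when needed
    return answer
```

The frontier model is invoked, and billed, only for the fraction of tasks the cheap model flags as uncertain.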
Build incrementally
A good practice is to measure the actual LLM cost (in dollar terms) or usage (in time, cases, or tokens) before optimizing with the AI Waterfall strategy. Then we can start with the highest-volume, most expensive use cases, implement first-tier solutions (RegEx, custom rules, database lookups, etc.), and measure the reduction in LLM costs. From there, we can add more tiers of increasingly complex and costly solutions and keep tracking the cost reduction. In this way we can incrementally develop robust solutions.
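Even a very small amount of bookkeeping makes this measurable. The sketch below records which tier resolved each request, so you can see cost and escalation rate shift as tiers are added; the tier names and per-request prices are illustrative assumptions.

```python
from collections import Counter

# Hypothetical per-request cost of each tier (illustrative numbers).
TIER_COST = {"regex": 0.0, "ml": 0.0001, "cheap_llm": 0.002, "frontier_llm": 0.02}

class WaterfallMeter:
    """Record which tier resolved each request and total up the spend."""

    def __init__(self):
        self.resolved = Counter()

    def record(self, tier: str):
        self.resolved[tier] += 1

    def total_cost(self) -> float:
        return sum(TIER_COST[t] * n for t, n in self.resolved.items())

    def escalation_rate(self) -> float:
        """Fraction of requests that reached any LLM tier."""
        total = sum(self.resolved.values())
        llm = self.resolved["cheap_llm"] + self.resolved["frontier_llm"]
        return llm / total if total else 0.0

# Example month: 1,000 requests, most absorbed by the cheap tiers.
meter = WaterfallMeter()
meter.resolved.update({"regex": 700, "ml": 200, "cheap_llm": 80, "frontier_llm": 20})
```

With this split, only 10% of requests ever reach an LLM, and the escalation rate is exactly the number to watch as you tune the cheaper tiers.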
The final step is continuous monitoring of the system. This gives us insight into which cases frequently escalate to the more expensive models, so we can fine-tune the cheaper methods (add new rules, further enrich databases, etc.).
Pitfalls to avoid
After working for many months with this architecture, here are some pitfalls that you should avoid.
- Avoid over-engineering early tiers: In practice, this means you should not spend weeks crafting extremely complex RegEx rules or building very large databases to cover almost every case.
- Never ignore edge cases: Edge cases always exist and need to be handled. In most cases, they will be handled better by an expensive LLM, so do not omit that final tier just because it seems costly.
- Avoid using static confidence thresholds: Thresholds always need to be dynamically adjusted based on the needs of each project.
- Avoid premature optimization: Always start with high-impact, high-volume use cases and then move to more rare cases.
Conclusion
The AI Waterfall is a philosophy that ensures advanced AI is used in an intelligent and cost-effective way. It is a framework that helps solution architects and software engineers pay for advanced AI services only when they are truly needed. It is time to go back to fundamentals and recall that effective engineering is not about having access to the most powerful tools, but about building systems that know when to use them. Because at the end of the day, the best AI solution is often the simplest one that works.
If you liked this article and want to keep up to date with new articles and content, follow me on LinkedIn.