
AI Waterfall: How to Spend Less Money on LLMs Using Tiered Intelligence

Author(s): Petros Demetrakopoulos

Originally published on Towards AI.

Photo by Growtika on Unsplash

Let’s face it: Gen AI and LLMs have changed forever the way we develop software and write code. Moreover, the recent developments in AI have not only changed the way we write code, they have also shifted where computation happens. It is the second major shift I have experienced in my career, the first being the move from on-premise infrastructure to the cloud that started around the beginning of the 2010s. Now we are witnessing a similar shift, from cloud-based computing to LLM/Gen-AI-driven computation.

However, this shift, and the ability it brings to solve increasingly complex problems with ease, comes at a cost. As AI adoption accelerates across organizations, LLM costs are becoming a significant line item in engineering budgets. While models like GPT-4 and Claude are incredibly powerful, they are also expensive, especially at scale. As with any development in the tech sector, the solution is not to avoid or fear the new technology (in this case LLMs), but to embrace it while using it strategically, extracting as much value as we can at a reasonable cost.

The AI Waterfall framework

The “AI Waterfall” is a hierarchical problem-solving strategy: we attempt to solve tasks using the cheapest, fastest methods first, and escalate to more expensive AI models only when the simpler approaches fail. This way we take advantage of separate tiers of intelligence and pay only for the computational complexity we actually need to solve the problem. Think of it as a series of gates: each task flows through progressively more sophisticated (and costly) solutions until it finds one that solves the problem. The key insight and operating principle is that many problems that seem to require advanced AI and costly models can actually be solved with traditional programming techniques, basic machine learning, or lightweight models at a fraction of the cost and latency.
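To make the gate metaphor concrete, here is a minimal sketch of such a cascade in Python; the tier functions and their ordering are illustrative placeholders, not code from a specific implementation.

```python
from typing import Callable, Optional

# Each tier takes the task and either returns an answer or None ("I can't handle this").
Tier = Callable[[str], Optional[str]]

def run_waterfall(task: str, tiers: list[Tier]) -> str:
    """Try the cheapest tier first and escalate only when a tier returns None."""
    for tier in tiers:
        result = tier(task)
        if result is not None:
            return result          # solved at this gate; no need to go further
    raise RuntimeError("No tier could handle the task")

# Usage (cheapest first, most expensive last), with hypothetical tier functions:
# answer = run_waterfall(task, [regex_tier, small_model_tier, frontier_llm_tier])
```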

The economic motivation

Before diving into techniques, let’s establish the economic motivation. The bullets below compare the costs of the more advanced LLMs with those of simpler methods:

  • GPT-4: ~$0.03 per 1K input tokens, ~$0.06 per 1K output tokens
  • Claude Sonnet: ~$0.003 per 1K input tokens, ~$0.015 per 1K output tokens
  • Regex/Rule-based systems: Effectively $0 (just minor compute time)
  • Classical ML models: A few pennies per thousand predictions

It is easy to understand that when processing thousands or millions of requests, these costs compound dramatically.
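As a rough, illustrative calculation (the request volume, token counts, and 70% first-tier hit rate below are assumptions), handling most traffic in a cheap tier changes the bill dramatically:

```python
# Back-of-the-envelope comparison using the GPT-4 prices listed above.
requests = 1_000_000                      # assumed monthly volume
in_tokens, out_tokens = 1_000, 200        # assumed tokens per request

gpt4_per_request = (in_tokens / 1000) * 0.03 + (out_tokens / 1000) * 0.06   # $0.042

all_gpt4 = requests * gpt4_per_request                            # ~$42,000
cheap_tier_share = 0.70                                           # handled by regex/rules for ~$0
waterfall = requests * (1 - cheap_tier_share) * gpt4_per_request  # ~$12,600

print(f"All GPT-4: ${all_gpt4:,.0f} vs. Waterfall: ${waterfall:,.0f}")
```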

Building an AI Waterfall

Let’s see some simple examples of problems that can be solved by complex LLMs but can also be solved very effectively by significantly simpler methods.

Example: Email classification

Suppose we want to classify inbound emails according to the company department they should be routed to. Instead of sending every email body to an expensive LLM, a simple RegEx-based approach like the one sketched below can probably handle 60–80% of the inbound emails, escalating to a costly LLM only the remainder, which tends to be the more ambiguous and therefore harder-to-classify messages.
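A first-tier RegEx classifier could look something like this; the department names and keyword patterns are illustrative assumptions, not a production rule set.

```python
import re
from typing import Optional

# Illustrative routing rules; the departments and keywords are assumptions.
DEPARTMENT_PATTERNS = {
    "billing": re.compile(r"\b(invoice|refund|charged?|payment|billing)\b", re.I),
    "support": re.compile(r"\b(error|bug|crash|broken|not working)\b", re.I),
    "sales":   re.compile(r"\b(pricing|quote|demo|purchase|upgrade)\b", re.I),
    "hr":      re.compile(r"\b(payroll|benefits|vacation|leave request)\b", re.I),
}

def regex_tier(email_body: str) -> Optional[str]:
    """Return a department when a rule matches, or None to escalate to the next tier."""
    for department, pattern in DEPARTMENT_PATTERNS.items():
        if pattern.search(email_body):
            return department
    return None   # ambiguous email: let a model (or the LLM) handle it

# department = regex_tier(email_body) or llm_classify(email_body)  # llm_classify: hypothetical fallback
```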

Now consider that between the regex rules and the LLM we can include more “tiers of intelligence”, such as one-shot learning classifiers, simpler ML models, pre-trained embedding models, etc.
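As one hypothetical middle tier, a lightweight classifier (here scikit-learn TF-IDF plus logistic regression, with an assumed 0.8 confidence threshold and assumed labelled training data) can answer only when it is confident and pass everything else up the waterfall:

```python
from typing import Optional
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# A small model trained on previously labelled emails (training data assumed to exist).
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
# model.fit(train_texts, train_labels)

def ml_tier(email_body: str, threshold: float = 0.8) -> Optional[str]:
    """Answer only when confident; otherwise return None and let the LLM tier take over."""
    probs = model.predict_proba([email_body])[0]
    best = probs.argmax()
    if probs[best] >= threshold:
        return model.classes_[best]
    return None
```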

Another example is the handling of a customer query received through a customer-support chat. An approach following the “AI Waterfall” philosophy could look like this (the exact tiers below are one illustrative ordering):
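  • Tier 1: exact-match lookup against an FAQ/knowledge base of canned answers (effectively free)
  • Tier 2: keyword or RegEx routing rules (effectively free)
  • Tier 3: a lightweight intent classifier or embedding-similarity search (a few cents per thousand queries)
  • Tier 4: a small, cheap LLM (e.g., a GPT-3.5-class model) for queries the rules could not resolve
  • Tier 5: a frontier model (GPT-4 or Claude-class) only for the few queries nothing else could handle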

The AI Waterfall approach also extends to the choice between LLMs themselves. If a task needs some reasoning, but not the most advanced kind, we first try to pass it through the simplest (and cheapest) LLM, say GPT-3.5, before forwarding it to the more advanced (and expensive) GPT-4.
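Below is a minimal sketch of this model-to-model escalation using the OpenAI Chat Completions API and a simple self-reporting heuristic: the cheap model is asked to reply with a sentinel token when it is unsure. The model names, prompt, and ESCALATE convention are all assumptions for illustration.

```python
from openai import OpenAI

client = OpenAI()

SYSTEM = ("Answer the user's task. If you are not confident you can answer it "
          "reliably, reply with exactly: ESCALATE")

def tiered_llm_answer(task: str) -> str:
    """Try the cheap model first; forward to the expensive one only if it opts out."""
    for model in ("gpt-3.5-turbo", "gpt-4"):           # cheapest first
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "system", "content": SYSTEM},
                      {"role": "user", "content": task}],
        )
        answer = response.choices[0].message.content.strip()
        if answer != "ESCALATE" or model == "gpt-4":   # the last tier always answers
            return answer
    return answer
```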

Build incrementally

A good practice is to measure the actual LLM cost (in dollar terms) or usage (in time, cases, or tokens) before optimizing it with the AI Waterfall strategy. Then we can start with the highest-volume, most expensive use cases, implement first-tier solutions (RegEx, custom rules, database lookups, etc.), and measure the reduction in LLM costs. From there we can add more tiers of increasingly complex and costly solutions while keeping track of the cost reduction. In this way we can incrementally develop robust solutions.

The final step is continuous monitoring of the system. This gives us insight into which cases frequently escalate to the more expensive models, so we can fine-tune the cheaper methods (add new rules, further enrich databases, etc.).
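As a small sketch of what that monitoring could look like (the tier names and the simple counter-based approach are illustrative assumptions), counting where each task is resolved makes escalation hot spots easy to spot:

```python
from collections import Counter

tier_hits = Counter()

def run_waterfall_with_metrics(task, named_tiers):
    """Same cascade as before, but records which tier resolved each task."""
    for name, tier in named_tiers:
        result = tier(task)
        if result is not None:
            tier_hits[name] += 1
            return result
    tier_hits["unresolved"] += 1
    return None

# After a day of traffic, tier_hits might look like:
# Counter({'regex': 7200, 'ml_model': 1900, 'gpt-4': 850, 'unresolved': 50})
# A growing 'gpt-4' share is the signal to add rules or retrain the cheaper tiers.
```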

Pitfalls to avoid

After working for many months with this architecture, here are some pitfalls that you should avoid.

  1. Avoid over-engineering early tiers: In practice this means you should not spend weeks creating extremely complex RegEx rules or building very large databases to account for almost every case.
  2. Never ignore edge cases: Edge cases always exist and need to be handled. In the majority of cases they will be handled better by an expensive LLM, so do not omit that tier just because it seems costly.
  3. Avoid using static confidence thresholds: Thresholds always need to be dynamically adjusted based on the needs of each project.
  4. Avoid premature optimization: Always start with high-impact, high-volume use cases and then move to more rare cases.

Conclusion

The AI Waterfall is a philosophy that ensures advanced AI is used intelligently and cost-effectively. It is a framework that helps solution architects and software engineers pay for advanced AI services only when they are truly needed. It is a moment to go back to fundamentals and remember that effective engineering is not about having access to the most powerful tools, but about building systems that know when to use them. Because at the end of the day, the best AI solution is often the simplest one that works.

If you liked this article and want to keep up to date with new articles and content, follow me on LinkedIn.

Published via Towards AI

