

Another Few Tips for Better Results from LLM RAG Solutions

Author(s): Dmitry Malishev

Originally published on Towards AI.

A structured, thoughtful response from an LLM, grounded in well-selected data from RAG, is already a powerful technique. And it can be even better!

Image generated by the author using Leonardo.ai

By now, I’ve completed several RAG-based LLM projects: article analyzers, knowledge-based Q&A bots, and insight generators. Along the way, I had the chance to try many frameworks, models, and pipelines with countless parameters, settings, and other tunings capable of turning these pieces of code into a real magic spell. This article is a compilation of my notes on building a better-performing LLM RAG pipeline, in the hope of helping other engineers with their experiments.

Before digging into the details, let me recall a general LLM RAG pipeline:

Image created by the author using Draw.io

As shown in the diagram, we deal with a sequence of steps that turn a user query into an LLM response. In a real project, any step can include additional pre- and post-processing routines.

All steps are currently covered by well-known third-party frameworks. For the RAG part, it’s either LlamaIndex or LangChain. For the LLM part, it’s typically a local setup based on the open Llama architecture or one of the cloud-based GPT services like ChatGPT. You can find dozens of additional details just by googling any of them.

I’m not going into detail on how the pipeline works, as I’m targeting a more experienced audience. But if you’re only starting, I recommend checking out the LlamaIndex and LangChain beginner’s guides. The basic workflows are easy to understand, and no deep knowledge of large language models or data-processing algorithms is needed.
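
For orientation, here is a minimal sketch of such a pipeline built with LlamaIndex, one of the frameworks mentioned above. The directory path and the query are placeholders, and a default LLM and embedding model (for example, via an OpenAI API key) are assumed to be configured:

    from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

    # Indexing: load the knowledge base and embed it into a vector index.
    documents = SimpleDirectoryReader("./knowledge_base").load_data()
    index = VectorStoreIndex.from_documents(documents)

    # Retrieval + generation: fetch the most relevant chunks and let the LLM answer.
    query_engine = index.as_query_engine(similarity_top_k=3)
    response = query_engine.query("What does the report say about Q3 revenue?")
    print(response)

The tips below are mostly about tuning what happens behind these few lines.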

All preparations are complete, so it’s time to get straight to the tips!

Context Length

This parameter relates to both the LLM and RAG parts: it defines the final length of the prompt containing the query and the context. It is often set to the maximum context length supported by the chosen LLM, with the intuition "the bigger, the better". Meanwhile, I found several side effects of a large context value:

  • If the required information is concentrated in one contiguous part of the knowledge base, you may end up with many irrelevant chunks retrieved by the RAG part, which can push the LLM toward wrong answers.
  • There’s the so-called lost-in-the-middle problem, where the LLM pays attention only to the chunks at the beginning and the end of the context and skips the middle.
  • Most of the time, a longer context means more VRAM demand for the pipeline and, consequently, more computation and processing time.

And one last argument! There are LLMs with really large context sizes (64k, 128k, …) that could fit the entire knowledge base and make the RAG part entirely unnecessary. It might be interesting to experiment with, and it may work out in certain cases, but as long as RAG frameworks remain on the scene, a big context doesn’t solve all the problems.
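
As an illustration, here is how one might cap the effective context instead of maxing it out, again using LlamaIndex as an example. Exact setting names may differ between versions, and the numbers are arbitrary starting points rather than recommendations:

    from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex

    # Keep chunks small and cap the effective context, even if the LLM supports more.
    Settings.chunk_size = 512        # tokens per indexed chunk
    Settings.context_window = 8192   # upper bound on prompt length

    index = VectorStoreIndex.from_documents(
        SimpleDirectoryReader("./knowledge_base").load_data()
    )

    # Retrieve only a few of the most relevant chunks rather than as many as fit.
    query_engine = index.as_query_engine(similarity_top_k=3)

Fewer, more relevant chunks also mitigate the lost-in-the-middle effect, since there is simply less middle to get lost in.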

Text Cleaning

Depending on the task, you may decide to clean the context of all unnecessary symbols (extra spaces, newlines, etc.). It sounds especially appealing when the knowledge base is extracted from text formats like DOCX and PDF. Be careful at this step: I discovered that some LLMs pay a lot of attention to how information is divided into sections, paragraphs, and bulleted lists, and they perform much worse without these special symbols, which, by the way, don’t take many bytes to store.
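
A small sketch of what I mean, in plain Python (the regular expressions are illustrative, not a complete cleaning recipe): collapse runs of spaces and excess blank lines, but keep the paragraph breaks and bullet markers the model seems to rely on.

    import re

    def clean_text(raw: str) -> str:
        # Collapse noisy whitespace while preserving document structure.
        text = raw.replace("\r\n", "\n")
        text = re.sub(r"[ \t]+", " ", text)     # collapse runs of spaces and tabs
        text = re.sub(r"\n{3,}", "\n\n", text)  # keep at most one blank line in a row
        return text.strip()

    print(clean_text("Overview  of   results\r\n\r\n\r\n- item   one\r\n- item two"))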

Two Parts – Two Queries

By default, the input query goes to both the RAG and LLM parts unaltered: vector search, composing a prompt, and sending it to the LLM. These two parts do very different jobs, yet we expect the same input to be good enough for both. It doesn’t have to be! You can prepare optimized variants of the query and provide one to each part of the pipeline. This is particularly useful when you work with fixed queries to find certain types of answers in the knowledge base. So, create the two versions manually whenever possible, or use an LLM to compose the modified versions.
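
Here is a rough sketch of that split with LlamaIndex. Both queries are placeholders, and in practice they could also be generated from the original question by an LLM; a configured default LLM is assumed:

    from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex

    index = VectorStoreIndex.from_documents(
        SimpleDirectoryReader("./knowledge_base").load_data()
    )

    # A keyword-heavy query tends to suit vector search better...
    retrieval_query = "hardware segment revenue 2023 quarterly figures"
    # ...while a full natural-language instruction suits the LLM better.
    llm_question = "Summarize how hardware revenue changed during 2023 and why."

    nodes = index.as_retriever(similarity_top_k=3).retrieve(retrieval_query)
    context = "\n\n".join(node.get_content() for node in nodes)

    prompt = f"Context:\n{context}\n\nQuestion: {llm_question}\nAnswer:"
    print(Settings.llm.complete(prompt).text)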

Other Languages

The vast majority of LLMs are trained on English text data. Other languages are included in the training sets too, but usually in much smaller volumes. Trained LLMs can also be fine-tuned for other languages, but that’s hard. So when English isn’t the primary language of your solution, you may find yourself with few options. For this case, I want to share another trick! LLMs are almost always much better at understanding non-English languages than at composing responses in them. Try keeping your knowledge base in the local language but asking the LLM to respond in English. This may be enough by itself if English answers are acceptable; if not, use a translation service to turn the response back into the local language.
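
In practice the trick can be as simple as an instruction in the query itself. A sketch, assuming the documents in the knowledge base are in German and English answers are acceptable:

    from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

    # The indexed documents are in the local language (German in this example).
    index = VectorStoreIndex.from_documents(
        SimpleDirectoryReader("./knowledge_base").load_data()
    )
    query_engine = index.as_query_engine(similarity_top_k=3)

    # The model reads the German context but is asked to reply in English,
    # which most LLMs handle more reliably than generating fluent German.
    response = query_engine.query(
        "Answer in English, even though the context is in German: "
        "What warranty period does the contract specify?"
    )
    print(response)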

Faster on the Same Host

LLM-based solutions run rather slowly. Even with a powerful CPU and GPU, you still want to make them faster without digging into low-level algorithmic optimization. I have a little something for this case, too! Try building your app for an alternative OS and launching it in a container (Docker). Once, I got a 20% performance boost by running my pipeline built for Linux in a container on Windows, compared to the initial Windows-native setup. I assume frameworks may have different optimization sets for different platforms and OSs, which is also the case for GPU drivers and low-level stacks. This advice is easy to try: all the needed frameworks are cross-platform, and Docker provides ready-made images with GPU support.

That’s it for now.
I hope your list of ideas to test is now longer. I’ll be glad to receive feedback, and let’s keep in touch!


Published via Towards AI
