LightRAG and GraphRAG: The New Area of RAG Applications

Last Updated on December 25, 2024 by Editorial Team

Author(s): Narges

Originally published on Towards AI.

By now we have all heard of RAG and how it has become a cornerstone in the development of AI systems.

While this technology is quite new, it didn’t take long for technologists to run into its limitations and start working on new solutions.

Two projects that have caught my eye recently in the world of RAG systems are LightRAG from HKUDS and GraphRAG from Microsoft. If you’ve been curious about building more reliable, context-rich RAG applications, these libraries might be exactly what you need.

This article is a quick walk-through of these two projects, their advantages and limitations, and their evaluation as reported in the original papers.

Traditional RAG: Strengths and Limitations

RAG as we know it operates in two phases: an indexing phase and a query phase.

In the indexing phase, documents are divided into chunks, and each chunk is converted into a vector embedding using an embedding model such as BERT. The embeddings are then stored in a vector database (e.g. Pinecone or FAISS). In the query phase, when a user asks a question, the question is also converted into a vector. A similarity search is performed on the vector database to retrieve the most relevant chunks. The retrieved chunks and the query are then passed together to a generative language model to produce a response.

This system comes with its own limitations. First, it fails to capture connections between fragments of information spread across multiple documents, which makes it hard to produce comprehensive insights. Second, it doesn’t scale efficiently as data grows, resulting in poor retrieval quality.
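To make the two phases concrete, here is a minimal sketch of the baseline pipeline. Everything in it is an assumption made for illustration: the `embed` function is a toy bag-of-words counter standing in for a real embedding model, a plain Python list stands in for the vector database, and the function names are made up for this example.

```python
import math
from collections import Counter


def chunk_text(text, chunk_size=8):
    """Split a document into chunks of at most `chunk_size` words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    tokens = (word.strip(".,?!").lower() for word in text.split())
    return Counter(token for token in tokens if token)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(documents):
    """Indexing phase: chunk every document and store (chunk, vector) pairs."""
    return [(chunk, embed(chunk)) for doc in documents for chunk in chunk_text(doc)]


def retrieve(index, query, top_k=2):
    """Query phase: embed the query and return the most similar chunks."""
    query_vector = embed(query)
    ranked = sorted(index, key=lambda item: cosine(query_vector, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]


def build_prompt(query, chunks):
    """Combine the retrieved chunks and the query into one prompt for the LLM."""
    context = "\n".join(f"- {chunk}" for chunk in chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


documents = [
    "Scrooge is a miser who hates Christmas.",
    "Marley was the business partner of Scrooge.",
]
index = build_index(documents)
top_chunks = retrieve(index, "Who was Marley?", top_k=1)
prompt = build_prompt("Who was Marley?", top_chunks)
```

In a real system the prompt would be sent to a generative model; the sketch stops at building it so that it stays runnable without any external service.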

GraphRAG by Microsoft

GraphRAG is a marriage of knowledge graphs and RAG systems. When users ask Query-Focused Summarization (QFS) questions, where the context of the query matters, baseline RAG often struggles to provide a comprehensive answer. GraphRAG, however, connects relevant information from different chunks and documents, capturing a broader view of the data from which to draw meaningful insights.

Because GraphRAG uses a knowledge graph, the relationships between entities and objects within the data are preserved, offering a richer representation of information.

How GraphRAG Works

Similar to baseline RAG, the GraphRAG pipeline involves indexing and querying. In the indexing phase, however, GraphRAG extracts entities and relationships from the documents through carefully crafted prompts and gleaning checks (follow-up prompts that ask the LLM whether any entities were missed).

Using an LLM, GraphRAG identifies entities (names, categories, and descriptions) and relationships (connections between entities, each with a strength score, e.g. 1–10).
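The shape of the extracted data can be sketched as follows. This is only an illustration of the fields described above, not GraphRAG's actual schema: the class names, field names, and the `build_graph` helper are assumptions, and the records are written by hand where GraphRAG would obtain them from an LLM.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    name: str
    category: str
    description: str


@dataclass(frozen=True)
class Relationship:
    source: str
    target: str
    description: str
    strength: int  # assumed 1-10 scale, as in the example range above


def build_graph(entities, relationships):
    """Build an undirected adjacency map: entity name -> [(neighbor, strength)]."""
    graph = {entity.name: [] for entity in entities}
    for rel in relationships:
        if not 1 <= rel.strength <= 10:
            raise ValueError(f"strength out of range: {rel.strength}")
        graph.setdefault(rel.source, []).append((rel.target, rel.strength))
        graph.setdefault(rel.target, []).append((rel.source, rel.strength))
    return graph


entities = [
    Entity("Scrooge", "person", "A miserly businessman."),
    Entity("Marley", "person", "Scrooge's late business partner."),
]
relationships = [
    Relationship("Scrooge", "Marley", "Former business partners.", 9),
]
graph = build_graph(entities, relationships)
```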

In the next step, semantic clustering groups closely related nodes into hierarchical clusters called communities.
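The GraphRAG paper uses the Leiden community detection algorithm for this step. The sketch below deliberately swaps in something much simpler, connected components over edges above a strength threshold, just to show how raising the threshold splits one coarse community into finer ones. The function name, the threshold parameter, and the sample edges are all assumptions for illustration.

```python
from collections import defaultdict


def communities(edges, min_strength=1):
    """Group nodes into connected components, using only edges whose
    strength is at least `min_strength`. A simple stand-in for Leiden."""
    adjacency = defaultdict(set)
    for source, target, strength in edges:
        # Register both nodes even if the edge itself is too weak to keep.
        adjacency[source]
        adjacency[target]
        if strength >= min_strength:
            adjacency[source].add(target)
            adjacency[target].add(source)

    seen = set()
    groups = []
    for node in sorted(adjacency):
        if node in seen:
            continue
        stack, group = [node], set()
        while stack:
            current = stack.pop()
            if current in group:
                continue
            group.add(current)
            stack.extend(adjacency[current] - group)
        seen |= group
        groups.append(group)
    return groups


edges = [
    ("Scrooge", "Marley", 9),
    ("Scrooge", "Cratchit", 7),
    ("Cratchit", "Tiny Tim", 10),
    ("Scrooge", "Fred", 3),
]
coarse = communities(edges)                 # every edge counts: one big community
fine = communities(edges, min_strength=8)   # only strong edges: smaller communities
```

Running the function at several thresholds gives a rough two-level hierarchy; a real implementation would also generate an LLM summary for each community.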

In the querying phase, when a user asks a question, the entities and relationships in the query are identified and compared against the graph index to find the most relevant communities. The summaries of these communities are then passed to an LLM, which produces intermediate responses at the local and global levels, each with a helpfulness score from 0 to 100. The final global answer is generated through a multi-stage map-reduce approach, in which the intermediate responses are ranked by their helpfulness and combined.
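The map-reduce step can be sketched roughly like this. It follows my reading of the GraphRAG paper: intermediate answers with a score of 0 are dropped, the rest are sorted by helpfulness, and answers are added to the final context until a token limit is reached. The `toy_judge` function is a keyword-overlap stand-in for the LLM, word counts stand in for tokens, and all names here are assumptions.

```python
def toy_judge(query, summary):
    """Stand-in for the LLM map call: returns (intermediate_answer, helpfulness 0-100)."""
    query_terms = set(query.lower().split())
    summary_terms = set(summary.lower().split())
    if not query_terms:
        return summary, 0
    score = round(100 * len(query_terms & summary_terms) / len(query_terms))
    return summary, score


def map_step(summaries, query, judge=toy_judge):
    """Map: produce one scored intermediate answer per community summary."""
    return [judge(query, summary) for summary in summaries]


def reduce_step(intermediate, token_budget):
    """Reduce: drop unhelpful answers, sort by helpfulness, and fill the
    final context window until the (word-count) budget is exhausted."""
    useful = [(answer, score) for answer, score in intermediate if score > 0]
    useful.sort(key=lambda pair: pair[1], reverse=True)
    context, used = [], 0
    for answer, _ in useful:
        cost = len(answer.split())
        if used + cost > token_budget:
            break
        context.append(answer)
        used += cost
    return context


scored = map_step(
    ["three ghosts visit scrooge at night", "london weather report"],
    "ghosts visit scrooge",
)
final_context = reduce_step(
    [("a b c", 80), ("d e", 0), ("f g h i", 95), ("j", 40)],
    token_budget=7,
)
```

The final context would then go to the LLM once more to write the global answer.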

Advantages of GraphRAG

Because of its ability to capture relationships between data entities, GraphRAG is well suited to answering complex queries. This enhanced contextual understanding is usually the main reason knowledge graphs are added to RAG systems, whose retrieval otherwise works on flat data.

The other benefit is improved accuracy and response quality compared to baseline RAG systems, sometimes by up to 3X! The combination of global and local summarization also lets GraphRAG align with user intent better than baseline RAG systems.

Limitations of GraphRAG

On the limitations side, high computational cost is the main disadvantage of GraphRAG. The many API calls needed to construct and query the knowledge graph are slow, can hit rate limits, and can become extremely costly. (For instance, processing a book like A Christmas Carol (~32,000 words) using GPT-4o can cost $6–$7.)

The other disadvantage is that incorporating new data into an existing graph index requires reconstructing the entire knowledge graph, including the previously indexed data, which is inefficient.

Source: https://learnopencv.com/lightrag/

LightRAG: A Lighter Alternative

LightRAG is an open-source project from HKUDS, launched in October 2024. LightRAG addresses GraphRAG’s inefficiencies by introducing deduplication mechanisms, dual-level retrieval, and a better chunking mechanism to reduce computational overhead.

How LightRAG Works

Similar to GraphRAG, LightRAG extracts entities and relationships. It then creates key-value pairs for both entities and relationships. These KV data structures allow more precise retrieval than baseline RAG and avoid the inefficient chunk traversal used in GraphRAG.

A deduplication step then merges redundant nodes and relationships, compressing the knowledge graph. This reduces both processing overhead and overall graph size.
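Here is a minimal sketch of what such a deduplication step could look like. It is not LightRAG's implementation: the normalization rule (lowercasing and collapsing whitespace), the choice to treat relationships as undirected, and the function names are all assumptions made for this example.

```python
def normalize(name):
    """Assumed matching rule: case-insensitive, whitespace-insensitive."""
    return " ".join(name.lower().split())


def deduplicate(entities, relationships):
    """Merge entities and relationships that refer to the same thing,
    keeping each distinct description once."""
    merged_entities = {}
    for name, description in entities:
        descriptions = merged_entities.setdefault(normalize(name), [])
        if description not in descriptions:
            descriptions.append(description)

    merged_relationships = {}
    for source, target, description in relationships:
        # Sorting the endpoints makes (a, b) and (b, a) the same key.
        key = tuple(sorted((normalize(source), normalize(target))))
        descriptions = merged_relationships.setdefault(key, [])
        if description not in descriptions:
            descriptions.append(description)

    return merged_entities, merged_relationships


raw_entities = [
    ("Scrooge", "a miser"),
    ("scrooge ", "employer of Cratchit"),
    ("Marley", "a ghost"),
]
raw_relationships = [
    ("Scrooge", "Marley", "partners"),
    ("marley", "scrooge", "partners"),
]
unique_entities, unique_relationships = deduplicate(raw_entities, raw_relationships)
```

Three extracted entity mentions collapse into two nodes and two relationship mentions into one edge, which is the kind of compression the paragraph above describes.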

In the retrieval phase, queries are processed using dual-level retrieval. Low-level retrieval focuses on the immediate neighbors of entities to give precise, detailed answers. High-level retrieval focuses on global relationships to answer broader, thematic queries. For a given query, LightRAG extracts both local and global query keywords and matches them against the stored entities and relationships using vector similarity.
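The dual-level idea can be sketched as below. This is a simplified illustration and not LightRAG's API: token overlap stands in for vector similarity, the keywords are passed in by hand instead of being extracted by an LLM, and the store layouts and function names are assumptions.

```python
def overlap(a, b):
    """Toy similarity (Jaccard over words), standing in for vector similarity."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    union = tokens_a | tokens_b
    return len(tokens_a & tokens_b) / len(union) if union else 0.0


def best_matches(keywords, keys, top_k=1):
    """Return the stored keys most similar to any of the query keywords."""
    if not keywords:
        return []
    scored = [(max(overlap(kw, key) for kw in keywords), key) for key in keys]
    scored = [(score, key) for score, key in scored if score > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [key for _, key in scored[:top_k]]


def dual_level_retrieve(local_keywords, global_keywords, entity_kv, relationship_kv, edges):
    """Low level: matched entities plus their one-hop neighbors.
    High level: relationships whose theme keys match the global keywords."""
    context = []
    for name in best_matches(local_keywords, entity_kv):
        context.append(entity_kv[name])
        for source, target in edges:
            if name in (source, target):
                neighbor = target if source == name else source
                if neighbor in entity_kv:
                    context.append(entity_kv[neighbor])
    for theme in best_matches(global_keywords, relationship_kv):
        context.append(relationship_kv[theme])
    return context


entity_kv = {
    "scrooge": "Scrooge is a miser.",
    "marley": "Marley is a ghost.",
    "fred": "Fred is Scrooge's nephew.",
}
relationship_kv = {
    "redemption and generosity": "Scrooge changes after the ghosts' visits.",
    "business partnership": "Scrooge and Marley ran a firm.",
}
edges = [("scrooge", "marley")]

context = dual_level_retrieve(["marley"], ["redemption"], entity_kv, relationship_kv, edges)
```

The low-level pass contributes the entity and its neighbor, while the high-level pass contributes the thematic relationship, so a single query gets both detail and the broader picture.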

Advantages of LightRAG

LightRAG significantly reduces the number of API calls and tokens processed. For instance, creating a graph for a document may cost $0.15 compared to $4 with GraphRAG.

Another advantage is that, unlike GraphRAG, LightRAG allows incremental updates to the knowledge graph without the need to rebuild it entirely.

And lastly, LightRAG achieves similar or better performance than GraphRAG in most benchmark scenarios, as you can see in the evaluation table at the end of the article.
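Conceptually, an incremental update means indexing only the new document and taking the union of its nodes and edges with the existing graph. The sketch below illustrates that idea with plain Python sets; the graph layout and the function name are assumptions, not LightRAG's actual data model.

```python
def incremental_update(graph, new_nodes, new_edges):
    """Return a new graph that is the union of the existing graph and the
    nodes and edges extracted from a newly added document."""
    return {
        "nodes": graph["nodes"] | set(new_nodes),
        "edges": graph["edges"] | set(new_edges),
    }


existing = {
    "nodes": {"scrooge", "marley"},
    "edges": {("scrooge", "marley")},
}
# Only the new document is processed; the existing graph is reused as-is.
updated = incremental_update(
    existing,
    new_nodes=["scrooge", "cratchit"],
    new_edges=[("scrooge", "cratchit")],
)
```

Because sets ignore duplicates, the already known node is not added twice, and nothing that was indexed earlier has to be extracted again.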

Benchmarks Discussion from the Paper

In the LightRAG paper, all experiments were conducted with GPT-4o-mini, using a chunk size of 1200 tokens for uniformity. The benchmarking approach mirrors the methodology of the GraphRAG paper, where an LLM evaluates and scores responses to local or global queries.

Below is the evaluation table from the LightRAG paper. LightRAG outperforms NaiveRAG, GraphRAG, HyDE and RQ-RAG significantly across all evaluation dimensions (Comprehensiveness, Diversity, Empowerment, and Overall), except on the Mix dataset, where GraphRAG performs slightly better in one dimension (50.4% for GraphRAG vs. 49.6% for LightRAG).
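Win rates of this kind can be computed from the judge's verdicts as in the sketch below. It assumes that for each query the LLM judge names one winner per dimension. The verdict data is invented purely to show the arithmetic and is not taken from either paper, and the system names are placeholders.

```python
from collections import Counter

DIMENSIONS = ("Comprehensiveness", "Diversity", "Empowerment", "Overall")


def win_rates(verdicts):
    """verdicts: one dict per query, mapping dimension -> winning system.
    Returns dimension -> {system: win percentage}."""
    rates = {}
    for dimension in DIMENSIONS:
        counts = Counter(verdict[dimension] for verdict in verdicts)
        total = sum(counts.values())
        rates[dimension] = {system: 100 * wins / total for system, wins in counts.items()}
    return rates


# Made-up verdicts for four queries comparing two placeholder systems.
verdicts = [
    {"Comprehensiveness": "system_a", "Diversity": "system_a", "Empowerment": "system_b", "Overall": "system_a"},
    {"Comprehensiveness": "system_a", "Diversity": "system_b", "Empowerment": "system_b", "Overall": "system_a"},
    {"Comprehensiveness": "system_b", "Diversity": "system_a", "Empowerment": "system_a", "Overall": "system_b"},
    {"Comprehensiveness": "system_a", "Diversity": "system_a", "Empowerment": "system_a", "Overall": "system_a"},
]
rates = win_rates(verdicts)
```

With pairwise verdicts like these, the two percentages in each dimension always sum to 100, which is how a figure such as 50.4% vs. 49.6% should be read.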

Source: https://arxiv.org/abs/2410.05779

I hope this article provided you with a clear overview of LightRAG and GraphRAG. In upcoming articles, I plan to dive deeper into RAG evaluation metrics in production and provide a step-by-step guide on setting up LightRAG with open-source models on your local machine!
