
How GraphRAG Works Step-by-Step
Author(s): Mariana Avelino
Originally published on Towards AI.
Perhaps you’ve come across the paper From Local to Global: A GraphRAG Approach to Query-Focused Summarization, which is Microsoft Research’s take on Retrieval Augmented Generation (RAG) using knowledge graphs. Perhaps you felt like some sections in the paper were vague. Perhaps you wished the documentation more thoroughly explained how information gets retrieved from the graph. If that sounds like you, read on!
I’ve dug through the code, so you don’t have to, and in this article, I’ll describe each step of the GraphRAG process in detail. You’ll even learn a search method that the paper didn’t mention at all (Local Search).
What Is GraphRAG?
In one sentence, GraphRAG is an enhancement to retrieval-augmented generation that leverages graph structures.
There are different implementations of it; here we focus on Microsoft’s approach. It can be broken down into two main steps: Graph Creation (i.e. Indexing) and Querying (of which there are three variants: Local Search, Global Search, and Drift Search).
I’m going to use a real-world example to walk you through Graph Creation, Local Search, and Global Search. So without further ado, let’s index and query the book Penitencia by Pablo Rivero using GraphRAG.

Set-Up
The GraphRAG documentation walks you through project set-up. Once you initialize your workspace, you’ll find a configuration file (settings.yaml) in the ragtest directory.

I’ve added the book Penitencia to the input folder. For this article, I’ve left the config file untouched to use the default settings and indexing method (IndexingMethod.Standard).
Graph Creation
To create the graph, run:
graphrag index --root ./ragtest
This triggers two key actions: Entity Extraction from Source Document(s) and Graph Partitioning into Communities, both defined in modules of the workflows directory of the GraphRAG project.

Entity Extraction
1. In the create_base_text_units module, documents (in our case, the book Penitencia) are split into smaller chunks of N tokens.
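To make this concrete, here is a minimal sketch of token-window chunking with overlap, in the spirit of create_base_text_units (the helper name chunk_text and the cl100k_base encoding are my own choices; at the time of writing, the default settings.yaml uses a chunk size of 1200 tokens with an overlap of 100):

import tiktoken

def chunk_text(text: str, chunk_size: int = 1200, overlap: int = 100) -> list[str]:
    # Encode the document into tokens, then slide a window of chunk_size
    # tokens, overlapping consecutive chunks by `overlap` tokens.
    enc = tiktoken.get_encoding("cl100k_base")
    tokens = enc.encode(text)
    step = chunk_size - overlap
    return [enc.decode(tokens[i:i + chunk_size]) for i in range(0, len(tokens), step)]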

2. In create_final_documents, a lookup table is created to map documents to their associated text units. Each row represents a document, and since we are only working with one document, there is only one row.

3. In extract_graph, each chunk is analyzed using an LLM (from OpenAI) to extract entities and relationships guided by this prompt.
During this process, duplicate entities and relationships may appear. For example, the main character Jon is mentioned in 82 different text chunks, so he was extracted 82 times — once for each chunk.


An attempt at deduplication is made by grouping together entities based on their title and type, and grouping together relationships based on their source and target nodes. Then, the LLM is prompted to write a detailed description for each unique entity and unique relationship by analyzing the shorter descriptions from all occurrences (see prompt).
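As a rough sketch of that grouping step (the extraction records and the summarize_with_llm stub are illustrative stand-ins for GraphRAG’s prompt-driven LLM call, not its actual code):

from collections import defaultdict

def summarize_with_llm(title: str, descriptions: list[str]) -> str:
    # Placeholder for the LLM call guided by the summarize-descriptions
    # prompt; here we simply join the distinct inputs.
    return " ".join(dict.fromkeys(descriptions))

# One (title, type, description) record per chunk an entity was extracted from.
extractions = [
    ("JON", "PERSON", "A journalist investigating Celia's case."),
    ("JON", "PERSON", "Jon visits the prison to interview Celia."),
]

grouped = defaultdict(list)
for title, etype, description in extractions:
    grouped[(title, etype)].append(description)

merged = {key: summarize_with_llm(key[0], descs) for key, descs in grouped.items()}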


Deduplication is sometimes imperfect. Furthermore, GraphRAG does not handle entity disambiguation (e.g. Jon and Jon Márquez will remain separate nodes despite referring to the same individual).
4. In finalize_graph, the NetworkX library is used to represent the entities and relationships as the nodes and edges of a graph, including structural information like node degree.
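A minimal illustration of what this looks like with NetworkX (the node and edge attributes here are invented for demonstration):

import networkx as nx

G = nx.Graph()
# Entities become nodes; relationships become edges.
G.add_node("JON", type="PERSON", description="A journalist investigating Celia's case.")
G.add_node("CELIA", type="PERSON", description="A woman convicted of murder.")
G.add_edge("JON", "CELIA", description="Jon interviews Celia in prison.", weight=1.0)

# finalize_graph records structural information such as node degree.
degrees = dict(G.degree())  # {"JON": 1, "CELIA": 1}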


I find it helpful to actually see the graph, so I have visualized the result using Neo4j (notebook):



Graph Partitioning into Communities
5. In create_communities, the graph is partitioned into communities using the Leiden algorithm, a hierarchical clustering algorithm.
A community is a group of nodes that are more closely related to each other than to the rest of the graph. The hierarchical nature of the Leiden algorithm allows for communities of varying specificity to be detected, which is reflected in their level. The higher the level, the more specific the community (e.g. level 3 is quite specific, whereas level 0 is a root community and very generic).
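GraphRAG delegates this step to the hierarchical Leiden implementation in the graspologic library. A small sketch of the call (the example graph and the max_cluster_size value are placeholders):

import networkx as nx
from graspologic.partition import hierarchical_leiden

G = nx.karate_club_graph()  # stand-in for the extracted entity graph

# Each returned assignment maps a node to a community at a given level.
partitions = hierarchical_leiden(G, max_cluster_size=10)
for p in partitions[:5]:
    print(p.node, p.cluster, p.parent_cluster, p.level)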

If we visualize each community as a node, including the entities belonging to the community, we can make out clusters.

The value of communities lies in their ability to unite information from a wide range of sources, like entities and relationships, thereby providing big-picture insights. For books, communities can reveal overarching themes or topics within the text, as we will see in Step 7.

6. In create_final_text_units, the text unit table from Step 1 is extended so that the entity IDs, relationship IDs and covariate IDs (if any) are mapped to each text unit ID for easier lookup.

Covariates are essentially claims. For example, “Celia murdered her husband and child (suspected).” The LLM deduces them from the text units guided by this prompt. By default, covariates are not extracted.
7. In create_community_reports, the LLM writes a report for each community, detailing its main events or themes, and a summary of the report. The LLM is guided by this prompt and receives as context all the entities, relationships and claims from the community.

For large communities, the context string (which includes entities, relationships and, possibly, covariates) might exceed the max_input_length specified in the config file. If that happens, the algorithm has a method to reduce the amount of text in the context, involving Hierarchical Substitution and, if necessary, Trimming.
In Hierarchical Substitution, the raw text from entities, relationships, and claims is replaced by the community reports of sub-communities.
For example, suppose Community C (level 0) has the sub-communities S1 and S2 (both level 1). Community S1 is larger in size (has more entities) than S2. In this case, all entities, relationships, and claims in C which are also in S1 are replaced by the community report of S1. This prioritizes the biggest reduction in token count. If the context length still exceeds max_input_length after this change, then S2 is used to replace the relevant entities and relationships in C.
If, after hierarchical substitution, the context is still too long (or the community had no sub-communities to begin with), then the context string needs Trimming: less relevant data is simply excluded. Entities and relationships are sorted by their node degrees and combined degrees, respectively, and those with the lowest values are removed.
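A simplified sketch of the trimming idea (count_tokens and the dict layout are stand-ins; the real implementation operates on GraphRAG’s context dataframes):

def count_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer; GraphRAG uses tiktoken.
    return len(text.split())

def trim(items: list[dict], key: str, budget: int) -> list[dict]:
    # Sort descending by importance (node degree for entities, combined
    # degree for relationships) and keep items until the budget runs out.
    kept, used = [], 0
    for item in sorted(items, key=lambda x: x[key], reverse=True):
        cost = count_tokens(item["description"])
        if used + cost > budget:
            break
        kept.append(item)
        used += cost
    return kept

# e.g. trim(entities, key="degree", budget=4000)
#      trim(relationships, key="combined_degree", budget=4000)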
Ultimately, the LLM uses the provided context string to generate findings about the community (a list of 5–10 key insights) and a summary. These are joined to form the community report.


8. Finally, in generate_embeddings, embeddings for all text units, entity descriptions, and full_content texts (community title + community summary + community report + rank + rating explanation) are created using the OpenAI embedding model specified in the config. The vector embeddings allow for efficient semantic searching of the graph based on a user query, which will be necessary during Local and Global Search.
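For reference, a minimal sketch of batch-embedding texts with the OpenAI client (the example texts are invented; the model name should come from your settings.yaml):

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

texts = [
    "JON: A journalist investigating Celia's case.",       # entity description
    "Chunk 17 of Penitencia ...",                          # text unit
    "Community 4: title + summary + report + rank + ...",  # full_content text
]

response = client.embeddings.create(model="text-embedding-3-small", input=texts)
vectors = [item.embedding for item in response.data]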
Querying
Once the graph is built, we can start querying it. The implementation of the search functionalities can be found in the structured_search directory of the GraphRAG project.
Local Search
If you have a specific question, use the Local Search function provided by GraphRAG (additional example usage in notebook).
graphrag query \
--root ./ragtest \
--method local \
--query "What kind of retribution is Laura seeking, and why?"

1. Community reports, text units, entities, relationships and covariates (if any) are loaded from the parquet files in ragtest/output/, where they were saved automatically during graph creation.
Then, the user query is embedded and its semantic similarity to the embedding of each entity description is calculated.

The N most semantically similar entities are retrieved. The value of N is defined by the hyperparameter top_k_mapped_entities in the config.
Oddly, GraphRAG oversamples by a factor of 2, effectively retrieving 2 * top_k_mapped_entities entities. This is done to ensure that sufficient entities are extracted, because sometimes the retrieved entity has an invalid ID.
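A sketch of this oversampled nearest-neighbor lookup (the function and variable names are mine; GraphRAG delegates the search to its configured vector store):

import numpy as np

def top_entities(query_vec, entity_vecs, entity_ids, top_k=10, oversample=2):
    # Cosine similarity between the query embedding and every
    # entity-description embedding.
    q = query_vec / np.linalg.norm(query_vec)
    E = entity_vecs / np.linalg.norm(entity_vecs, axis=1, keepdims=True)
    sims = E @ q
    # Retrieve 2 * top_k, since some hits may have invalid IDs.
    order = np.argsort(sims)[::-1][: top_k * oversample]
    return [entity_ids[i] for i in order]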


2. All extracted entities become candidate entities. The communities, relationships, and text units of extracted entities become candidate communities, candidate relationships, and candidate text units.
Specifically:
- Candidate communities are all communities that include at least one extracted entity.
- Candidate relationships are all graph edges where an extracted entity is either the source or the target node.
- Candidate text units are the chunks from the book that contain at least one extracted entity.

3. The candidates are sorted, with the most relevant items placed at the top of their respective lists. This ensures that the most important information is prioritized for answering the query.
Prioritization is necessary because LLM context length is not infinite. There is a limit to how much information can be passed to the model. Hyperparameters set in the config determine how many context window tokens are allocated to entities, relationships, text units, and communities. By default, text_unit_prop=0.5 and community_prop=0.1, meaning that 50% of the max_tokens specified in the config will be occupied by text units and 10% by community reports, leaving 40% for descriptions from entities and relationships. max_tokens defaults to 12,000. (A short sketch of this budget arithmetic follows the list below.)
- Communities are sorted by their number of matches, that is, the number of distinct text units in which extracted entities of the community appear. In case of a tie, they are sorted by their rank (LLM-assigned importance). Given max_tokens=12000 and community_prop=0.1, community reports can occupy up to 1200 tokens. Only entire community reports are allowed, meaning there is no truncation: either a community report is included in its entirety or not at all.

- Candidate entities are not sorted, keeping the entities in the order of their semantic similarity to the user query. As many candidate entities as possible are added to the context. If 40% of max_tokens is allocated to entities and relationships, that means up to 4800 tokens are available.

- Candidate relationships are prioritized differently depending on whether they are in-network or out-network relationships. In-network relationships connect two extracted entities; out-network relationships connect an extracted entity to one outside the extracted set. Candidate relationships that are in-network are sorted by their combined_degree (the sum of the source and target node degrees). Candidate relationships that are out-network are sorted first by the number of links that the out-entity has to in-entities, then by combined_degree in case of a tie.


Finding in- and out-network relationships is an iterative process that stops as soon as the available token space is filled (in our example, available_tokens = 4800 minus the tokens already used by entity descriptions). In-network relationships are added to the context first, as they are considered more important. Then, space allowing, the out-network relationships are added.
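Here is a compact sketch of that split-and-sort logic (the dict layout is illustrative, not GraphRAG’s actual data model):

def sort_relationships(relationships: list[dict], selected_ids: set) -> list[dict]:
    # In-network: both endpoints are extracted entities.
    in_net = [r for r in relationships
              if r["source"] in selected_ids and r["target"] in selected_ids]
    # Out-network: exactly one endpoint is an extracted entity.
    out_net = [r for r in relationships
               if (r["source"] in selected_ids) != (r["target"] in selected_ids)]

    in_net.sort(key=lambda r: r["combined_degree"], reverse=True)

    def out_entity(r):
        return r["target"] if r["source"] in selected_ids else r["source"]

    # Count each out-entity's links back into the extracted set,
    # then sort by that count, breaking ties by combined_degree.
    links: dict = {}
    for r in out_net:
        links[out_entity(r)] = links.get(out_entity(r), 0) + 1
    out_net.sort(key=lambda r: (links[out_entity(r)], r["combined_degree"]),
                 reverse=True)

    return in_net + out_net  # in-network first, out-network fills the rest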

- Candidate text units are sorted by extracted entity order, followed by the number of extracted entity relationships associated with the text unit. Entity order ensures that the text units mentioning entities that are the most semantically similar to the user query get prioritized. For example, if Crímenes is the most semantically similar entity to the user query and text unit CB6F… is a chunk where Crímenes was extracted from, then CB6F… will be at the top of the list, even if there are few extracted entity relationships associated with it.


Given max_tokens=12000 and text_unit_prop=0.5, text units can occupy up to 6000 tokens. As in the case of community reports, text units are appended to the context until the token limit is reached, without truncation.
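Putting the budget arithmetic from this section in one place (values are the defaults discussed above):

max_tokens = 12_000   # from settings.yaml
text_unit_prop = 0.5  # default
community_prop = 0.1  # default

text_unit_budget = int(max_tokens * text_unit_prop)   # 6000 tokens
community_budget = int(max_tokens * community_prop)   # 1200 tokens
entity_rel_budget = max_tokens - text_unit_budget - community_budget  # 4800 tokens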

4. Finally, the descriptions of the prioritized community reports, entities, relationships, and text units — in this order — are concatenated and provided as context to the LLM, which generates a detailed response to the user query.

Global Search
If you have a general question, use the Global Search function (additional example usage in notebook).
graphrag query \
--root ./ragtest \
--method global \
--query "What themes are explored in the book?"

1. Community reports and entities are loaded from the parquet files where they have been saved.
For each community, an occurrence_weight is calculated. occurrence_weight represents the normalized count of distinct text units where entities associated with the community appear. The value reflects how prevalent the community is throughout the document(s).
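A small sketch of how such a weight can be computed (the data layout is illustrative; GraphRAG derives it from its parquet tables):

def community_weights(community_entities: dict, entity_text_units: dict) -> dict:
    # community_entities: community id -> list of entity ids
    # entity_text_units: entity id -> set of text unit ids it appears in
    counts = {
        c: len({tu for e in ents for tu in entity_text_units[e]})
        for c, ents in community_entities.items()
    }
    max_count = max(counts.values())
    # Normalize by the largest count so weights fall in (0, 1].
    return {c: n / max_count for c, n in counts.items()}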


2. All communities are shuffled, then batched. Shuffling reduces bias by ensuring that not all the most relevant communities are collected in the same batch.
Each batch has its communities sorted by this weight. Essentially, communities whose entities appear in multiple text chunks are prioritized.
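A sketch of the shuffle-then-batch step (batch_size and seed are illustrative; GraphRAG actually sizes batches by token budget):

import random

def batch_communities(reports: list[dict], batch_size: int = 8, seed: int = 42) -> list[list[dict]]:
    # Shuffle so highly relevant communities spread across batches,
    # then sort within each batch by the occurrence weight.
    random.Random(seed).shuffle(reports)
    batches = [reports[i:i + batch_size] for i in range(0, len(reports), batch_size)]
    for batch in batches:
        batch.sort(key=lambda r: r["occurrence_weight"], reverse=True)
    return batches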

3. For each batch, the LLM generates multiple responses to the user query using the community reports as context and assigns a score to each response to reflect how helpful it is in answering the user’s question (prompt). Usually 5 responses are generated per batch.

All responses are ranked by their scores and any response with a score of zero is discarded.

4. The texts of the sorted responses are concatenated into a single input, which is passed to the LLM as context to produce a final answer to the user’s question (prompt).
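Steps 3 and 4 amount to a map-reduce over community reports. A schematic sketch (llm.map_answers and llm.reduce_answer are hypothetical stand-ins for the two prompt-driven calls):

def global_search(query: str, batches: list, llm) -> str:
    # Map step: each batch yields several intermediate answers,
    # each paired with an LLM-assigned helpfulness score.
    scored = []
    for batch in batches:
        for answer, score in llm.map_answers(query, batch):
            if score > 0:  # zero-scored answers are discarded
                scored.append((score, answer))
    # Reduce step: the surviving answers, best first, become the
    # context for a single final generation.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    context = "\n\n".join(answer for _, answer in scored)
    return llm.reduce_answer(query, context)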

Conclusion
This article has walked you step-by-step through Graph Creation, Local Search, and Global Search as implemented by Microsoft GraphRAG using real data and code-level insights. While the official documentation has improved significantly since I started using the project in early 2024, this deep dive fills in knowledge gaps and shines a light on what’s happening under the hood. To date, it’s the most detailed and up-to-date resource on GraphRAG that I’ve encountered, and I hope you’ve found it useful.
Now, I encourage you to go beyond the default configuration: try tweaking parameters, fine-tuning the entity extraction prompt, or using a different indexing method. Experiment and harness the power of GraphRAG for your own projects!