RAGent: A Multi-Agent PDF Whisperer Built on LangChain + LangGraph

Author(s): Dwaipayan Bandyopadhyay

Originally published on Towards AI.

Retrieval Augmented Generation (RAG) is a well-known approach in the field of Generative AI. It usually consists of a linear flow: chunking a document, storing the chunks in a vector database, retrieving the chunks relevant to the user's query, and feeding them to an LLM to produce the final response. More recently, the term "Agentic AI" has been taking the internet by storm. In simple terms, it refers to breaking a problem down into smaller pieces, assigning each piece to an "agent" capable of handling that specific task, and combining such agents to build a complex workflow. What if we combine this agentic approach with Retrieval Augmented Generation? In this article, we explain such a concept/architecture, which we developed using LangGraph, FAISS, and OpenAI.

Source: Image by Author

We will not explore AI agents and how they work in this article; otherwise, this would become a full-fledged book. But to give a brief overview: we can think of an "AI agent" as an assistant, someone or something that is a master of one particular task. Multiple agents with different capabilities are combined into a full graphical agentic workflow, where the agents can communicate with each other, understand the response the previous agent returned, and so on.

In our approach, we divided "Retrieval Augmented Generation" into three different tasks and created an agent for each, with each agent capable of handling one specific task: one agent handles the retrieval part, another the augmentation part, and the last the generation part. We then combined all three agents into a complete end-to-end agentic workflow. Let's dive into the coding section.

Coding Section Starts

Firstly, we will install all the necessary packages. The best practice is to create a virtual environment first and then install the following packages.
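One plausible set of packages, inferred from the imports used later in this article (a hedged sketch; pin versions as appropriate for your environment):

pip install langchain langchain-community langchain-openai langgraph faiss-cpu pypdf python-dotenv streamlit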

After they are installed successfully, we will import all the necessary packages to create the Retriever agent first.

Coding the Retriever Agent

from langchain_openai import ChatOpenAI
from langchain_community.vectorstores.faiss import FAISS
from langchain_openai import OpenAIEmbeddings
from langchain.docstore.document import Document
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder
from pypdf import PdfReader
import re
from dotenv import load_dotenv
import streamlit as st

load_dotenv()


LLM = ChatOpenAI(model_name="gpt-4o", temperature=0.0)

def extract_text_from_pdf(pdf_path):
    try:
        pdf = PdfReader(pdf_path)
        output = []
        for i, page in enumerate(pdf.pages, 1):
            text = page.extract_text()
            text = re.sub(r"(\w+)-\n(\w+)", r"\1\2", text)
            text = re.sub(r"(?<!\n\s)\n(?!\s\n)", " ", text.strip())
            text = re.sub(r"\n\s*\n", "\n\n", text)
            output.append((text, i))  # Tuple of (text, page number)
        return output
    except Exception as e:
        st.error(f"Error reading PDF: {e}")
        return []


def text_to_docs(text_with_pages):
    docs = []
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=4000, chunk_overlap=200)
    for text, page_num in text_with_pages:
        chunks = text_splitter.split_text(text)
        for i, chunk in enumerate(chunks):
            doc = Document(
                page_content=chunk,
                metadata={"source": f"page-{page_num}", "page_num": page_num}
            )
            docs.append(doc)
    return docs

def create_vectordb(pdf_path):
    text_with_pages = extract_text_from_pdf(pdf_path)
    if not text_with_pages:
        raise ValueError("No text extracted from PDF.")
    docs = text_to_docs(text_with_pages)
    embeddings = OpenAIEmbeddings()
    return FAISS.from_documents(docs, embeddings)

# Define Tools
def retrieve_from_pdf(query: str, vectordb) -> dict:
    """Retrieve the most relevant text and page number using similarity search."""
    # Use similarity_search to get the top 3 results
    docs = vectordb.similarity_search(query, k=3)
    if docs:
        doc = docs[0]  # Keep only the single most relevant chunk
        content = f"Page {doc.metadata['page_num']}: {doc.page_content}"
        page_num = doc.metadata["page_num"]
        return {"content": content, "page_num": page_num}
    return {"content": "No content retrieved.", "page_num": None}




RETRIEVE_PROMPT = ChatPromptTemplate.from_messages([
    ("system", """
You are the Retrieve Agent. Your task is to fetch the most relevant text from a PDF based on the user's query.
- Use the provided retrieval function to get content and a single page number.
- Return the content directly with the page number included (e.g., 'Page X: text').
- If no content is found, return "No content retrieved."
"""
    ),
    MessagesPlaceholder(variable_name="chat_history"),
    ("human", "{query}"),
])

Explanation of the Code

In this retriever agent code, we first import all the necessary modules and classes. We store our credentials, such as the OpenAI API key, in a .env file, which is why the dotenv module is used here alongside the load_dotenv function call. Next, we initialise the LLM, providing the required arguments such as the model name, temperature, etc.

Descriptions of Functions

extract_text_from_pdf is used to read and extract the content of the PDF and cleanse it a bit: fixing hyphenated line breaks that split a word into two pieces, converting single newlines into spaces unless they are part of paragraph spacing, and so on. The cleaning is done page-wise, which is why a loop is applied over the pages using the enumerate function. Finally, the function returns the cleansed content alongside its page number as a list of tuples. If an unexpected error occurs, it is handled by the try-except block, which ensures the code keeps working without breaking midway.
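As a quick illustration of the first cleanup rule (a small standalone sketch; the sample string is made up), the hyphenation fix rejoins a word that a line break split in two:

import re

raw = "A data-\nbase stores related records."
print(re.sub(r"(\w+)-\n(\w+)", r"\1\2", raw))
# -> A database stores related records.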

text_to_docs is used to do the chunking. Here, the RecursiveCharacterTextSplitter class from the langchain module is used, with a chunk size of 4000 characters and an overlap of 200. We then loop over the text_with_pages argument, which receives the output of the previous function, i.e., extract_text_from_pdf, in its list-of-tuples format; two loop variables unpack the items of each tuple. The cleansed text is split into chunks, and each chunk is converted into a Document object, which will later be converted into embeddings. Apart from the page content, the Document object holds the page number and a string label including the page number as metadata. Each Document is appended to a list, which is returned.

create_vectordb: This function uses the above two functions to create embeddings and store them in a FAISS (Facebook AI Similarity Search) vector store. FAISS is a lightweight vector store that keeps the index locally and makes similarity search easy. This function just creates and returns the vector database. That's it.
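Because the index lives locally, it can also be persisted between runs instead of re-embedding the PDF every time. A minimal sketch, reusing the imports from the retriever module and assuming a recent langchain_community version (the "faiss_index" folder name is illustrative):

vectordb = create_vectordb("dbms_notes.pdf")
vectordb.save_local("faiss_index")

# Later, reload the saved index instead of rebuilding it
vectordb = FAISS.load_local(
    "faiss_index",
    OpenAIEmbeddings(),
    allow_dangerous_deserialization=True,  # newer versions require this opt-in for pickle-backed indexes
)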

retrieve_from_pdf: In this function, we do the similarity search and fetch the top 3 chunks; if any are found, we keep only the first chunk, since it contains the most similar content, and return it along with its page number as a dictionary.
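A quick way to sanity-check the retriever on its own before wiring up any agents (a hedged sketch; the PDF path and query are illustrative):

vectordb = create_vectordb("dbms_notes.pdf")
result = retrieve_from_pdf("What is normalization?", vectordb)
print(result["page_num"])
print(result["content"][:200])  # preview the top chunk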

The RETRIEVE_PROMPT is a ChatPromptTemplate consisting of the instruction, i.e., the system message for the LLM, describing its job as the retriever agent. It also takes the entire chat history of a particular session into account and accepts the user query as the human input.

Coding the Augmentation Agent

from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder
from typing import Optional



def augment_with_context(content: str, page_num: Optional[int]) -> str:
    """Augment retrieved content with source context."""
    if content != "No content retrieved." and page_num:
        return f"{content}\n\nAdditional context: Sourced from page {page_num}."
    return f"{content}\n\nAdditional context: No specific page identified."

AUGMENT_PROMPT = ChatPromptTemplate.from_messages([
    ("system", """
You are the Augment Agent. Enhance the retrieved content with additional context.
- If content is available, append a note with the single page number.
- If no content is retrieved, return "No augmented content."
"""
    ),
    MessagesPlaceholder(variable_name="chat_history"),
    ("human", "Retrieved content: {retrieved_content}\nPage number: {page_num}"),
])

Explanation of the Functions

augment_with_context: This is a very straightforward helper that appends source context to the content retrieved by the retrieval agent. If both the content and a page number are available, it appends a note citing the page the content was sourced from; otherwise, it appends a note saying that no specific page was identified.

The AUGMENT_PROMPT is again very straightforward: it instructs the LLM to enrich the content fetched by the retrieval agent with additional context. It also takes the chat_history into account, and the retrieved_content and page_num variables are populated automatically at runtime when the chain is invoked.
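To see this prompt in isolation, it can be invoked directly as a chain (a hedged sketch; the sample content and page number are made up):

chain = AUGMENT_PROMPT | LLM
msg = chain.invoke({
    "retrieved_content": "Page 4: A primary key uniquely identifies each row in a table.",
    "page_num": "4",
    "chat_history": [],  # empty history for a standalone test
})
print(msg.content)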

Coding the Generator Agent

from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder

GENERATE_PROMPT = ChatPromptTemplate.from_messages([
    ("system", """
You are the Generate Agent. Create a detailed response based on the augmented content.
- Focus on DBMS and SQL content.
- Append "Source: Page X" at the end if a page number is available.
- If the user query consists of terms like "explain", "simple", "simplify" etc. or relatable, then do not return any page number, otherwise return the proper page number.
- If the question is not DBMS-related, reply "Not applicable."
- Use the chat history to maintain context.
"""
    ),
    MessagesPlaceholder(variable_name="chat_history"),
    ("human", "{query}\nAugmented content: {augmented_content}"),
])

The generator agent consists only of the prompt template, with instructions on how to generate the final response based on the retrieved content and the augmented information from the previous two steps.
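As with the other prompts, it can be exercised standalone before being wired into the graph (a hedged sketch; the query and augmented content are made up):

chain = GENERATE_PROMPT | LLM
msg = chain.invoke({
    "query": "What is a primary key?",
    "augmented_content": "Page 4: A primary key uniquely identifies each row in a table.\n\nAdditional context: Sourced from page 4.",
    "chat_history": [],
})
print(msg.content)  # per the prompt rules, the answer should end with "Source: Page 4"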

After all these separate agents are created, it's time to bring them under a single umbrella and form the entire end-to-end workflow using LangGraph.

Code for the Graph Creation using LangGraph

import streamlit as st
from langgraph.graph import StateGraph, END
from typing import TypedDict, List, Optional
import re
from IPython.display import display, Image
from retriever import (
    LLM,
    extract_text_from_pdf,
    text_to_docs,
    create_vectordb,
    retrieve_from_pdf,
    RETRIEVE_PROMPT,
)
from augmentation import augment_with_context, AUGMENT_PROMPT
from generation import GENERATE_PROMPT
from dotenv import load_dotenv



load_dotenv()



PDF_FILE_PATH = "dbms_notes.pdf"

# Define the Agent State
class AgentState(TypedDict):
    query: str
    chat_history: List[dict]
    retrieved_content: Optional[str]
    page_num: Optional[int]  # Single page number instead of a list
    augmented_content: Optional[str]
    response: Optional[str]

def format_for_display(text):
    def replace_latex(match):
        latex_expr = match.group(1)
        return f"$${latex_expr}$$"  # Use $$ for Streamlit Markdown to render LaTeX
    text = re.sub(r'\\frac\{([^}]+)\}\{([^}]+)\}', r'$\\frac{\1}{\2}$', text)
    return text

# Define Multi-Agent Nodes
def retrieve_agent(state: AgentState) -> AgentState:
    chain = RETRIEVE_PROMPT | LLM
    retrieved = retrieve_from_pdf(state["query"], st.session_state.vectordb)
    response = chain.invoke({"query": state["query"], "chat_history": state["chat_history"]})
    # print(retrieved)
    return {
        "retrieved_content": retrieved["content"],
        "page_num": retrieved["page_num"]
    }

def augment_agent(state: AgentState) -> AgentState:
    chain = AUGMENT_PROMPT | LLM
    if state["retrieved_content"] and state["retrieved_content"] != "No content retrieved.":
        # Prepare input for the LLM
        input_data = {
            "retrieved_content": state["retrieved_content"],
            "page_num": str(state["page_num"]) if state["page_num"] else "None",
            "chat_history": state["chat_history"]
        }
        # Invoke the LLM to generate augmented content
        response = chain.invoke(input_data)
        augmented_content = response.content  # Use the LLM's output
    else:
        augmented_content = "No augmented content."
    return {"augmented_content": augmented_content}

def generate_agent(state: AgentState) -> AgentState:
    chain = GENERATE_PROMPT | LLM
    response = chain.invoke({
        "query": state["query"],
        "augmented_content": state["augmented_content"] or "No augmented content.",
        "chat_history": state["chat_history"]
    })
    return {"response": response.content}

# Define Conditional Edge Logic
def decide_augmentation(state: AgentState) -> str:
    if state["retrieved_content"] and state["retrieved_content"] != "No content retrieved.":
        return "augmentation"
    return "generation"


workflow = StateGraph(AgentState)
workflow.add_node("retrieve_agent", retrieve_agent)
workflow.add_node("augment_agent", augment_agent)
workflow.add_node("generate_agent", generate_agent)

workflow.set_entry_point("retrieve_agent")
workflow.add_conditional_edges(
    "retrieve_agent",
    decide_augmentation,
    {
        "augmentation": "augment_agent",
        "generation": "generate_agent"
    }
)
workflow.add_edge("augment_agent", "generate_agent")
workflow.add_edge("generate_agent", END)

agent = workflow.compile()


# display(Image(agent.get_graph().draw_mermaid_png(output_file_path="tutor_agent.png")))


st.set_page_config(page_title="🤖 RAGent", layout="wide")
st.title("🤖 RAGent : Your Personal Teaching Assistant")
st.markdown("Ask any question from your book and get detailed answers with a single source page!")

# Initialize session state for vector database
if "vectordb" not in st.session_state:
with st.spinner("Loading PDF content... This may take a minute."):
try:
st.session_state.vectordb = create_vectordb(PDF_FILE_PATH)
except Exception as e:
st.error(f"Failed to load PDF: {e}")
st.stop()

# Initialize chat history in session state
if "messages" not in st.session_state:
st.session_state.messages = []

# Display chat history
for message in st.session_state.messages:
    with st.chat_message(message["role"]):
        st.markdown(message["content"])

# User input
user_input = st.chat_input("Ask anything from the PDF")

if user_input:
    # Add user message to chat history
    st.session_state.messages.append({"role": "user", "content": user_input})
    with st.chat_message("user"):
        st.markdown(user_input)

    # Display assistant response
    with st.chat_message("assistant"):
        message_placeholder = st.empty()

        # Prepare chat history for the agent
        chat_history = [
            {"type": "human", "content": msg["content"]} if msg["role"] == "user"
            else {"type": "ai", "content": msg["content"]}
            for msg in st.session_state.messages[:-1]  # Exclude current input
        ]

        # Prepare initial state
        initial_state = {
            "query": user_input,
            "chat_history": chat_history,
            "retrieved_content": None,
            "page_num": None,
            "augmented_content": None,
            "response": None,
        }

        # Run the agent with a spinner
        with st.spinner("Processing..."):
            final_state = agent.invoke(initial_state)
            answer = final_state["response"]
            formatted_answer = format_for_display(answer)

        # Display response
        message_placeholder.markdown(formatted_answer)

    # Update chat history
    st.session_state.messages.append({
        "role": "assistant",
        "content": formatted_answer
    })

Explanation of the Code

AgentState class: In this class, we define the schema that the graph state will follow; the same structure is carried through the entire workflow, with each node reading from and writing to these keys. It is passed as an argument when the StateGraph is created.

format_for_display function: This function has a nested helper intended for handling LaTeX-based output. We use it because the document may contain fractions that Streamlit might not render properly, so this acts as an extra precaution.
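For example (a small sketch of the fraction rewrite; the sample string is made up):

print(format_for_display(r"The selectivity is \frac{1}{2}."))
# -> The selectivity is $\frac{1}{2}$.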

retrieve_agent function: This uses the retrieve_from_pdf function we defined earlier. First, we create a chain from the retrieve prompt and the LLM, then invoke it with the user's query and the entire chat_history; the retrieval itself runs against the vector store, and the node returns the retrieved content and page number.

augment_agent function: Here we again create a chain, this time with AUGMENT_PROMPT, and check whether the retriever agent returned any content. If it did, we invoke the chain with the retrieved content, the page number, and the chat_history, and return the content of the LLM's response; otherwise, the node returns "No augmented content."

generate_agent function: Here, finally, we pass the augmented content, the user query, and the chat history so that the LLM can leverage the augmented information, generate the final response, and display it to the user.

decide_augmentation function: This conditional step checks whether the augmentation agent needs to run at all.

After all the necessary agents are created, it's time to combine them into an end-to-end workflow using the StateGraph class of LangGraph. When initialising StateGraph, we pass the AgentState class we defined earlier as its parameter, to indicate that these are the only keys the state will carry throughout the workflow. We then add the nodes to the StateGraph, set the entry point manually so the graph knows which node executes first, and add edges between the nodes to define what the workflow looks like. The conditional edge in between signifies that the node it leads to may or may not be called on any given run.

Finally, we compile the entire workflow to check that everything is wired up properly and that the resulting graph is valid. We can display the graph using the IPython module and LangGraph's Mermaid (mermaid.ink) renderer, as in the commented-out display line above. If everything goes correctly, the graph will look like the image below.

Source: Image by Author
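Outside a notebook, the same diagram can be written straight to disk. A minimal sketch (draw_mermaid_png renders via the mermaid.ink web service by default, so it needs network access; the output filename is illustrative):

png_bytes = agent.get_graph().draw_mermaid_png()
with open("tutor_agent.png", "wb") as f:
    f.write(png_bytes)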

Then, the rest of the code is entirely Streamlit-based. Users can design the UI according to their preference; we have taken a very basic approach so that it remains user-friendly. We also use a few session-state entries to maintain the chat history, the user query, etc. The workflow does not start without user input: until and unless the user provides a query, nothing runs.

Screenshots of the Application in Action

Source: Image by Author
Source: Image by Author

This article was written in collaboration with Biswajit Das.


Published via Towards AI
