RAGent: A Multi-Agent PDF Whisperer Built on LangChain + LangGraph

Author(s): Dwaipayan Bandyopadhyay

Originally published on Towards AI.

Retrieval Augmented Generation (RAG) is a well-known approach in the field of Generative AI. It usually consists of a linear flow: chunking a document, storing the chunks in a vector database, retrieving the chunks relevant to the user's query, and feeding them to an LLM to produce the final response. More recently, the term "Agentic AI" has been taking the internet by storm. In simple terms, it refers to breaking a problem down into smaller pieces, assigning each piece to an "agent" capable of handling that specific task, and combining such agents to build a complex workflow. What if we combine this agentic approach with Retrieval Augmented Generation? In this article, we explain such a concept/architecture, which we developed using LangGraph, FAISS, and OpenAI.

Source: Image by Author

We will not explore AI agents and how they work in this article; otherwise, this would become a full-fledged book. But to give a brief overview: we can think of an "AI agent" as an assistant, someone or something that is a master of one particular task. Multiple agents with different capabilities are combined into a full graphical agentic workflow, where the agents can communicate with each other, understand the response the previous agent returned, and so on.

In our approach, we divided "Retrieval Augmented Generation" into three different tasks and created an agent for each, with each agent capable of handling one specific task: one agent handles the retrieval part, another the augmentation part, and the last the generation part. We then combined all three agents into a complete end-to-end agentic workflow. Let's dive into the coding section.

Coding Section Starts

Firstly, we will install all the necessary packages. The best practice is to create a virtual environment first and then install the following packages.
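One plausible set of packages, inferred from the imports used later in this article (a hedged sketch; pin versions as appropriate for your environment):

pip install langchain langchain-community langchain-openai langgraph faiss-cpu pypdf python-dotenv streamlit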

After they are installed successfully, we will import all the necessary packages to create the Retriever agent first.

Coding the Retriever Agent

from langchain_openai import ChatOpenAI
from langchain_community.vectorstores.faiss import FAISS
from langchain_openai import OpenAIEmbeddings
from langchain.docstore.document import Document
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder
from pypdf import PdfReader
import re
from dotenv import load_dotenv
import streamlit as st

load_dotenv()


LLM = ChatOpenAI(model_name="gpt-4o", temperature=0.0)

def extract_text_from_pdf(pdf_path):
    try:
        pdf = PdfReader(pdf_path)
        output = []
        for i, page in enumerate(pdf.pages, 1):
            text = page.extract_text()
            text = re.sub(r"(\w+)-\n(\w+)", r"\1\2", text)
            text = re.sub(r"(?<!\n\s)\n(?!\s\n)", " ", text.strip())
            text = re.sub(r"\n\s*\n", "\n\n", text)
            output.append((text, i))  # Tuple of (text, page number)
        return output
    except Exception as e:
        st.error(f"Error reading PDF: {e}")
        return []


def text_to_docs(text_with_pages):
    docs = []
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=4000, chunk_overlap=200)
    for text, page_num in text_with_pages:
        chunks = text_splitter.split_text(text)
        for i, chunk in enumerate(chunks):
            doc = Document(
                page_content=chunk,
                metadata={"source": f"page-{page_num}", "page_num": page_num}
            )
            docs.append(doc)
    return docs

def create_vectordb(pdf_path):
    text_with_pages = extract_text_from_pdf(pdf_path)
    if not text_with_pages:
        raise ValueError("No text extracted from PDF.")
    docs = text_to_docs(text_with_pages)
    embeddings = OpenAIEmbeddings()
    return FAISS.from_documents(docs, embeddings)

# Define Tools
def retrieve_from_pdf(query: str, vectordb) -> dict:
    """Retrieve the most relevant text and page number using similarity search."""
    # Use similarity_search to get the top 3 results
    docs = vectordb.similarity_search(query, k=3)
    if docs:
        doc = docs[0]  # Keep only the single most relevant chunk
        content = f"Page {doc.metadata['page_num']}: {doc.page_content}"
        page_num = doc.metadata["page_num"]
        return {"content": content, "page_num": page_num}
    return {"content": "No content retrieved.", "page_num": None}




RETRIEVE_PROMPT = ChatPromptTemplate.from_messages([
    ("system", """
You are the Retrieve Agent. Your task is to fetch the most relevant text from a PDF based on the user's query.
- Use the provided retrieval function to get content and a single page number.
- Return the content directly with the page number included (e.g., 'Page X: text').
- If no content is found, return "No content retrieved."
"""
    ),
    MessagesPlaceholder(variable_name="chat_history"),
    ("human", "{query}"),
])

Explanation of the Code

In this retriever agent code, we first import all the necessary modules and classes. We store our credentials, such as the OpenAI API key, in a .env file, which is why the dotenv module is used here alongside the load_dotenv function call. Next, we initialise the LLM, providing the required arguments such as the model name, temperature, etc.

Descriptions of Functions

extract_text_from_pdf is used to read and extract the content of the PDF and cleanse it a bit: fixing hyphenated line breaks that split a word into two pieces, converting single newlines into spaces unless they are part of paragraph spacing, and so on. The cleaning is done page-wise, which is why a loop is applied over the pages using the enumerate function. Finally, the function returns the cleansed content alongside its page number as a list of tuples. If an unexpected error occurs, it is handled by the try-except block, which ensures the code keeps working without breaking midway.
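As a quick illustration of the first cleanup rule (a small standalone sketch; the sample string is made up), the hyphenation fix rejoins a word that a line break split in two:

import re

raw = "A data-\nbase stores related records."
print(re.sub(r"(\w+)-\n(\w+)", r"\1\2", raw))
# -> A database stores related records.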

text_to_docs is used to do the chunking. Here, the RecursiveCharacterTextSplitter class from the langchain module is used, with a chunk size of 4000 characters and an overlap of 200. We then loop over the text_with_pages argument, which receives the output of the previous function, i.e., extract_text_from_pdf, in its list-of-tuples format; two loop variables unpack the items of each tuple. The cleansed text is split into chunks, and each chunk is converted into a Document object, which will later be converted into embeddings. Apart from the page content, the Document object holds the page number and a string label including the page number as metadata. Each Document is appended to a list, which is returned.

create_vectordb: This function uses the above two functions to create embeddings and store them in a FAISS (Facebook AI Similarity Search) vector store. FAISS is a lightweight vector store that keeps the index locally and makes similarity search easy. This function just creates and returns the vector database. That's it.
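Because the index lives locally, it can also be persisted between runs instead of re-embedding the PDF every time. A minimal sketch, reusing the imports from the retriever module and assuming a recent langchain_community version (the "faiss_index" folder name is illustrative):

vectordb = create_vectordb("dbms_notes.pdf")
vectordb.save_local("faiss_index")

# Later, reload the saved index instead of rebuilding it
vectordb = FAISS.load_local(
    "faiss_index",
    OpenAIEmbeddings(),
    allow_dangerous_deserialization=True,  # newer versions require this opt-in for pickle-backed indexes
)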

retrieve_from_pdf: In this function, we do the similarity search and fetch the top 3 chunks; if any are found, we keep only the first chunk, since it contains the most similar content, and return it along with its page number as a dictionary.
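A quick way to sanity-check the retriever on its own before wiring up any agents (a hedged sketch; the PDF path and query are illustrative):

vectordb = create_vectordb("dbms_notes.pdf")
result = retrieve_from_pdf("What is normalization?", vectordb)
print(result["page_num"])
print(result["content"][:200])  # preview the top chunk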

The RETRIEVE_PROMPT is a ChatPromptTemplate consisting of the instruction, i.e., the system message for the LLM, describing its job as the retriever agent. It also takes the entire chat history of a particular session into account and accepts the user query as the human input.

Coding the Augmentation Agent

from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder
from typing import Optional



def augment_with_context(content: str, page_num: Optional[int]) -> str:
    """Augment retrieved content with source context."""
    if content != "No content retrieved." and page_num:
        return f"{content}\n\nAdditional context: Sourced from page {page_num}."
    return f"{content}\n\nAdditional context: No specific page identified."

AUGMENT_PROMPT = ChatPromptTemplate.from_messages([
    ("system", """
You are the Augment Agent. Enhance the retrieved content with additional context.
- If content is available, append a note with the single page number.
- If no content is retrieved, return "No augmented content."
"""
    ),
    MessagesPlaceholder(variable_name="chat_history"),
    ("human", "Retrieved content: {retrieved_content}\nPage number: {page_num}"),
])

Explanation of the Functions

augment_with_context: This is a very straightforward helper that appends source context to the content retrieved by the retrieval agent. If both the content and a page number are available, it appends a note citing the page the content was sourced from; otherwise, it appends a note saying that no specific page was identified.

The AUGMENT_PROMPT is again very straightforward: it instructs the LLM to enrich the content fetched by the retrieval agent with additional context. It also takes the chat_history into account, and the retrieved_content and page_num variables are populated automatically at runtime when the chain is invoked.
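To see this prompt in isolation, it can be invoked directly as a chain (a hedged sketch; the sample content and page number are made up):

chain = AUGMENT_PROMPT | LLM
msg = chain.invoke({
    "retrieved_content": "Page 4: A primary key uniquely identifies each row in a table.",
    "page_num": "4",
    "chat_history": [],  # empty history for a standalone test
})
print(msg.content)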

Coding the Generator Agent

from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder

GENERATE_PROMPT = ChatPromptTemplate.from_messages([
    ("system", """
You are the Generate Agent. Create a detailed response based on the augmented content.
- Focus on DBMS and SQL content.
- Append "Source: Page X" at the end if a page number is available.
- If the user query consists of terms like "explain", "simple", "simplify" etc. or relatable, then do not return any page number, otherwise return the proper page number.
- If the question is not DBMS-related, reply "Not applicable."
- Use the chat history to maintain context.
"""
    ),
    MessagesPlaceholder(variable_name="chat_history"),
    ("human", "{query}\nAugmented content: {augmented_content}"),
])

The generator agent consists only of the prompt template, with instructions on how to generate the final response based on the retrieved content and the augmented information from the previous two steps.
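As with the other prompts, it can be exercised standalone before being wired into the graph (a hedged sketch; the query and augmented content are made up):

chain = GENERATE_PROMPT | LLM
msg = chain.invoke({
    "query": "What is a primary key?",
    "augmented_content": "Page 4: A primary key uniquely identifies each row in a table.\n\nAdditional context: Sourced from page 4.",
    "chat_history": [],
})
print(msg.content)  # per the prompt rules, the answer should end with "Source: Page 4"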

After all these separate agents are created, it's time to bring them under a single umbrella and form the entire end-to-end workflow using LangGraph.

Code for the Graph Creation using LangGraph

import streamlit as st
from langgraph.graph import StateGraph, END
from typing import TypedDict, List, Optional
import re
from IPython.display import display, Image
from retriever import (
    LLM,
    extract_text_from_pdf,
    text_to_docs,
    create_vectordb,
    retrieve_from_pdf,
    RETRIEVE_PROMPT,
)
from augmentation import augment_with_context, AUGMENT_PROMPT
from generation import GENERATE_PROMPT
from dotenv import load_dotenv



load_dotenv()



PDF_FILE_PATH = "dbms_notes.pdf"

# Define the Agent State
class AgentState(TypedDict):
    query: str
    chat_history: List[dict]
    retrieved_content: Optional[str]
    page_num: Optional[int]  # Single page number instead of a list
    augmented_content: Optional[str]
    response: Optional[str]

def format_for_display(text):
    def replace_latex(match):
        latex_expr = match.group(1)
        return f"$${latex_expr}$$"  # Use $$ for Streamlit Markdown to render LaTeX
    text = re.sub(r'\\frac\{([^}]+)\}\{([^}]+)\}', r'$\\frac{\1}{\2}$', text)
    return text

# Define Multi-Agent Nodes
def retrieve_agent(state: AgentState) -> AgentState:
    chain = RETRIEVE_PROMPT | LLM
    retrieved = retrieve_from_pdf(state["query"], st.session_state.vectordb)
    response = chain.invoke({"query": state["query"], "chat_history": state["chat_history"]})
    # print(retrieved)
    return {
        "retrieved_content": retrieved["content"],
        "page_num": retrieved["page_num"]
    }

def augment_agent(state: AgentState) -> AgentState:
    chain = AUGMENT_PROMPT | LLM
    if state["retrieved_content"] and state["retrieved_content"] != "No content retrieved.":
        # Prepare input for the LLM
        input_data = {
            "retrieved_content": state["retrieved_content"],
            "page_num": str(state["page_num"]) if state["page_num"] else "None",
            "chat_history": state["chat_history"]
        }
        # Invoke the LLM to generate augmented content
        response = chain.invoke(input_data)
        augmented_content = response.content  # Use the LLM's output
    else:
        augmented_content = "No augmented content."
    return {"augmented_content": augmented_content}

def generate_agent(state: AgentState) -> AgentState:
    chain = GENERATE_PROMPT | LLM
    response = chain.invoke({
        "query": state["query"],
        "augmented_content": state["augmented_content"] or "No augmented content.",
        "chat_history": state["chat_history"]
    })
    return {"response": response.content}

# Define Conditional Edge Logic
def decide_augmentation(state: AgentState) -> str:
    if state["retrieved_content"] and state["retrieved_content"] != "No content retrieved.":
        return "augmentation"
    return "generation"


workflow = StateGraph(AgentState)
workflow.add_node("retrieve_agent", retrieve_agent)
workflow.add_node("augment_agent", augment_agent)
workflow.add_node("generate_agent", generate_agent)

workflow.set_entry_point("retrieve_agent")
workflow.add_conditional_edges(
    "retrieve_agent",
    decide_augmentation,
    {
        "augmentation": "augment_agent",
        "generation": "generate_agent"
    }
)
workflow.add_edge("augment_agent", "generate_agent")
workflow.add_edge("generate_agent", END)

agent = workflow.compile()


# display(Image(agent.get_graph().draw_mermaid_png(output_file_path="tutor_agent.png")))


st.set_page_config(page_title="🤖 RAGent", layout="wide")
st.title("🤖 RAGent : Your Personal Teaching Assistant")
st.markdown("Ask any question from your book and get detailed answers with a single source page!")

# Initialize session state for vector database
if "vectordb" not in st.session_state:
with st.spinner("Loading PDF content... This may take a minute."):
try:
st.session_state.vectordb = create_vectordb(PDF_FILE_PATH)
except Exception as e:
st.error(f"Failed to load PDF: {e}")
st.stop()

# Initialize chat history in session state
if "messages" not in st.session_state:
st.session_state.messages = []

# Display chat history
for message in st.session_state.messages:
    with st.chat_message(message["role"]):
        st.markdown(message["content"])

# User input
user_input = st.chat_input("Ask anything from the PDF")

if user_input:
    # Add user message to chat history
    st.session_state.messages.append({"role": "user", "content": user_input})
    with st.chat_message("user"):
        st.markdown(user_input)

    # Display assistant response
    with st.chat_message("assistant"):
        message_placeholder = st.empty()

        # Prepare chat history for the agent
        chat_history = [
            {"type": "human", "content": msg["content"]} if msg["role"] == "user"
            else {"type": "ai", "content": msg["content"]}
            for msg in st.session_state.messages[:-1]  # Exclude current input
        ]

        # Prepare initial state
        initial_state = {
            "query": user_input,
            "chat_history": chat_history,
            "retrieved_content": None,
            "page_num": None,
            "augmented_content": None,
            "response": None,
        }

        # Run the agent with a spinner
        with st.spinner("Processing..."):
            final_state = agent.invoke(initial_state)
            answer = final_state["response"]
            formatted_answer = format_for_display(answer)

        # Display response
        message_placeholder.markdown(formatted_answer)

    # Update chat history
    st.session_state.messages.append({
        "role": "assistant",
        "content": formatted_answer
    })

Explanation of the Code

AgentState class: In this class, we define the schema that the graph state will follow; the same structure is carried through the entire workflow, with each node reading from and writing to these keys. It is passed as an argument when the StateGraph is created.

format_for_display function: This function has a nested helper intended for handling LaTeX-based output. We use it because the document may contain fractions that Streamlit might not render properly, so this acts as an extra precaution.
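For example (a small sketch of the fraction rewrite; the sample string is made up):

print(format_for_display(r"The selectivity is \frac{1}{2}."))
# -> The selectivity is $\frac{1}{2}$.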

retrieve_agent function: This uses the retrieve_from_pdf function we defined earlier. First, we create a chain from the retrieve prompt and the LLM, then invoke it with the user's query and the entire chat_history; the retrieval itself runs against the vector store, and the node returns the retrieved content and page number.

augment_agent function: Here we again create a chain, this time with AUGMENT_PROMPT, and check whether the retriever agent returned any content. If it did, we invoke the chain with the retrieved content, the page number, and the chat_history, and return the content of the LLM's response; otherwise, the node returns "No augmented content."

generate_agent function: Here, finally, we pass the augmented content, the user query, and the chat history so that the LLM can leverage the augmented information, generate the final response, and display it to the user.

decide_augmentation function: This conditional step checks whether the augmentation agent needs to run at all.

After all the necessary agents are created, it's time to combine them into an end-to-end workflow using the StateGraph class of LangGraph. When initialising StateGraph, we pass the AgentState class we defined earlier as its parameter, to indicate that these are the only keys the state will carry throughout the workflow. We then add the nodes to the StateGraph, set the entry point manually so the graph knows which node executes first, and add edges between the nodes to define what the workflow looks like. The conditional edge in between signifies that the node it leads to may or may not be called on any given run.

Finally, we compile the entire workflow to check that everything is wired up properly and that the resulting graph is valid. We can display the graph using the IPython module and LangGraph's Mermaid (mermaid.ink) renderer, as in the commented-out display line above. If everything goes correctly, the graph will look like the image below.

Source: Image by Author
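Outside a notebook, the same diagram can be written straight to disk. A minimal sketch (draw_mermaid_png renders via the mermaid.ink web service by default, so it needs network access; the output filename is illustrative):

png_bytes = agent.get_graph().draw_mermaid_png()
with open("tutor_agent.png", "wb") as f:
    f.write(png_bytes)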

Then, the rest of the code is entirely Streamlit-based. Users can design the UI according to their preference; we have taken a very basic approach so that it remains user-friendly. We also use a few session-state entries to maintain the chat history, the user query, etc. The workflow does not start without user input: until and unless the user provides a query, nothing runs.

Screenshots of the Application in Action

Source: Image by Author
Source: Image by Author

This article was written in collaboration with Biswajit Das.


Published via Towards AI
