Agentic Intelligence in Action: Developing an Agentic Intelligent Document Summarizer!
Last Updated on January 15, 2025 by Editorial Team
Author(s): Harshit Dawar
Originally published on Towards AI.
So here we are, in the world of Agents, where everything has started moving towards Agentic AI. Even Satya Nadella predicted that "agent-based software will replace SaaS applications."
Hence, it's of utmost importance to understand agents & gain the ability to develop our own agents, to stay relevant in this rapidly evolving world. This article will help you develop your own "Agentic Intelligent Document Summarizer".
Why invest your time in this article/blog?
This blog will help you understand all the steps required to develop your own "Agentic Intelligent Document Summarizer": an application that can take any document as input, summarize its content using an intelligent agent, & be deployed anywhere (thanks to containerization).
Every step's code is explained in its respective place in this blog, & the link to the complete project code is provided at the end of the article.
The application will be developed using the following tools & technologies:
- AWS Bedrock
- AWS Textract
- AWS S3
- Docker
- FastAPI
- Python
- REST API concepts
- Generative AI
- Agents
To get the most out of this article/blog, you should be at least familiar with the topics listed above. I highly recommend thoroughly reading the article below first; it will familiarize you with many of these concepts & boost your confidence. It also explains how to request access to the few AWS Bedrock models that are only available once requested.
Developing a personalized meal informer through RAG using AWS Bedrock!
This blog aims to explain the service of AWS Bedrock in complete detail, why AWS Bedrock?, what RAG is, its…
harshitdawar.medium.com
Don't be overwhelmed by the names mentioned above; this blog will explain everything in the easiest way possible to make you comfortable with everything & impart enough knowledge to you so that you can create your own custom agents to solve even the most complex use cases.
Prerequisites for developing this application!
To run the application I developed, or the one you will build along the way, you need the following:
- AWS account keys with sufficient privileges: To use the AWS services
- An AWS S3 bucket: To store the document that needs to be summarized
- Any AWS Bedrock Text Model: To summarize the content & to be used as an LLM in the agent.
- Any container execution/orchestration tool: If you want your application to be containerized, then this is required; otherwise, you can directly run your application using Python. If you choose to containerize the application, then you can use Docker, Podman, Kubernetes, OpenShift, or any other equivalent cloud-based offering.
With all the prerequisites covered, let's start developing the "Agentic Intelligent Document Summarizer" application.
Agentic Intelligent Document Summarizer!
Following best practices, & to simplify development, the application code is divided into modules. Let's code each of the modules.
Defining AWS Clients!
To access every AWS service, its respective AWS Client needs to be defined.
The above code is creating the client for the following AWS services:
- AWS S3: To upload the document that needs to be summarized.
- AWS Textract: To extract the content from the document.
- AWS Bedrock: To summarize the content extracted from the document.
AWS keys are passed using the environment variables (which is a best practice) in the above code. You can use any of the methods you are aware of for doing so; the simplest one is hard-coding, but it's the worst method from the security point of view. If you are containerizing the application using Docker or an equivalent, you can pass the keys in the Dockerfile, or as environment variables while running the container (recommended from a security point of view). In case you are using Kubernetes or an equivalent, then you can use secrets as well.
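As a minimal sketch, the client setup can look like the following. The environment variable names (`AWS_ACCESS_KEY`, `AWS_SECRET_ACCESS_KEY`, `S3_BUCKET`) match the container command shown later in this article, but the region default and helper names here are illustrative assumptions, not the author's exact code:

```python
import os


def aws_credentials() -> dict:
    """Read AWS credentials from environment variables (never hard-code them)."""
    return {
        "aws_access_key_id": os.environ["AWS_ACCESS_KEY"],
        "aws_secret_access_key": os.environ["AWS_SECRET_ACCESS_KEY"],
        # Region default is an assumption; set AWS_REGION to override.
        "region_name": os.environ.get("AWS_REGION", "us-east-1"),
    }


def make_clients():
    """Create the three AWS clients the application needs."""
    import boto3  # deferred so aws_credentials() works even without boto3 installed

    creds = aws_credentials()
    s3 = boto3.client("s3", **creds)
    textract = boto3.client("textract", **creds)
    # "bedrock-runtime" is the endpoint used for model invocation (not "bedrock")
    bedrock = boto3.client("bedrock-runtime", **creds)
    return s3, textract, bedrock
```

Reading the keys from the environment keeps them out of the source code and the image layers, which is why this is the recommended approach.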
Defining Tools!
Tools are the helpers that an agent leverages to perform an operation/action on its own. Every tool's docstring must state the purpose of that tool, because this is the only information the agent uses to decide whether to use a particular tool for an operation.
Note 1: All the tools will be defined using the @tool decorator from the langchain module, which is a best practice for defining a tool.
Note 2: Do not worry about the libraries that need to be imported to run the code. The aim of this article is to explain the important code, not the boilerplate or the required libraries. The complete GitHub repository (including everything) is linked at the end of the blog.
AWS S3 Tool!
The above code creates a tool to upload a file to S3, which will be leveraged by the agent we define later in this article.
A LangChain tool (with the agent type used here) can only accept a single input string; however, this tool needs 2 arguments: "file_path" & "object_name" (the target file name in S3) to upload a file to S3. To make this possible, a prompt trick is used: the required input format is defined in the docstring of the tool, & the agent passes the arguments in that format. Since the arguments arrive in the format described in the prompt, both required values are recovered using the split() method, & the document is uploaded to AWS S3.
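A minimal sketch of how such a tool can look is shown below. The `|` separator, the function names, and the fallback decorator are illustrative assumptions for this sketch, not the author's exact code; the docstring carries the input-format contract the agent relies on:

```python
try:
    from langchain.tools import tool
except ImportError:          # allow running this sketch without langchain installed
    def tool(fn):            # minimal stand-in for the @tool decorator
        return fn


def parse_tool_input(tool_input: str) -> tuple:
    """Split the agent's single input string into the two real arguments."""
    file_path, object_name = tool_input.split("|", 1)
    return file_path.strip(), object_name.strip()


@tool
def upload_document_to_s3(tool_input: str) -> str:
    """Uploads a local file to the S3 bucket.

    Input format: '<local file path>|<target object name in S3>'.
    """
    import os
    import boto3

    file_path, object_name = parse_tool_input(tool_input)
    bucket = os.environ["S3_BUCKET"]
    boto3.client("s3").upload_file(file_path, bucket, object_name)
    return f"Uploaded {file_path} to s3://{bucket}/{object_name}"
```

The agent reads the "Input format" line in the docstring and formats its single string argument accordingly, which is how two logical arguments travel through a one-argument tool.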
AWS Textract Tool!
The above code extracts the content, using AWS Textract, from a document present in an AWS S3 bucket. This tool, too, will be leveraged by the agent we define later in this article.
Similar to the S3 tool above, this tool also requires 2 arguments, so the exact same problem arises & the same docstring-format solution is applied.
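A sketch of the extraction step might look like the following. The helper names are assumptions for illustration; note also that Textract's synchronous `detect_document_text` call only accepts single-page images from S3, so a production version handling multi-page PDFs would use the asynchronous `start_document_text_detection` API instead:

```python
def lines_from_textract(response: dict) -> str:
    """Join the LINE blocks of a Textract response into plain text."""
    return "\n".join(
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE"
    )


def extract_text_from_s3(bucket: str, object_name: str) -> str:
    """Run Textract on a document already uploaded to S3."""
    import boto3

    textract = boto3.client("textract")
    # Synchronous call; multi-page PDFs require the asynchronous
    # start_document_text_detection / get_document_text_detection pair.
    response = textract.detect_document_text(
        Document={"S3Object": {"Bucket": bucket, "Name": object_name}}
    )
    return lines_from_textract(response)
```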
AWS Bedrock Tool!
The above code summarizes, using AWS Bedrock, the content that was extracted from the document in the AWS S3 bucket. This tool, too, will be leveraged by the agent we define later in this article.
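A sketch of the summarization call is below. The model ID is the Bedrock identifier for the Llama 3 8B Instruct model the article mentions; the prompt wording, generation parameters, and the request/response body shape (which is specific to Llama models on Bedrock) are assumptions for this sketch:

```python
import json

# Bedrock model identifier for Llama 3 8B Instruct
LLAMA3_MODEL_ID = "meta.llama3-8b-instruct-v1:0"


def build_summary_prompt(text: str) -> str:
    """Wrap the extracted document text in a summarization instruction."""
    return f"Summarize the following document concisely:\n\n{text}\n\nSummary:"


def summarize_with_bedrock(text: str) -> str:
    """Invoke the Bedrock runtime to summarize the extracted text."""
    import boto3

    bedrock = boto3.client("bedrock-runtime")
    # Llama models on Bedrock take a "prompt" field and return a "generation" field.
    body = json.dumps({
        "prompt": build_summary_prompt(text),
        "max_gen_len": 512,
        "temperature": 0.2,
    })
    response = bedrock.invoke_model(modelId=LLAMA3_MODEL_ID, body=body)
    return json.loads(response["body"].read())["generation"]
```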
Defining the Agent!
An agent is the smart entity that decides which actions to take, & in which order, to fulfill a use case/goal using its intelligence.
The above code defines the agent that meets our goal, i.e., summarizing a document. It is given all the tools created above, & it will automatically decide when to use which tool to fulfill our use case.
The LLM being used in the agent is the Llama 3 instruct variation with 8 billion parameters, which is available on AWS Bedrock.
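The wiring can be sketched as follows. The exact import paths and constructor names vary across LangChain versions (here the `langchain-aws` package's `BedrockLLM` and the classic `initialize_agent` helper are assumed), so treat this as one plausible shape rather than the author's exact code:

```python
try:
    from langchain.agents import AgentType, initialize_agent
    from langchain_aws import BedrockLLM  # from the langchain-aws package
except ImportError:
    AgentType = initialize_agent = BedrockLLM = None  # sketch runs without langchain


def agent_instruction(document_path: str, object_name: str) -> str:
    """The natural-language goal handed to the agent on each request."""
    return (
        f"Upload the document at {document_path} to S3 as {object_name}, "
        "extract its text with Textract, and return a concise summary."
    )


def build_agent(tools):
    """Attach the Bedrock-hosted Llama 3 8B Instruct model to the tools as a ReAct agent."""
    llm = BedrockLLM(model_id="meta.llama3-8b-instruct-v1:0")
    return initialize_agent(
        tools=tools,
        llm=llm,
        # This agent type chooses tools purely from their docstrings,
        # which is why the tool docstrings above matter so much.
        agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
        verbose=True,
    )
```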
Defining the main application route!
Here, the main application route is defined. It is responsible for taking the document as input, performing all the required steps to meet the goal, & finally returning the summary of the document.
The above code is defining the main application route, which is doing the following things in order:
- Taking the document as input.
- Creating a directory named "media_files" in the filesystem used by the application. If you are running it in Docker, the directory is created inside the container.
- Saving a file (named after the document) with the document's content locally.
- Calling the method to initialize the Agent.
- Invoking the Agent with the instructions required to meet our goal.
- Letting the Agent perform all the necessary actions by leveraging the tools defined.
- Returning the summary of the document from the application.
This concludes the development of our "Agentic Intelligent Document Summarizer".
Example Run of the Agent!
Sample content on Generative AI was taken from Wikipedia & turned into a PDF document, which was then passed to the application; all the steps are shown below.
Note: The Postman tool is used for interacting with the application's API.
Complete Link of Application Code!
The GitHub repository link of the complete application is mentioned below; you can take the code from there for quick testing, or you can customize it for your own use case if required.
GitHub – HarshitDawar55/agentic__intelligent_document_summarizer
Application's plug-&-play container image!
If you want to use the application directly, or to quickly test it without touching the code, the link to my container image of this application is below.
https://hub.docker.com/r/harshitdawar/agentic-intelligent-document-summariser
To create a container from the image, you need to provide a few details:
- AWS Keys
- AWS S3 Bucket
Command to use:
docker run -dit -p <your system available port>:80 -e AWS_ACCESS_KEY="<your aws access key>" -e AWS_SECRET_ACCESS_KEY="<your aws secret access key>" -e S3_BUCKET="<S3 Bucket name to use>" harshitdawar/agentic-intelligent-document-summariser:latest
Once your application is running using the above command, you can call the application endpoint, which will be "<your application IP/DNS>:<port number that you used in the above command>/". Then pass the document as "form-data"; an example is showcased in the image above with the caption "Input request to the application's API. Image by Author!"
This concludes this amazing blog. I hope you enjoyed it a lot. Do let me know your thoughts in the comments, and don't forget to follow me. Also, if you want me to write an article on some of the topics, do reach out to me on LinkedIn or comment on any of my articles. I will be extremely happy to do the same.
I hope this article explains everything related to the topic, with all the detailed concepts and explanations. Thank you so much for investing your time in reading my blog & boosting your knowledge. If you like my work, then I request you to applaud this blog & follow me on Medium, GitHub, & LinkedIn for more amazing content on multiple technologies and their integration!
Also, subscribe to me on Medium to get updates on all my blogs!