Building a Fuji X-S20 Camera Q&A App with Gemini, LangChain and Gradio
Author(s): Devi
Originally published on Towards AI.
Part 2 of a 2-part beginner series exploring fun generative AI use cases with Gemini to enhance your photography skills!
A beginner-friendly introduction to RAG and how to apply it
As an amateur photographer, I am experimenting with ways I can use generative AI to get better at my craft. In this blog post, I'll walk you through the process of creating a simple interactive question-answering application using Python, the Gemini API, LangChain, and Gradio.
I recently got my first ever Fuji camera and decided to leverage RAG and Gemini to create a Fuji X-S20 Q & A app. This app will answer any question about the Fuji X-S20 camera (without you having to pore over the 400-page manual)!
📔 This is a beginner-friendly tutorial, so here are some quick notes on Retrieval Augmented Generation (RAG) and LangChain before we get started with the hands-on part.
Understanding how RAG works in 4 steps
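The data-preparation side of RAG can be summarized in 4 sequential steps (at query time, the chunks most similar to the question are then retrieved and passed to the language model as context):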
- Loading our data (aka data ingestion)
- Breaking our input data into smaller chunks
- Creating Embeddings (of your choice)
- Storing the embeddings in a vector database (of your choice)
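To make the four steps concrete, here is a toy sketch that uses only the Python standard library. Everything in it is an illustrative assumption: the sample text, the `chunk_text` and `embed` helpers, and the word-count "embedding", which only stands in for a real embedding model. The `vector_store` list plays the role of the vector database.

```python
from collections import Counter

# Step 1: load the data (a hard-coded sample string stands in for the PDF manual)
manual_text = (
    "Rotate the mode dial to choose a shooting mode. "
    "Press the shutter button halfway to focus. "
    "Charge the battery before first use."
)

def chunk_text(text, chunk_size):
    """Step 2: cut the text into pieces of at most chunk_size characters."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

def embed(chunk):
    """Step 3: toy 'embedding' that counts lowercase words (not a real model)."""
    return Counter(chunk.lower().split())

chunks = chunk_text(manual_text, chunk_size=50)

# Step 4: store each chunk next to its embedding
# (a plain list stands in for the vector database)
vector_store = [(chunk, embed(chunk)) for chunk in chunks]

print(len(vector_store), "chunks stored")
```

In the real app, LangChain's loader, splitter, embedding model, and Chroma take over these four roles.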
Understanding how LangChain works
LangChain is an open-source framework designed to easily build applications that use language models.
In particular, it helps with the management of embeddings, interactions with vector databases, and integration with various data sources from the 4-step process. To bring it all together, I've sketched the picture below to show how LangChain is used in our Q & A app.
An 11-Step Breakdown of Building Our App
1. Creating a new conda environment and importing the required modules
If you are working on your own computer, create a separate conda environment for running LangChain projects and call it something like "env_langchain", so that its purpose is clear. You can reuse this environment for any other LangChain project you create in the future. Install all the libraries and necessary modules from the requirements.txt file. This step can take the most time and patience, but once you get it right, everything else will be easy.
I chose VS Code as my IDE for this project. On Windows, pressing "Ctrl+Shift+P" opens the Command Palette, where "Python: Select Interpreter" lets you choose the newly created "env_langchain" environment as the interpreter. To start working in this environment, type conda activate env_langchain in the terminal.
The requirements.txt file lists the libraries that need to be installed to run this program successfully.
The modules and their purposes are described in the comments below.
import os # Interacts with the operating system
import gradio as gr # Web framework for interactive applications
from langchain_community.document_loaders import PyPDFLoader # Loads and parses PDF documents
from langchain.text_splitter import RecursiveCharacterTextSplitter # Splits text into manageable chunks
from langchain_chroma import Chroma # Manages vector stores for document embeddings
from langchain_google_genai import GoogleGenerativeAIEmbeddings # Generates embeddings using Google's Generative AI
from dotenv import load_dotenv # Loads environment variables from .env file
from langchain_google_genai import ChatGoogleGenerativeAI # For conversational AI
from langchain.chains import create_retrieval_chain # Creates retrieval chains
from langchain.chains.combine_documents import create_stuff_documents_chain # Combines document processing
from langchain_core.prompts import ChatPromptTemplate # For creating prompt templates
2. Loading Environment Variables
We'll load sensitive environment variables (like our Google API key) using dotenv. Make sure to create a .env file in your project directory (for example, from your IDE, such as VS Code) with your API key, which you can retrieve for free from here: https://aistudio.google.com/app/u/1/apikey
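A minimal sketch of this step, assuming the key is stored in .env under the name GOOGLE_API_KEY. Both that variable name and the `read_api_key` helper are my assumptions, not something shown in the original code. If python-dotenv is not installed, the sketch falls back to variables already exported in the shell.

```python
import os

try:
    from dotenv import load_dotenv  # provided by the python-dotenv package
    load_dotenv()  # copies KEY=value pairs from a .env file into os.environ
except ImportError:
    pass  # fall back to variables already set in the environment

def read_api_key(name="GOOGLE_API_KEY"):
    """Return the API key stored under `name`, or raise if it is missing."""
    key = os.getenv(name)
    if not key:
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return key
```

Calling `api_key = read_api_key()` once at startup gives the later steps an `api_key` variable and makes a missing key fail early with a clear message. Keep the .env file out of version control.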
3. Loading the PDF Manual
Next, we'll load our PDF document, which in my case was the 400-page Fuji X-S20 camera manual: https://fujifilm-dsc.com/en-int/manual/x-s20/x-s20_manual_en_s_f.pdf
loader = PyPDFLoader("Fuji_xs20_manual.pdf") # Initialize the PDF loader
data = loader.load() # Load the PDF as a list of Documents, one per page
4. Splitting the Text
Since the document can be large, we split the text into manageable chunks ahead of setting up embeddings:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000) # Initialize the text splitter
docs = text_splitter.split_documents(data) # Split the loaded data into smaller documents
print("Total number of documents: ", len(docs)) # Display the total number of documents
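The "recursive" in RecursiveCharacterTextSplitter refers to trying coarse separators first (paragraph breaks) and falling back to finer ones (line breaks, then spaces) only for pieces that are still too long. The following is a simplified illustration of that idea, not LangChain's actual implementation. The function name `recursive_split` and its separator list are assumptions, and the sketch ignores chunk overlap.

```python
def recursive_split(text, chunk_size, separators=("\n\n", "\n", " ")):
    """Split text into pieces of at most chunk_size characters,
    preferring coarse separators over fine ones."""
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators:
        # No separators left: fall back to a hard cut every chunk_size characters
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, finer = separators[0], separators[1:]
    chunks, current = [], ""
    for part in text.split(sep):
        candidate = part if not current else current + sep + part
        if len(candidate) <= chunk_size:
            current = candidate  # the part still fits into the current chunk
            continue
        if current:
            chunks.append(current)  # close the chunk built so far
        if len(part) <= chunk_size:
            current = part  # start a new chunk with this part
        else:
            # The part alone is too long: retry it with finer separators
            chunks.extend(recursive_split(part, chunk_size, finer))
            current = ""
    if current:
        chunks.append(current)
    return chunks

sample = "aaa bbb ccc\n\nddd eee"
print(recursive_split(sample, chunk_size=7))
```

Splitting at natural boundaries first keeps sentences and paragraphs together where possible, which tends to give the embedding model more coherent chunks.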
5. Setting Up the Embeddings
We will now set up the embedding model and create a vector store:
embeddings = GoogleGenerativeAIEmbeddings(api_key=api_key, model="models/embedding-001") # Initialize the embedding model (api_key holds the key loaded from .env in step 2)
vectorstore = Chroma.from_documents(documents=docs, embedding=embeddings) # Create a Chroma vector store
retriever = vectorstore.as_retriever(search_type="similarity", search_kwargs={"k": 10}) # Set up the retriever
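With search_type="similarity" and k=10, the retriever returns the 10 chunks whose embeddings are closest to the question's embedding. Here is a rough sketch of that ranking. It assumes cosine similarity as the closeness measure (the metric Chroma actually uses depends on how the collection is configured) and uses made-up 3-dimensional vectors in place of real embeddings. The helper names are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vector, store, k):
    """Return the texts of the k entries most similar to query_vector."""
    ranked = sorted(
        store,
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,  # most similar first
    )
    return [text for text, _ in ranked[:k]]

# Made-up vectors; a real store would hold embeddings from the model
store = [
    ("Charging the battery", [0.9, 0.1, 0.0]),
    ("Setting the ISO", [0.1, 0.9, 0.2]),
    ("Formatting the memory card", [0.0, 0.2, 0.9]),
]
query_vector = [0.8, 0.2, 0.1]

print(top_k(query_vector, store, k=2))
```

A larger k gives the model more context to work with, at the cost of a longer prompt.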
6. Configuring the Language Model
Next, we configure the language model that will answer our questions:
llm = ChatGoogleGenerativeAI(model="gemini-1.5-pro", temperature=0.3, max_tokens=500) # Initialize the language model
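The temperature setting controls how adventurous the model's word choices are. As an illustration of the usual mechanism, the sketch below applies a temperature-scaled softmax to made-up scores. It is a general sketch, not a description of Gemini's internals, and the function name is hypothetical. Dividing the raw scores by a low temperature such as 0.3 concentrates the probability on the top candidate.

```python
import math

def softmax_with_temperature(scores, temperature):
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

scores = [2.0, 1.0, 0.5]  # made-up scores for three candidate words

for t in (1.0, 0.3):
    probs = softmax_with_temperature(scores, t)
    print("temperature", t, [round(p, 3) for p in probs])
```

For a manual Q & A app, a low temperature is a sensible choice because we want consistent, factual answers rather than creative ones.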
7. Creating the Prompt Template
We define how the model will respond to queries:
system_prompt = (
    "You are an assistant for question-answering tasks. "
    "Use the following pieces of retrieved context to answer "
    "the question. If you don't know the answer, say that you "
    "don't know. Use three sentences maximum and keep the "
    "answer concise."
    "\n\n"
    "{context}"
)
prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt), # System-level instructions
        ("human", "{input}"), # User input placeholder
    ]
)
8. Creating the Question-Answering Chain
Now we can create the chain that will process user queries:
question_answer_chain = create_stuff_documents_chain(llm, prompt)
rag_chain = create_retrieval_chain(retriever, question_answer_chain) # Create a Retrieval-Augmented Generation (RAG) chain
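Conceptually, the chain retrieves the relevant chunks, "stuffs" them into the {context} slot of the prompt, and sends the filled prompt to the model. The sketch below mimics that flow in plain Python with stand-ins: `fake_retriever`, `fake_llm`, and `toy_rag_chain` are hypothetical placeholders, not LangChain objects, and the chunk texts are dummies. The returned dictionary mirrors the input, context, and answer keys of the real chain's output.

```python
SYSTEM_TEMPLATE = (
    "Use the following pieces of retrieved context to answer the question."
    "\n\n"
    "{context}"
)

def fake_retriever(question):
    """Stand-in for the vector store retriever: returns dummy chunks."""
    return ["Placeholder chunk 1 from the manual.", "Placeholder chunk 2 from the manual."]

def fake_llm(prompt_text):
    """Stand-in for the Gemini model: reports the prompt size instead of answering."""
    return f"(model reply to a prompt of {len(prompt_text)} characters)"

def toy_rag_chain(question):
    docs = fake_retriever(question)                 # 1. retrieve
    context = "\n\n".join(docs)                     # 2. stuff the chunks together
    prompt_text = SYSTEM_TEMPLATE.format(context=context) + "\n\nQuestion: " + question
    answer = fake_llm(prompt_text)                  # 3. generate
    return {"input": question, "context": docs, "answer": answer}

result = toy_rag_chain("How do I shoot in burst mode?")
print(result["answer"])
```

This is why the next step reads the reply from the "answer" key of the response.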
9. Handling User Queries
We define a function to process user queries:
def answer_query(query):
    if query:
        response = rag_chain.invoke({"input": query}) # Invoke the RAG chain with the user query
        return response["answer"] # Return the answer from the assistant
    return "Please enter a question." # Handle empty input instead of returning None
10. Setting Up the Gradio Interface
Finally, we set up a simple web interface using Gradio:
iface = gr.Interface(
    fn=answer_query, # Function to call for generating the response
    inputs="text", # Input type for the query
    outputs="text", # Output type for the answer
    title="RAG Application Built on Gemini Model",
    description="Ask any question about your Fuji X-S20 camera"
)
# Launch the Gradio app
if __name__ == "__main__":
    iface.launch()
11. Running the Python Script
In the terminal, navigate to the folder containing your app.py file (if you're not already in the right directory) and type the following command:
python app.py
Conclusion
You now have a fully functional question-answering application that utilizes the power of Gemini, LangChain and Gradio! You can ask any questions about the Fuji X-S20 camera, and the app will respond based on the information extracted from the PDF manual.
Feel free to customize and expand this project by integrating more documents or different types of user interfaces. Happy coding!
Note: Gradio can give you a shareable public link (pass share=True to launch()), but the link is temporary and expires after about 72 hours.
Stay tuned! Follow me on Medium for more AI and Cloud content
🎨🎨🎨 "I am still learning." - Michelangelo (at 87) 🖌️🖼️🖌️