Building a Fuji X-S20 Camera Q&A App with Gemini, LangChain and Gradio

Author(s): Devi

Originally published on Towards AI.

Part 2 of a 2-part beginner series exploring fun generative AI use cases with Gemini to enhance your photography skills!

A beginner-friendly introduction and application of RAG

As an amateur photographer, I am experimenting with ways to use generative AI to get better at my craft. In this blog post, I’ll walk you through creating a simple interactive question-answering application using Python, the Gemini API (with the gemini-1.5-pro model), LangChain, and Gradio.

I recently got my first ever Fuji camera and decided to leverage RAG and Gemini to create a Fuji X-S20 Q & A app. This app answers questions about the Fuji X-S20 camera (without you having to pore over the 400-page manual)!

📔This is a beginner-friendly tutorial, so here are some quick notes on Retrieval-Augmented Generation (RAG) and LangChain before we get started with the hands-on part.

Understanding how RAG works in 4 steps

My quick sketch of the 4-step RAG process

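Preparing your data for RAG can be summarized in 4 sequential steps. Once these are done, each question is answered by retrieving the most relevant chunks and passing them to the model along with the question: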

Loading our data (aka data ingestion)
Breaking our input data into smaller chunks
Creating Embeddings (of your choice)
Storing embeddings in vector database (of your choice)
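
To make the 4 steps concrete, here is a toy sketch in plain Python. Everything in it is a stand-in I made up for illustration: the hard-coded sentences replace the PDF manual, word counts replace a real embedding model, and a plain list replaces the vector database. It is not how Gemini or Chroma work internally, only the shape of the pipeline.

```python
import math
import re
from collections import Counter


def load_text():
    # Step 1: load the data. Made-up placeholder sentences stand in for the manual.
    return (
        "Rotate the mode dial to select a shooting mode. "
        "Charge the battery with the supplied USB cable. "
        "Press the shutter button halfway to focus."
    )


def split_into_chunks(text):
    # Step 2: break the text into smaller chunks (here, one sentence per chunk).
    return [s.strip() for s in re.split(r"(?<=\.)\s+", text) if s.strip()]


def embed(text):
    # Step 3: turn text into a vector. Word counts are a toy stand-in
    # for a real embedding model.
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    # Cosine similarity between two word-count vectors.
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Step 4: store (chunk, vector) pairs. A list stands in for a vector database.
store = [(chunk, embed(chunk)) for chunk in split_into_chunks(load_text())]


def retrieve(question, k=1):
    # At question time: embed the question and return the k most similar chunks.
    q = embed(question)
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


print(retrieve("How do I charge the battery?"))
# ['Charge the battery with the supplied USB cable.']
```

In the real app, the retrieved chunks are then handed to the language model as context for the answer.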

Understanding how LangChain works

LangChain is an open-source framework designed to easily build applications that use language models.

In particular, it helps with the management of embeddings, interactions with vector databases, and integration with various data sources from the 4-step process. To bring it all together, I’ve sketched the picture below, which shows how LangChain is used in our Q & A app.

My quick sketch of how LangChain helps with our PDF reader app

An 11-Step Breakdown of Building Our App

1. Creating a new conda environment and importing the required modules

If you are working on your own computer, create a separate conda environment for running LangChain projects and call it something like ‘env_langchain’, so that its purpose is clear. You can reuse this environment for any other LangChain project you create in the future. Install all the libraries listed in the requirements.txt file. This step can take the most time and patience, but once you get it right, everything else will be easy.

I chose VS Code as my IDE for this project. On Windows, pressing ‘Ctrl+Shift+P’ opens the Command Palette, where the ‘Python: Select Interpreter’ command lets you select the newly created ‘env_langchain’ environment as the interpreter. To start working in this environment, type conda activate env_langchain in the terminal.

The requirements.txt file lists the libraries that need to be installed to run this program successfully

After the ‘conda activate’ command, the prompt in your VS Code terminal should be prefixed with the environment name, for example (env_langchain)

The modules and their purposes are described in the comments below.

import os # Interacts with the operating system
import gradio as gr # Web framework for interactive applications
from langchain_community.document_loaders import PyPDFLoader # Loads and parses PDF documents
from langchain.text_splitter import RecursiveCharacterTextSplitter # Splits text into manageable chunks
from langchain_chroma import Chroma # Manages vector stores for document embeddings
from langchain_google_genai import GoogleGenerativeAIEmbeddings # Generates embeddings using Google’s Generative AI
from dotenv import load_dotenv # Loads environment variables from .env file
from langchain_google_genai import ChatGoogleGenerativeAI # For conversational AI
from langchain.chains import create_retrieval_chain # Creates retrieval chains
from langchain.chains.combine_documents import create_stuff_documents_chain # Combines document processing
from langchain_core.prompts import ChatPromptTemplate # For creating prompt templates

2. Loading Environment Variables

We’ll load sensitive environment variables (like our Google API key) using dotenv. Create a .env file in your project directory (you can do this in your IDE, such as VS Code) and put your API key in it. You can get an API key for free here: https://aistudio.google.com/app/u/1/apikey
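
The article does not show the code for this step, so here is a minimal sketch of one way to do it. It assumes the .env file contains a line such as GOOGLE_API_KEY=your-key-here; the variable name GOOGLE_API_KEY and the helper get_api_key are my assumptions, not something taken from the original code. The result is stored in api_key, the name used later when the embeddings are set up.

```python
import os

try:
    from dotenv import load_dotenv  # Provided by the python-dotenv package
except ImportError:
    load_dotenv = None  # Fall back to variables already set in the shell


def get_api_key(var_name="GOOGLE_API_KEY"):
    """Read the API key from .env (if python-dotenv is installed) or the environment."""
    if load_dotenv is not None:
        load_dotenv()  # Copies key=value pairs from .env into os.environ
    key = os.getenv(var_name)
    if not key:
        raise RuntimeError(f"{var_name} is not set; add it to your .env file")
    return key


if __name__ == "__main__":
    api_key = get_api_key()
    print("API key loaded, length:", len(api_key))
```

Failing early with a clear message is easier to debug than an authentication error from deep inside a later API call. Remember to keep the .env file out of version control.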

3. Loading the PDF Manual

Next, we’ll load our PDF document, which in my case was the 400-page Fuji X-S20 camera manual: https://fujifilm-dsc.com/en-int/manual/x-s20/x-s20_manual_en_s_f.pdf

loader = PyPDFLoader("Fuji_xs20_manual.pdf") # Initialize the PDF loader
data = loader.load() # Load the PDF; returns a list with one Document per page

4. Splitting the Text

Since the document can be large, we split the text into manageable chunks ahead of setting up embeddings:

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000) # Initialize the text splitter
docs = text_splitter.split_documents(data) # Split the loaded data into smaller documents
print("Total number of documents: ", len(docs)) # Display the total number of documents
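
To get a feel for what chunk_size means, here is a simplified sliding-window splitter in plain Python. It is not the algorithm RecursiveCharacterTextSplitter uses (that one tries to break on paragraphs, lines, and words before falling back to characters); the function name split_text and the 200-character overlap are illustrative assumptions.

```python
def split_text(text, chunk_size=1000, chunk_overlap=200):
    """Cut text into windows of at most chunk_size characters.

    Consecutive windows share chunk_overlap characters, so a sentence cut
    at a boundary still appears whole in at least one chunk.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # The last window already reaches the end of the text
    return chunks


page = "x" * 2500  # Placeholder for the text of a long page
print([len(chunk) for chunk in split_text(page)])
# [1000, 1000, 900]
```

Smaller chunks make retrieval more precise but give the model less surrounding context per chunk, so chunk_size is worth experimenting with.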

5. Setting Up the Embeddings

We will now set up the embedding model and create a vector store:

embeddings = GoogleGenerativeAIEmbeddings(api_key=api_key, model="models/embedding-001") # Initialize the embedding model
vectorstore = Chroma.from_documents(documents=docs, embedding=embeddings) # Create a Chroma vector store
retriever = vectorstore.as_retriever(search_type="similarity", search_kwargs={"k": 10}) # Set up the retriever

6. Configuring the Language Model

Next, we configure the language model that will answer our questions:

llm = ChatGoogleGenerativeAI(model="gemini-1.5-pro", temperature=0.3, max_tokens=500) # Initialize the language model

7. Creating the Prompt Template

We define how the model will respond to queries:

system_prompt = (
    "You are an assistant for question-answering tasks. "
    "Use the following pieces of retrieved context to answer "
    "the question. If you don't know the answer, say that you "
    "don't know. Use three sentences maximum and keep the "
    "answer concise."
    "\n\n"
    "{context}"
)

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt), # System-level instructions
        ("human", "{input}"), # User input placeholder
    ]
)

8. Creating the Question-Answering Chain

Now we can create the chain that will process user queries:

question_answer_chain = create_stuff_documents_chain(llm, prompt)
rag_chain = create_retrieval_chain(retriever, question_answer_chain) # Create a Retrieval-Augmented Generation (RAG) chain
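
If these two helper calls feel like magic, the sketch below shows roughly what the combined chain does when invoked, using plain Python and stubs. The names make_rag_chain, fake_retrieve, and fake_llm are hypothetical and exist only for this illustration; the real chain returns Document objects rather than strings in "context", and a real model generates the answer.

```python
def make_rag_chain(retrieve, llm, system_template):
    """Return a function that mimics the input and output shape of rag_chain.invoke()."""

    def invoke(inputs):
        question = inputs["input"]
        docs = retrieve(question)  # 1. Fetch the most relevant chunks
        context = "\n\n".join(docs)  # 2. "Stuff" all chunks into one string
        system_prompt = system_template.format(context=context)  # 3. Fill {context}
        answer = llm(system_prompt, question)  # 4. Ask the model
        return {"input": question, "context": docs, "answer": answer}

    return invoke


seen_prompts = []  # Records what the stub model was given, for inspection


def fake_retrieve(question):
    # Stand-in for the retriever: always returns the same placeholder chunks.
    return ["placeholder chunk A", "placeholder chunk B"]


def fake_llm(system_prompt, question):
    # Stand-in for the language model: records the prompt and returns a stub answer.
    seen_prompts.append(system_prompt)
    return "Stub answer to: " + question


chain = make_rag_chain(fake_retrieve, fake_llm, "Use this context.\n\n{context}")
result = chain({"input": "How do I format the SD card?"})
print(result["answer"])
# Stub answer to: How do I format the SD card?
```

The "stuff" in create_stuff_documents_chain refers to step 2: all retrieved chunks are placed into the prompt together, which works as long as they fit in the model's context window.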

9. Handling User Queries

We define a function to process user queries:

def answer_query(query):
    if not query:
        return "Please enter a question." # Avoid returning None for empty input
    response = rag_chain.invoke({"input": query}) # Invoke the RAG chain with the user query
    return response["answer"] # Return the answer from the assistant

10. Setting Up the Gradio Interface

Finally, we set up a simple web interface using Gradio:

iface = gr.Interface(
    fn=answer_query, # Function to call for generating the response
    inputs="text", # Input type for the query
    outputs="text", # Output type for the answer
    title="RAG Application Built on Gemini Model",
    description="Ask any question about your Fuji X-S20 camera"
)

# Launch the Gradio app
if __name__ == "__main__":
    iface.launch()

Run the Python Script: In the terminal, navigate to the folder containing your app.py file (if you're not already in the right directory) and type the following command:

python app.py

Gradio interface where you can get your Fuji X-S20 questions answered!

Conclusion

You now have a fully functional question-answering application that utilizes the power of Gemini, LangChain and Gradio! You can ask any questions about the Fuji X-S20 camera, and the app will respond based on the information extracted from the PDF manual.

Feel free to customize and expand this project by integrating more documents or different types of user interfaces. Happy coding!

Note: Gradio can give you a shareable public link if you launch the app with share=True, but the link is temporary and expires (after 72 hours at the time of writing).

Stay tuned! Follow me on Medium for more AI and Cloud content

🎨🎨🎨“I am still learning.” — Michelangelo (at 87)🖌️🖼️🖌️
