
Building a Fuji X-S20 Camera Q&A App with Gemini, LangChain and Gradio

Author(s): Devi

Originally published on Towards AI.

Part 2 of a 2-part beginner series exploring fun generative AI use cases with Gemini to enhance your photography skills!

A beginner-friendly introduction to RAG and how to apply it

As an amateur photographer, I am experimenting with ways I can use generative AI to get better at my craft. In this blog post, I'll walk you through the process of creating a simple interactive question-answering application using Python, the Gemini API (the code below uses the gemini-1.5-pro model), LangChain, and Gradio.

I recently got my first ever Fuji camera and decided to leverage RAG and Gemini to create a Fuji X-S20 Q&A app. This app will answer any question about the Fuji X-S20 camera (without you having to pore over the 400-page manual)!

📔 This is a beginner-friendly tutorial, so here are some quick notes on Retrieval-Augmented Generation (RAG) and LangChain before we get started with the hands-on part.

Understanding how RAG works in 4 steps

My quick sketch of the 4-step RAG process

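The data-preparation side of RAG can be summarized in 4 sequential steps:

  1. Loading our data (aka data ingestion)
  2. Breaking our input data into smaller chunks
  3. Creating embeddings for each chunk (with an embedding model of your choice)
  4. Storing the embeddings in a vector database (of your choice)

At query time, the question is embedded the same way, the most similar chunks are retrieved from the vector database, and they are passed to the language model as context for its answer.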
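
To make the four steps concrete, here is a minimal, library-free sketch. It is not how LangChain or Gemini implement these steps: the word-count "embedding", the in-memory list used as a "vector store", the sample sentences, and all function names are hypothetical stand-ins chosen for illustration. It also includes the query-time lookup that the stored embeddings make possible.

```python
import math
import re
from collections import Counter

def load_text():
    # Step 1: data ingestion (a hard-coded string stands in for the PDF manual)
    return (
        "Press the shutter button halfway to focus. "
        "Rotate the mode dial to select a shooting mode. "
        "Charge the battery using the supplied USB cable."
    )

def split_into_chunks(text):
    # Step 2: chunking (one sentence per chunk, for simplicity)
    return [s.strip() for s in re.split(r"(?<=\.)\s+", text) if s.strip()]

def embed(text):
    # Step 3: toy "embedding" made of lowercase word counts
    # (real embedding models return dense numeric vectors)
    return Counter(re.findall(r"[a-z]+", text.lower()))

def build_store(chunks):
    # Step 4: keep (embedding, chunk) pairs in a list acting as the vector store
    return [(embed(chunk), chunk) for chunk in chunks]

def cosine(a, b):
    # Cosine similarity between two word-count vectors
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(store, question, k=1):
    # Query time: embed the question and return the k most similar chunks
    q = embed(question)
    ranked = sorted(store, key=lambda item: cosine(q, item[0]), reverse=True)
    return [chunk for _, chunk in ranked[:k]]

store = build_store(split_into_chunks(load_text()))
print(retrieve(store, "How do I charge the battery?"))
```

With this toy data, the question about charging retrieves the sentence about the battery, because that chunk shares the most words with the question. A real embedding model captures meaning rather than exact word overlap, which is why the app uses one.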

Understanding how LangChain works

LangChain is an open-source framework designed to easily build applications that use language models.

In particular, it helps with the management of embeddings, interactions with vector databases, and integration with various data sources from the 4-step process. To bring it all together, I've sketched the picture below, which shows how LangChain is used in our Q&A app.

My quick sketch of how LangChain helps with our PDF reader app

An 11-Step Breakdown of Building Our App

1. Creating a new conda environment and importing the required modules

If you are working on your own computer, create a separate conda environment for running LangChain projects and call it something like 'env_langchain', so that its purpose is clear. You can reuse this environment for any other LangChain project you create in the future. Install all the libraries and modules listed in the requirements.txt file. This step can take the most time and patience, but once you get it right, everything else will be easy.

I chose VS Code as my IDE for this project. On Windows, pressing 'Ctrl+Shift+P' opens the Command Palette, where the 'Python: Select Interpreter' command lets you select the newly created 'env_langchain' environment as the interpreter. To start working in this environment, type conda activate env_langchain in the terminal.

The requirements.txt file lists the libraries that need to be installed to run this program successfully

After you run the 'conda activate' command, your VS Code terminal prompt should show the environment name in parentheses, such as (env_langchain)
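
Since installation is the step most likely to go wrong, a quick check like the one below can confirm that the active environment has everything before you run the app. This is only a sketch: the module list is inferred from the import statements used in this tutorial (plus pypdf, which PyPDFLoader needs to parse PDFs), and find_missing_modules is a hypothetical helper name.

```python
import importlib.util

# Top-level module names inferred from this tutorial's imports (an assumption,
# not the author's requirements.txt). Note that pip package names can differ,
# e.g. the "dotenv" module is installed as "python-dotenv".
REQUIRED_MODULES = [
    "gradio",
    "langchain",
    "langchain_community",
    "langchain_chroma",
    "langchain_google_genai",
    "langchain_core",
    "dotenv",
    "pypdf",
]

def find_missing_modules(module_names):
    """Return the module names that cannot be found in the active environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]

if __name__ == "__main__":
    missing = find_missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```

Running this inside the activated environment lists anything that still needs to be installed, without importing the (sometimes slow to load) libraries themselves.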

The modules and their purposes are described in the comments below.

import os # Interacts with the operating system
import gradio as gr # Web framework for interactive applications
from langchain_community.document_loaders import PyPDFLoader # Loads and parses PDF documents
from langchain.text_splitter import RecursiveCharacterTextSplitter # Splits text into manageable chunks
from langchain_chroma import Chroma # Manages vector stores for document embeddings
from langchain_google_genai import GoogleGenerativeAIEmbeddings # Generates embeddings using Google's Generative AI
from dotenv import load_dotenv # Loads environment variables from .env file
from langchain_google_genai import ChatGoogleGenerativeAI # For conversational AI
from langchain.chains import create_retrieval_chain # Creates retrieval chains
from langchain.chains.combine_documents import create_stuff_documents_chain # Combines document processing
from langchain_core.prompts import ChatPromptTemplate # For creating prompt templates

2. Loading Environment Variables

We'll load sensitive environment variables (like our Google API key) using dotenv. Make sure to create a .env file in your project directory (you can do this in your IDE, such as VS Code) containing your API key, which you can retrieve for free from here: https://aistudio.google.com/app/u/1/apikey
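
The post does not show the code for this step, so here is a minimal sketch of what it could look like. It assumes the key is stored in .env under the name GOOGLE_API_KEY (the name used in the author's file is not shown), and load_api_key is a hypothetical helper, not part of any library.

```python
import os

# Assumed content of the .env file (one line, no quotes needed):
# GOOGLE_API_KEY=your-key-here

def load_api_key(var_name="GOOGLE_API_KEY"):
    """Return the API key from the environment, reading .env first when possible."""
    try:
        from dotenv import load_dotenv  # provided by the python-dotenv package
        load_dotenv()  # copies entries from .env into os.environ without overriding existing ones
    except ImportError:
        pass  # python-dotenv is not installed; rely on variables that are already set
    api_key = os.getenv(var_name)
    if not api_key:
        raise RuntimeError(f"{var_name} is not set. Add it to your .env file.")
    return api_key
```

Failing early with a clear message is a design choice: without it, a missing key usually surfaces later as a less obvious authentication error from the API. Remember to keep the .env file out of version control.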

3. Loading the PDF Manual

Next, we'll load our PDF document, which in my case was the 400-page Fuji X-S20 camera manual: https://fujifilm-dsc.com/en-int/manual/x-s20/x-s20_manual_en_s_f.pdf

loader = PyPDFLoader("Fuji_xs20_manual.pdf") # Initialize the PDF loader
data = loader.load() # Load the PDF as a list of Document objects (one per page)

4. Splitting the Text

Since the document can be large, we split the text into manageable chunks ahead of setting up embeddings:

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000) # Initialize the text splitter
docs = text_splitter.split_documents(data) # Split the loaded data into smaller documents
print("Total number of documents: ", len(docs)) # Display the total number of documents
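
To build intuition for what chunk_size controls, here is a simplified, library-free sketch of splitting by character count with an overlap between neighbouring chunks. It is a stand-in, not the algorithm RecursiveCharacterTextSplitter uses: the real splitter first tries to break on natural separators such as paragraphs, lines, and spaces, and it also accepts a chunk_overlap parameter. The function name split_with_overlap is hypothetical.

```python
def split_with_overlap(text, chunk_size, chunk_overlap=0):
    """Split text into fixed-size character windows that overlap by chunk_overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks

sample = "abcdefghijklmnopqrstuvwxyz"
for chunk in split_with_overlap(sample, chunk_size=10, chunk_overlap=3):
    print(chunk)
```

Each chunk repeats the last three characters of the previous one. Overlap like this reduces the chance that a sentence relevant to a question is cut in half at a chunk boundary.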

5. Setting Up the Embeddings

We will now set up the embedding model and create a vector store:

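load_dotenv() # Load variables from the .env file created in step 2
api_key = os.getenv("GOOGLE_API_KEY") # Read the API key; the variable name GOOGLE_API_KEY is an assumption about the .env file
embeddings = GoogleGenerativeAIEmbeddings(api_key=api_key, model="models/embedding-001") # Initialize the embedding model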
vectorstore = Chroma.from_documents(documents=docs, embedding=embeddings) # Create a Chroma vector store
retriever = vectorstore.as_retriever(search_type="similarity", search_kwargs={"k": 10}) # Set up the retriever

6. Configuring the Language Model

Next, we configure the language model that will answer our questions:

llm = ChatGoogleGenerativeAI(model="gemini-1.5-pro", temperature=0.3, max_tokens=500) # Initialize the language model

7. Creating the Prompt Template

We define how the model will respond to queries:

system_prompt = (
    "You are an assistant for question-answering tasks. "
    "Use the following pieces of retrieved context to answer "
    "the question. If you don't know the answer, say that you "
    "don't know. Use three sentences maximum and keep the "
    "answer concise."
    "\n\n"
    "{context}"
)
prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt), # System-level instructions
        ("human", "{input}"), # User input placeholder
    ]
)

8. Creating the Question-Answering Chain

Now we can create the chain that will process user queries:

question_answer_chain = create_stuff_documents_chain(llm, prompt)
rag_chain = create_retrieval_chain(retriever, question_answer_chain) # Create a Retrieval-Augmented Generation (RAG) chain
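
Conceptually, the two chains above retrieve chunks, "stuff" them into the {context} placeholder, and send the filled-in prompt to the model. The sketch below mimics that flow in plain Python so you can see the data moving through it. Everything here is a hypothetical stand-in (fake_retriever, fake_llm, toy_rag_chain, and the placeholder chunk texts); it is not LangChain's implementation. It only mirrors the shape of the result, a dictionary with "input", "context", and "answer" keys.

```python
PROMPT_TEMPLATE = (
    "Use the following pieces of retrieved context to answer the question.\n\n"
    "{context}\n\n"
    "Question: {input}"
)

def fake_retriever(question):
    # Stand-in for the vector store retriever: returns the text of matching chunks
    return ["Placeholder chunk about charging.", "Placeholder chunk about the mode dial."]

def fake_llm(prompt_text):
    # Stand-in for the Gemini call: only reports how much text it received
    return f"(model answer based on a prompt of {len(prompt_text)} characters)"

def toy_rag_chain(question, retriever=fake_retriever, llm=fake_llm):
    docs = retriever(question)  # 1. retrieve relevant chunks
    context = "\n\n".join(docs)  # 2. "stuff" all chunks into one context string
    prompt_text = PROMPT_TEMPLATE.format(context=context, input=question)  # 3. fill the prompt
    answer = llm(prompt_text)  # 4. ask the model
    return {"input": question, "context": docs, "answer": answer}

result = toy_rag_chain("How do I charge the battery?")
print(result["answer"])
```

Because every retrieved chunk ends up in the prompt, the retriever's k setting and the chunk size together determine how much text the model has to read for each question.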

9. Handling User Queries

We define a function to process user queries:

def answer_query(query):
    if query:
        response = rag_chain.invoke({"input": query}) # Invoke the RAG chain with the user query
        return response["answer"] # Return the answer from the assistant
    return "Please enter a question." # Handle empty input instead of returning None
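
If you want to try this logic without spending API calls, one option is to pass the chain in as a parameter and test with a stand-in object. The sketch below assumes only that the chain has an invoke() method returning a dictionary with an "answer" key, as in the function above. StubChain and make_answer_fn are hypothetical names, and the error message wording is just an example.

```python
class StubChain:
    """Hypothetical stand-in with the same invoke() shape the app relies on."""

    def __init__(self, answer=None, error=None):
        self.answer = answer
        self.error = error

    def invoke(self, payload):
        if self.error:
            raise self.error
        return {"input": payload["input"], "answer": self.answer}

def make_answer_fn(chain):
    """Build a query handler around any object that has an invoke() method."""
    def answer(query):
        if not query or not query.strip():
            return "Please enter a question."
        try:
            return chain.invoke({"input": query})["answer"]
        except Exception as exc:  # e.g. network or quota errors raised by the API call
            return f"Sorry, something went wrong: {exc}"
    return answer

answer_fn = make_answer_fn(StubChain(answer="Use the supplied USB cable."))
print(answer_fn("How do I charge the battery?"))
```

In the real app you would call make_answer_fn(rag_chain) and hand the returned function to Gradio, so that a failed request shows a readable message in the interface instead of a stack trace.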

10. Setting Up the Gradio Interface

Finally, we set up a simple web interface using Gradio:

iface = gr.Interface(
    fn=answer_query, # Function to call for generating the response
    inputs="text", # Input type for the query
    outputs="text", # Output type for the answer
    title="RAG Application Built on Gemini Model",
    description="Ask any question about your Fuji X-S20 camera"
)
# Launch the Gradio app
if __name__ == "__main__":
    iface.launch()

Run the Python Script: In the terminal, navigate to the folder containing your app.py file (if you're not already in the right directory) and type the following command:

python app.py
Gradio interface where you can get your Fuji XS20 questions answered!

Conclusion

You now have a fully functional question-answering application that utilizes the power of Gemini, LangChain and Gradio! You can ask any questions about the Fuji X-S20 camera, and the app will respond based on the information extracted from the PDF manual.

Feel free to customize and expand this project by integrating more documents or different types of user interfaces. Happy coding!

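Note: If you launch the app with share=True, Gradio gives you a shareable public link, but the link is temporary and expires after a limited time (72 hours in many Gradio versions).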

Stay tuned! Follow me on Medium for more AI and Cloud content

🎨🎨🎨 “I am still learning.” - Michelangelo (at 87) 🖌️🖼️🖌️
