A Primer on Using Google’s Gemini API to Improve Your Photography
Last Updated on November 3, 2024 by Editorial Team
Author(s): Devi
Originally published on Towards AI.
Part 1 of a 2-part beginner series exploring fun generative AI use cases with Gemini to enhance your photography skills!
In this blog post, I’ll show you how to build a Photo Critique and Enhancement App using Google’s Gemini-1.5-Flash-8B API and Streamlit (all for free!).
As a product manager turned builder, I began my AI journey two years ago. Things became especially exciting when I started building simple apps for my own use cases. In this case, I wanted a photo critique assistant (or mentor?) to help me get better at photography!
Whether you’re an amateur photographer looking to sharpen your skills or someone curious about AI, this tutorial will walk you through the essentials of Gemini API inferencing.
By the end, you will have built an app that not only critiques your photos but also helps you improve them! The goal is to explore generative AI and its diverse applications, enhancing your craft, whether in photography, music, art, or even cooking.
Overview of the Gemini 1.5 Flash-8B Model
Gemini 1.5 Flash-8B is designed for speed and efficiency. It is multimodal, meaning it can accept text, image, and video inputs and return text-based feedback.
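In practice, a multimodal request is an ordered list of parts: plain text strings and inline media blobs. The minimal sketch below builds such a list in plain Python, assuming the dictionary-style image part (a `mime_type` plus raw `data` bytes) that this tutorial passes to `generate_content` later on. The helper name `build_contents` is hypothetical, and the three placeholder bytes are only the start of a JPEG signature, not a real photo.

```python
from typing import Union

# One part of a request: either plain text or an inline image blob (a dict)
Part = Union[str, dict]


def build_contents(prompt: str, image_bytes: bytes, mime_type: str) -> list[Part]:
    """Return [text_part, image_part] for a single-image request (hypothetical helper).

    The dict shape (mime_type + data) mirrors the image part produced by
    this tutorial's get_image_content() function.
    """
    image_part = {"mime_type": mime_type, "data": image_bytes}
    # Text first, then the image, matching the order used in get_gemini_response()
    return [prompt, image_part]


contents = build_contents("Critique this photo.", b"\xff\xd8\xff", "image/jpeg")
print(len(contents), contents[1]["mime_type"])  # 2 image/jpeg
```

The model reads all parts together and replies with text, which is why the app only needs `response.text` on the way back.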
Accessing Google AI Studio
Google AI Studio is your starting point — an easy-to-access platform that allows you to work with Google’s generative models. It’s free to use as long as you have a Gmail account.
- Go to Google AI Studio.
- Sign in with your Gmail account to get started. Once inside, select the Gemini model and create an API key; your app will use this key to call the model.
Training, Inference, and Fine-Tuning in AI
Before we jump into the coding process, it’s essential to understand some foundational concepts in AI. When I first began my AI journey, I quickly recognized how crucial it is to translate these foundational topics into practical applications.
- Training: During the training phase, an AI model learns to recognize patterns from a large, diverse dataset. The Gemini models we will be accessing have been developed by Google's DeepMind team and made available for use. Thanks, DeepMind team, for getting the training done and making the model ready for our use!
- Inference: This is the model’s “active” phase, where it accepts input and generates output based on user requests. We will harness the power of inference to build this photo critique app.
💡Inference is the secret sauce of our photo critique app!
- Fine-Tuning: Fine-tuning specializes the pre-trained model by training it on niche datasets. This process enhances the model’s ability to provide more relevant outputs for specific tasks. This is what we can do to further tweak the OG model (as needed) for specialized use cases.
To wrap up this section, this tutorial focuses on inference using Google's trained multimodal model, Gemini 1.5 Flash-8B. It will accept your photo and provide feedback on how to improve it.
Building the Photo Critique App with Streamlit
This section walks you through setting up your coding environment and configuring the Gemini model in 5 simple steps.
Step 1: Environment Setup
- Import Libraries: First, import essential libraries like `os`, `load_dotenv`, `google.generativeai`, `streamlit`, and `Image` (from Pillow). These will handle everything from environment variables to interacting with the AI model.
- Load API Key: Using `load_dotenv()`, load your API key from a `.env` file (for example, a line such as `API_KEY=your-key-here`). This keeps the key out of your source code while keeping it ready for use.
```python
import os
from dotenv import load_dotenv
import google.generativeai as genai
import streamlit as st
from PIL import Image

# Load environment variables from the .env file
load_dotenv()
```
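`load_dotenv()` reads `KEY=value` lines from a `.env` file into the process environment so that `os.getenv()` can find them. To see roughly what that involves, here is a simplified, stdlib-only stand-in; `load_env_file` is a hypothetical helper, not python-dotenv's implementation, and it ignores features python-dotenv supports, such as variable expansion and multi-line values. Like `load_dotenv()` with its default `override=False`, it leaves variables that are already set untouched. The `DEMO_API_KEY` name is illustrative.

```python
import os
import tempfile


def load_env_file(path: str) -> dict[str, str]:
    """Minimal .env reader: a simplified stand-in for python-dotenv's load_dotenv()."""
    loaded = {}
    with open(path, encoding="utf-8") as fh:
        for raw in fh:
            line = raw.strip()
            # Skip blank lines, comments, and lines without an '=' separator
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, value = line.split("=", 1)
            # Trim whitespace and optional surrounding quotes
            key, value = key.strip(), value.strip().strip("'\"")
            # Like load_dotenv(override=False): variables that already exist win
            if key not in os.environ:
                os.environ[key] = value
                loaded[key] = value
    return loaded


# Usage: write a throwaway .env file, load it, then clean up
with tempfile.NamedTemporaryFile("w", suffix=".env", delete=False) as tmp:
    tmp.write("# Gemini credentials\nDEMO_API_KEY='abc123'\n")
loaded = load_env_file(tmp.name)
os.remove(tmp.name)
print(loaded)  # {'DEMO_API_KEY': 'abc123'} (assuming DEMO_API_KEY was not already set)
```

For the real app, stick with `load_dotenv()`; the sketch only shows why `os.getenv('API_KEY')` works afterwards.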
Step 2: Configuring API Key
- Retrieve and Set API Key: Retrieve the API key with `os.getenv('API_KEY')`, and stop early with a clear error if it is missing.
- Configure the AI Model: Next, set up Google's Generative AI SDK with `genai.configure(api_key=API_KEY)` so that every model call your app makes is authenticated with your key.
```python
# Set up your API key
API_KEY = os.getenv('API_KEY')
if not API_KEY:
    raise ValueError("No API key found. Please set API_KEY in your .env file.")

# Initialize the Generative Model
genai.configure(api_key=API_KEY)
model = genai.GenerativeModel("gemini-1.5-flash-8b")
```
Step 3: Defining Functions for Critique
- Image Processing: Define a function that reads the uploaded file and packages its bytes and MIME type in the format the Gemini API expects.
- Generate Critique Function: Define a function that sends the processed image and a critique prompt to the model and returns the model's text feedback.
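```python
def get_gemini_response(input_prompt, image):
    """Send the prompt and the first image part to Gemini and return its text reply."""
    response = model.generate_content(
        [input_prompt, image[0]]
    )
    return response.text


def get_image_content(uploaded_file):
    """Convert a Streamlit UploadedFile into a list holding one Gemini image part."""
    if uploaded_file is not None:
        # Raw bytes of the uploaded photo
        image_byte_data = uploaded_file.getvalue()
        image_parts = [{
            "mime_type": uploaded_file.type,  # e.g. "image/jpeg", as reported by Streamlit
            "data": image_byte_data
        }]
        return image_parts
    else:
        raise FileNotFoundError("File not uploaded")
```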
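Inside Streamlit, `uploaded_file.type` supplies the MIME type. If you want to call the same code from a plain script that reads photos from disk, you have to work out the MIME type yourself. A minimal sketch using the standard `mimetypes` module; the helper name `image_part_from_path` is hypothetical, and the JPEG/PNG allow-list is an assumption chosen to mirror the uploader's `type=["jpg", "png", "jpeg"]`.

```python
import mimetypes
import os
import tempfile

# Mirror the Streamlit uploader, which only accepts JPEG and PNG files
ALLOWED_MIME_TYPES = {"image/jpeg", "image/png"}


def image_part_from_path(path: str) -> dict:
    """Build a {mime_type, data} image part from a file on disk (hypothetical helper)."""
    # guess_type works from the file extension, e.g. ".jpg" -> "image/jpeg"
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type not in ALLOWED_MIME_TYPES:
        raise ValueError(f"Unsupported image type for {path!r}: {mime_type}")
    with open(path, "rb") as fh:
        data = fh.read()
    return {"mime_type": mime_type, "data": data}


# Usage: a throwaway file with a .jpg extension (placeholder bytes, not a real image)
with tempfile.NamedTemporaryFile(suffix=".jpg", delete=False) as tmp:
    tmp.write(b"\xff\xd8\xff\xe0")
part = image_part_from_path(tmp.name)
os.remove(tmp.name)
print(part["mime_type"], len(part["data"]))  # image/jpeg 4
```

The resulting dict has the same shape as the entries returned by `get_image_content()`, so it could be sent as `[input_prompt, part]`.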
Step 4: Setting Up the Streamlit Interface
Streamlit lets you build a clean user interface and test your app’s capability.
```python
# Set Streamlit page configuration FIRST (it must be the first Streamlit command)
st.set_page_config(page_title="PhotoCritique", layout="centered")

# Streamlit interface setup
st.markdown("<h1 style='text-align: center;'>PhotoCritique App</h1>", unsafe_allow_html=True)

# Sidebar for critique options
st.sidebar.header("Critique Options")

# Allow users to select which aspects they want feedback on
aspects = st.sidebar.multiselect(
    "Select any 3 aspects to critique:",
    options=["Composition", "Lighting", "Focus and Sharpness", "Exposure", "Color Balance", "Creativity and Impact"],
    default=["Composition", "Lighting", "Focus and Sharpness"]
)

# Ensure the user selects exactly three aspects
if len(aspects) != 3:
    st.sidebar.warning("Please select exactly 3 aspects.")

# Maximum number of sentences per aspect, used in the feedback instruction below
# (the 1 to 5 range and default of 2 are illustrative choices)
feedback_length = st.sidebar.slider("Sentences per aspect:", min_value=1, max_value=5, value=2)

# File uploader
uploaded_file = st.file_uploader("Upload a Photo for Critique", type=["jpg", "png", "jpeg"])
if uploaded_file is not None:
    image = Image.open(uploaded_file)
    st.image(image, caption="Uploaded Photo", use_column_width=True)

submit = st.button("Get Critique")

# Construct the input prompt based on selected aspects
if submit:
    if len(aspects) == 3:
        try:
            # Raises FileNotFoundError if no photo has been uploaded yet
            image_data = get_image_content(uploaded_file)

            # Create a formatted list of aspects
            aspects_list = "\n".join([f"- {aspect}" for aspect in aspects])

            # Instruction for feedback length
            feedback_instruction = f"Provide concise and actionable feedback for each selected aspect. Limit each section to {feedback_length} sentences."

            # Construct the prompt
            input_prompt = f"""
            You are an expert professional photographer. Please critique the uploaded photo focusing on the following aspects:
            {aspects_list}
            {feedback_instruction}
            Provide three critique areas and three areas for improvement based on the selected aspects.
            Format the response as follows:
            **Critique Areas:**
            1.
            2.
            3.
            **Areas for Improvement:**
            1.
            2.
            3.
            """

            # Get the response from Gemini
            response = get_gemini_response(input_prompt, image_data)

            # Display the response with formatting
            st.subheader("Photo Critique")
            st.write(response)
        except FileNotFoundError as e:
            st.error(str(e))
        except Exception as e:
            st.error(f"An error occurred: {e}")
    else:
        st.error("Please select exactly 3 aspects for the critique.")
```
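Because the prompt assembly lives inside Streamlit's `if submit:` branch, it is hard to test on its own. One option, offered as a refactoring sketch rather than part of the original app, is to move the validation and prompt text into a plain function (hypothetically named `build_critique_prompt`) that you can unit-test without Streamlit or an API key. The prompt wording below is copied from the app; the `feedback_length < 1` check is an added assumption.

```python
def build_critique_prompt(aspects: list[str], feedback_length: int) -> str:
    """Assemble the PhotoCritique prompt from the selected aspects (hypothetical helper)."""
    if len(aspects) != 3:
        raise ValueError("Please select exactly 3 aspects for the critique.")
    if feedback_length < 1:
        raise ValueError("feedback_length must be at least 1.")

    # Same bullet formatting and length instruction as the Streamlit app
    aspects_list = "\n".join(f"- {aspect}" for aspect in aspects)
    feedback_instruction = (
        "Provide concise and actionable feedback for each selected aspect. "
        f"Limit each section to {feedback_length} sentences."
    )
    return f"""
You are an expert professional photographer. Please critique the uploaded photo focusing on the following aspects:
{aspects_list}
{feedback_instruction}
Provide three critique areas and three areas for improvement based on the selected aspects.
Format the response as follows:
**Critique Areas:**
1.
2.
3.
**Areas for Improvement:**
1.
2.
3.
"""


prompt = build_critique_prompt(["Composition", "Lighting", "Exposure"], feedback_length=2)
print(prompt.splitlines()[2])  # - Composition
```

In the app, the prompt-building lines could then shrink to `input_prompt = build_critique_prompt(aspects, feedback_length)`, with the `ValueError` surfaced through `st.error`.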
Step 5: Run your app!
Save all of the code above in a single file named `app.py`. To run the app locally, type the following command in your terminal:
```bash
streamlit run app.py
```
This will open a new tab in your browser where you can upload a photo and receive AI-generated feedback.
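The prompt asks Gemini for a fixed layout with two bold headings. Models do not always follow a requested format exactly, so if you want to display the two lists separately (for example, side by side), parse the reply defensively. A sketch assuming the headings come back exactly as `**Critique Areas:**` and `**Areas for Improvement:**`; the `split_critique` helper is hypothetical and returns `None` when a heading is missing, so the app can fall back to `st.write(response)`. The sample reply is made up for illustration.

```python
import re
from typing import Optional

CRITIQUE_HEADING = "**Critique Areas:**"
IMPROVEMENT_HEADING = "**Areas for Improvement:**"


def _numbered_items(block: str) -> list[str]:
    """Return the text of lines that start with '1.', '2.', ... in a block."""
    items = []
    for line in block.splitlines():
        match = re.match(r"\s*\d+\.\s*(.+)", line)
        if match:
            items.append(match.group(1).strip())
    return items


def split_critique(response_text: str) -> Optional[dict]:
    """Split the reply into its two numbered lists, or return None if the format is off."""
    if CRITIQUE_HEADING not in response_text:
        return None
    after_critique = response_text.split(CRITIQUE_HEADING, 1)[1]
    if IMPROVEMENT_HEADING not in after_critique:
        return None
    critique_block, improvement_block = after_critique.split(IMPROVEMENT_HEADING, 1)
    return {
        "critique": _numbered_items(critique_block),
        "improvement": _numbered_items(improvement_block),
    }


# Usage with a made-up reply in the requested format
sample = """**Critique Areas:**
1. Strong leading lines draw the eye.
2. Soft window light flatters the subject.
3. The subject's eyes are sharp.
**Areas for Improvement:**
1. Crop the distracting edge on the left.
2. Lift the shadows slightly.
3. Straighten the horizon."""
parts = split_critique(sample)
print(parts["improvement"][0])  # Crop the distracting edge on the left.
```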
Conclusion
The Photo Critique app offers more than just feedback; it guides you on a journey of improvement! I hope it has also taught you a thing or two about Gemini model inferencing and Streamlit along the way! 😉
Stay tuned! Follow my Medium page for more tutorials on AI and Cloud.
🌟🌟🌟 Shine bright like a diamond 💎💎💎
Published via Towards AI