A Primer on Using Google’s Gemini API to Improve Your Photography

Last Updated on November 3, 2024 by Editorial Team

Author(s): Devi

Originally published on Towards AI.

Part 1 of a 2-part beginner series exploring fun generative AI use cases with Gemini to enhance your photography skills!

In this blog post, I’ll show you how to build a Photo Critique and Enhancement App using Google’s Gemini 1.5 Flash-8B model (through the Gemini API) and Streamlit (all for free!).

As a product manager turned builder, I began my AI journey two years ago. Things became especially exciting when I started building simple apps that served my specific use cases. In this case, that meant getting myself a photo critique assistant (or mentor?) to help me get better at photography!

Whether you’re an amateur photographer looking to sharpen your skills or someone curious about AI, this tutorial will walk you through the essentials of Gemini API inferencing.

By the end, you will have built an app that not only critiques your photos but also helps you improve them! The goal is to explore generative AI and its diverse applications for enhancing your craft, whether in photography, music, art, or even cooking.

Overview of Google Flash API

Gemini 1.5 Flash-8B is designed for speed and efficiency. It is multimodal, meaning it can accept text, image, and video inputs and return text-based feedback.

Accessing Google AI Studio

Google AI Studio is your starting point — an easy-to-access platform that allows you to work with Google’s generative models. It’s free to use as long as you have a Gmail account.

Go to Google AI Studio.
Sign in with your Gmail account to get started. Once inside, select the Gemini model and create an API key, which you will need to call the model from your app.

Training, Inference, and Fine-Tuning in AI

Before we jump into the coding process, it’s essential to understand some foundational concepts in AI. When I first began my AI journey, I quickly recognized how crucial it is to translate these foundational topics into practical applications.

Training: During the training phase, an AI model learns to recognize patterns from a large, diverse dataset. The Gemini models we will be accessing have been developed by Google DeepMind and made available for use. Thanks to the DeepMind team for getting the training done and making the model ready for our use!
Inference: This is the model’s “active” phase, where it accepts input and generates output based on user requests. We will harness the power of inference to build this photo critique app.

💡Inference is the secret sauce of our photo critique app!

Fine-Tuning: Fine-tuning specializes the pre-trained model by training it on niche datasets. This process enhances the model’s ability to provide more relevant outputs for specific tasks. This is what we can do to further tweak the OG model (as needed) for specialized use cases.
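To make the three phases concrete, here is a toy sketch in plain Python. It is only an analogy and says nothing about how Gemini itself is trained: the one-parameter model, the two tiny datasets, and the function names (train, infer, fine_tune) are all made up for illustration.

```python
# Toy illustration only: a one-parameter model y = w * x.
# Gemini is vastly more complex; the names and data here are hypothetical.

def train(data, epochs=200, lr=0.01, w=0.0):
    """Training: learn w from (x, y) pairs with plain gradient descent."""
    for _ in range(epochs):
        for x, y in data:
            error = w * x - y
            w -= lr * error * x  # gradient of 0.5 * error**2 with respect to w
    return w

def infer(w, x):
    """Inference: apply the already-learned weight to a new input."""
    return w * x

def fine_tune(w, niche_data, epochs=200, lr=0.01):
    """Fine-tuning: keep training from the existing weight on a small, specialized dataset."""
    return train(niche_data, epochs=epochs, lr=lr, w=w)

general_data = [(1, 2.0), (2, 4.0), (3, 6.0)]  # underlying rule: y = 2x
niche_data = [(1, 2.5), (2, 5.0)]              # specialized rule: y = 2.5x

w_base = train(general_data)             # "pre-trained" weight
w_tuned = fine_tune(w_base, niche_data)  # specialized weight
print(round(infer(w_base, 10), 2), round(infer(w_tuned, 10), 2))
```

In this tutorial we only ever do the equivalent of infer(): the weights already exist, and we send inputs to get outputs.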

To wrap up this section, this tutorial focuses on inference using Google’s trained multimodal model, Gemini 1.5 Flash-8B. It will accept your photo and provide feedback on how to improve it.

Building the Photo Critique App with Streamlit

This section walks you through setting up your coding environment and configuring the Gemini model in 5 simple steps.

Step 1: Environment Setup

Import Libraries: First, import essential libraries like os, load_dotenv, google.generativeai, streamlit, and Image. These will handle everything from environment variables to interacting with the AI model.
Load API Key: Using load_dotenv(), load your API key from a .env file. This keeps the key out of your source code, so it is not exposed when you share or commit the script.

import os
from dotenv import load_dotenv
import google.generativeai as genai
import streamlit as st
from PIL import Image

# Load environment variables
load_dotenv()
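For reference, a .env file is a plain text file of KEY=VALUE lines, usually kept next to app.py and out of version control. The sketch below is a simplified, hypothetical stand-in that shows roughly what load_dotenv() reads. The real python-dotenv parser handles more cases, so keep using the library in the app itself. The parse_env_text name and the placeholder key value are illustrative assumptions.

```python
def parse_env_text(text):
    """Parse simple KEY=VALUE lines; a simplified sketch, not python-dotenv's real parser."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")  # drop surrounding quotes
    return values

# Example .env content; the value is a placeholder, not a real key.
example_env = """
# Gemini API key from Google AI Studio
API_KEY="your-api-key-here"
"""

print(parse_env_text(example_env))
```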

Step 2: Configuring API Key

Retrieve and Set API Key: Retrieve the API key with os.getenv('API_KEY').

Configure the AI Model: Next, set up Google’s Generative AI SDK with genai.configure(api_key=API_KEY) for a smooth integration with your app.

# Set up your API key
API_KEY = os.getenv('API_KEY')
if not API_KEY:
    raise ValueError("No API key found. Please set API_KEY in your .env file.")

# Initialize the Generative Model
genai.configure(api_key=API_KEY)
model = genai.GenerativeModel("gemini-1.5-flash-8b")

Step 3: Defining Functions for Critique

Image Processing: Define a function that takes the uploaded file and formats the image for the Gemini API.
Generate Critique Function: Define a function to send the processed image and a critique prompt to the model, receiving insights as feedback.

def get_gemini_response(input_prompt, image):
    # Send the text prompt and the first image part to the model
    response = model.generate_content(
        [input_prompt, image[0]]
    )
    return response.text

def get_image_content(uploaded_file):
    # Wrap the uploaded file's bytes and MIME type in the format the API expects
    if uploaded_file is not None:
        image_byte_data = uploaded_file.getvalue()
        image_parts = [{
            "mime_type": uploaded_file.type,
            "data": image_byte_data
        }]
        return image_parts
    else:
        raise FileNotFoundError("File not uploaded")
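If you want to catch bad uploads before they reach the model, the image part can be validated first. This is a minimal sketch under stated assumptions: build_image_part is a hypothetical helper, the MIME allowlist simply mirrors the jpg/jpeg/png types this app's uploader accepts, and the 10 MB cap is an arbitrary number chosen for illustration, not a documented Gemini limit.

```python
# Hypothetical helper: the allowlist mirrors the jpg/jpeg/png types accepted
# by this app's uploader, and the size cap is an arbitrary assumption.
ALLOWED_MIME_TYPES = {"image/jpeg", "image/png"}
MAX_IMAGE_BYTES = 10 * 1024 * 1024  # assumed 10 MB cap; adjust as needed

def build_image_part(data, mime_type):
    """Return the {"mime_type", "data"} dict used in Step 3, or raise ValueError."""
    if mime_type not in ALLOWED_MIME_TYPES:
        raise ValueError(f"Unsupported image type: {mime_type}")
    if not data:
        raise ValueError("Image file is empty")
    if len(data) > MAX_IMAGE_BYTES:
        raise ValueError("Image is larger than the assumed size limit")
    return {"mime_type": mime_type, "data": data}

# Fake bytes stand in for a real upload here.
part = build_image_part(b"\x89PNG fake bytes", "image/png")
print(part["mime_type"], len(part["data"]))
```

Because the helper takes raw bytes and a MIME string rather than a Streamlit upload object, it can be tried out without running the app.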

Step 4: Setting Up the Streamlit Interface

Streamlit lets you build a clean user interface and test your app’s capability.

# Set Streamlit page configuration FIRST
st.set_page_config(page_title="PhotoCritique", layout="centered")

# Streamlit interface setup
st.markdown("<h1 style='text-align: center;'>PhotoCritique App</h1>", unsafe_allow_html=True)

# Sidebar for critique options
st.sidebar.header("Critique Options")

# Allow users to select which aspects they want feedback on
aspects = st.sidebar.multiselect(
    "Select any 3 aspects to critique:",
    options=["Composition", "Lighting", "Focus and Sharpness", "Exposure", "Color Balance", "Creativity and Impact"],
    default=["Composition", "Lighting", "Focus and Sharpness"]
)
# Ensure the user selects exactly three aspects
if len(aspects) != 3:
    st.sidebar.warning("Please select exactly 3 aspects.")

# File uploader
uploaded_file = st.file_uploader("Upload a Photo for Critique", type=["jpg", "png", "jpeg"])
if uploaded_file is not None:
    image = Image.open(uploaded_file)
    st.image(image, caption="Uploaded Photo", use_column_width=True)

submit = st.button("Get Critique")

# Sentences per feedback section, used in the prompt below.
# The slider and its 1-5 range are an assumption; adjust to taste.
feedback_length = st.sidebar.slider("Sentences per section", min_value=1, max_value=5, value=3)

# Construct the input prompt based on selected aspects
if submit:
    if len(aspects) == 3:
        try:
            image_data = get_image_content(uploaded_file)

            # Create a formatted list of aspects
            aspects_list = "\n".join([f"- {aspect}" for aspect in aspects])

            # Instruction for feedback length
            feedback_instruction = f"Provide concise and actionable feedback for each selected aspect. Limit each section to {feedback_length} sentences."

            # Construct the prompt
            input_prompt = f"""
            You are an expert professional photographer. Please critique the uploaded photo focusing on the following aspects:
            {aspects_list}

            {feedback_instruction}

            Provide three critique areas and three areas for improvement based on the selected aspects.
            Format the response as follows:

            **Critique Areas:**
            1.
            2.
            3.

            **Areas for Improvement:**
            1.
            2.
            3.
            """

            # Get the response from Gemini
            response = get_gemini_response(input_prompt, image_data)

            # Display the response with formatting
            st.subheader("Photo Critique")
            st.write(response)

        except FileNotFoundError as e:
            st.error(str(e))
        except Exception as e:
            st.error(f"An error occurred: {e}")
    else:
        st.error("Please select exactly 3 aspects for the critique.")
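One optional refinement is to pull the prompt construction out of the Streamlit callback so it can be checked on its own, without the UI or an API call. The sketch below is one way to do that, not the only one: build_critique_prompt is a hypothetical name, feedback_length defaults to 3 here purely as an assumption, and the response-format template is left out for brevity.

```python
def build_critique_prompt(aspects, feedback_length=3):
    """Build the critique prompt as plain text, without touching Streamlit or the API."""
    if len(aspects) != 3:
        raise ValueError("Please select exactly 3 aspects for the critique.")
    aspects_list = "\n".join(f"- {aspect}" for aspect in aspects)
    lines = [
        "You are an expert professional photographer. "
        "Please critique the uploaded photo focusing on the following aspects:",
        aspects_list,
        "",
        "Provide concise and actionable feedback for each selected aspect. "
        f"Limit each section to {feedback_length} sentences.",
        "",
        "Provide three critique areas and three areas for improvement "
        "based on the selected aspects.",
    ]
    return "\n".join(lines)

prompt = build_critique_prompt(["Composition", "Lighting", "Exposure"], feedback_length=2)
print(prompt)
```

Inside the app, the returned string would be passed to get_gemini_response() together with the image data, exactly as input_prompt is in the code above.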

Step 5: Run your app!

To run the app locally, type the following command in your terminal:

streamlit run app.py

This will open a new tab in your browser where you can upload a photo and receive AI-generated feedback.
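If the command fails with an import error, a quick check like the one below can show which libraries are missing from your environment. It is a small convenience sketch, not part of the app: missing_modules is a hypothetical helper, and the module names are the import names from Step 1 (they are typically installed with pip as python-dotenv, google-generativeai, streamlit, and pillow).

```python
import importlib.util

# Import names taken from Step 1; the helper itself is a hypothetical convenience.
REQUIRED_MODULES = ["dotenv", "google.generativeai", "streamlit", "PIL"]

def missing_modules(names):
    """Return the module names that cannot be found in the current environment."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            found = False  # raised when a parent package (e.g. "google") is absent
        if not found:
            missing.append(name)
    return missing

# An empty list means everything the app imports is available.
print(missing_modules(REQUIRED_MODULES))
```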

Upload your photo and click the ‘Get Critique’ button.

Detailed feedback and areas for improvement from Gemini model on my pup photo! 🙂

Conclusion

The Photo Critique app offers more than just feedback; it guides you on a journey of improvement! I hope it has also taught you a thing or two about Gemini model inferencing and Streamlit along the way! 😉

Stay tuned! Follow my Medium page for more tutorials on AI and Cloud.

🌟🌟🌟 Shine bright like a diamond 💎💎💎

Published via Towards AI
