A Primer on Using Google’s Gemini API to Improve Your Photography

Last Updated on November 3, 2024 by Editorial Team

Author(s): Devi

Originally published on Towards AI.

Part 1 of a 2-part beginner series exploring fun generative AI use cases with Gemini to enhance your photography skills!

In this blog post, I’ll show you how to build a Photo Critique and Enhancement App using Google’s Gemini-1.5-Flash-8B API and Streamlit (all for free!).

Photo by Robert Shunev on Unsplash

As a product manager turned builder, I began my AI journey two years ago. Things became especially exciting when I started building simple apps that served my specific use cases. In this case, that meant getting myself a photo critique assistant (or mentor?) to help me get better at photography!

Whether you’re an amateur photographer looking to sharpen your skills or someone curious about AI, this tutorial will walk you through the essentials of Gemini API inferencing.

By the end, you will have built an app that not only critiques your photos but also helps you improve them! The goal is to explore generative AI and its diverse applications for enhancing your craft, whether in photography, music, art, or even cooking.

Overview of the Gemini Flash API

Gemini 1.5 Flash-8B is designed for speed and efficiency. It is multimodal, meaning it can accept text, image, and video inputs and return text-based feedback.

Accessing Google AI Studio

Google AI Studio is your starting point: an easy-to-access platform for working with Google’s generative models. It’s free to use as long as you have a Google account (a Gmail account works).

  1. Go to Google AI Studio.
  2. Sign in with your Google account to get started. Once inside, select the Gemini model and create an API key so you can integrate the model with your app.

Training, Inference, and Fine-Tuning in AI

Before we jump into the coding process, it’s essential to understand some foundational concepts in AI. When I first began my AI journey, I quickly recognized how crucial it is to translate these foundational topics into practical applications.

  • Training: During the training phase, an AI model learns to recognize patterns from a large, diverse dataset. The Gemini model we will be accessing was developed by Google DeepMind and made available for use. Thanks to the DeepMind team for getting the training done and making the model ready for us!
  • Inference: This is the model’s “active” phase, where it accepts input and generates output based on user requests. We will harness the power of inference to build this photo critique app.

💡Inference is the secret sauce of our photo critique app!

  • Fine-Tuning: Fine-tuning specializes the pre-trained model by training it on niche datasets. This process enhances the model’s ability to provide more relevant outputs for specific tasks. This is what we can do to further tweak the OG model (as needed) for specialized use cases.

To wrap up this section, this tutorial focuses on inference using Google’s trained multimodal model, Gemini 1.5 Flash-8B. It will accept your photo and provide feedback on how to improve it.
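To make the three terms concrete, here is a toy sketch in plain Python. It is purely illustrative: a one-parameter model with made-up data, and it says nothing about how Gemini itself is trained. The function names (train, infer, fine_tune) and the sample values are assumptions chosen for this example.

```python
# Toy illustration only: a one-parameter linear model, not Gemini's training process.

def train(samples, learning_rate=0.01, epochs=500):
    """Training: learn a weight w so that w * x approximates y."""
    w = 0.0
    for _ in range(epochs):
        for x, y in samples:
            error = w * x - y
            w -= learning_rate * error * x  # gradient step on the squared error
    return w

def infer(w, x):
    """Inference: apply the already-learned weight to a new input."""
    return w * x

def fine_tune(w, niche_samples, learning_rate=0.01, epochs=200):
    """Fine-tuning: keep training from an existing w on a small, specialized dataset."""
    for _ in range(epochs):
        for x, y in niche_samples:
            error = w * x - y
            w -= learning_rate * error * x
    return w

general_data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # made-up data following y = 2x
w = train(general_data)
prediction = infer(w, 4.0)  # close to 8.0

niche_data = [(1.0, 2.5), (2.0, 5.0)]  # made-up specialized data following y = 2.5x
w_tuned = fine_tune(w, niche_data)  # drifts from about 2.0 toward 2.5
```

In this tutorial we only ever do the equivalent of infer: the heavy lifting of train has already been done for us.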

Building the Photo Critique App with Streamlit

This section walks you through setting up your coding environment and configuring the Gemini model in 5 simple steps.

Step 1: Environment Setup

  • Import Libraries: First, import the essentials: os, load_dotenv (from the python-dotenv package), google.generativeai, streamlit, and Image (from Pillow). These handle everything from environment variables to interacting with the AI model.
  • Load API Key: Using load_dotenv(), load your API key from a .env file. This keeps the key out of your source code and ready for use.
import os
from dotenv import load_dotenv
import google.generativeai as genai
import streamlit as st
from PIL import Image
# Load environment variables
load_dotenv()

Step 2: Configuring API Key

  • Retrieve and Set API Key: Retrieve the API key with os.getenv('API_KEY').
  • Configure the AI Model: Next, set up Google’s Generative AI SDK with genai.configure(api_key=API_KEY) for a smooth integration with your app.

# Set up your API key
API_KEY = os.getenv('API_KEY')
if not API_KEY:
    raise ValueError("No API key found. Please set API_KEY in your .env file.")

# Initialize the Generative Model
genai.configure(api_key=API_KEY)
model = genai.GenerativeModel("gemini-1.5-flash-8b")
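If your app ends up reading more than one setting from the environment, the check above can be wrapped in a small helper. This is a minimal sketch using only the standard library; require_env is a hypothetical name introduced here, not part of any SDK, and it additionally treats a blank value as missing.

```python
import os

def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise if it is unset or blank."""
    value = os.getenv(name, "").strip()
    if not value:
        raise ValueError(f"No value found for {name}. Please set {name} in your .env file.")
    return value

# Usage in the app (after load_dotenv() has run):
# API_KEY = require_env("API_KEY")
```

Failing fast like this means a missing key shows up as a clear error at startup rather than as a confusing API failure later.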

Step 3: Defining Functions for Critique

  • Image Processing: Define a function that reads the uploaded file and packages its bytes and MIME type in the format the Gemini API expects.
  • Generate Critique Function: Define a function to send the processed image and a critique prompt to the model, receiving insights as feedback.
def get_gemini_response(input_prompt, image):
    # Send the text prompt and the first image part to Gemini, then return the text reply
    response = model.generate_content(
        [input_prompt, image[0]]
    )
    return response.text

def get_image_content(uploaded_file):
    # Package the uploaded file's bytes and MIME type for the Gemini API
    if uploaded_file is not None:
        image_byte_data = uploaded_file.getvalue()
        image_parts = [{
            "mime_type": uploaded_file.type,
            "data": image_byte_data
        }]
        return image_parts
    else:
        raise FileNotFoundError("File not uploaded")
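To see what get_image_content hands to the model without launching Streamlit, you can try it on a stand-in object. In the sketch below, FakeUpload is a hypothetical class that mimics only the two members the function relies on (.type and .getvalue()), and the payload bytes are a placeholder rather than a real image.

```python
from dataclasses import dataclass

@dataclass
class FakeUpload:
    """Hypothetical stand-in for Streamlit's uploaded file: only .type and .getvalue()."""
    type: str
    payload: bytes

    def getvalue(self) -> bytes:
        return self.payload

def get_image_content(uploaded_file):
    # Same logic as the tutorial function, repeated so this sketch is self-contained
    if uploaded_file is not None:
        return [{"mime_type": uploaded_file.type, "data": uploaded_file.getvalue()}]
    raise FileNotFoundError("File not uploaded")

fake = FakeUpload(type="image/png", payload=b"placeholder-bytes")  # not a real PNG
parts = get_image_content(fake)
print(parts[0]["mime_type"], len(parts[0]["data"]))
```

The result is a one-element list holding a dictionary with mime_type and data keys, which is why get_gemini_response passes image[0] to the model.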

Step 4: Setting Up the Streamlit Interface

Streamlit lets you build a clean user interface and test your app’s capability.

# Set Streamlit page configuration FIRST
st.set_page_config(page_title="PhotoCritique", layout="centered")

# Streamlit interface setup
st.markdown("<h1 style='text-align: center;'>PhotoCritique App</h1>", unsafe_allow_html=True)

# Sidebar for critique options
st.sidebar.header("Critique Options")

# Allow users to select which aspects they want feedback on
aspects = st.sidebar.multiselect(
    "Select any 3 aspects to critique:",
    options=["Composition", "Lighting", "Focus and Sharpness", "Exposure", "Color Balance", "Creativity and Impact"],
    default=["Composition", "Lighting", "Focus and Sharpness"]
)
# Ensure the user selects exactly three aspects
if len(aspects) != 3:
    st.sidebar.warning("Please select exactly 3 aspects.")

# File uploader
uploaded_file = st.file_uploader("Upload a Photo for Critique", type=["jpg", "png", "jpeg"])
if uploaded_file is not None:
    image = Image.open(uploaded_file)
    st.image(image, caption="Uploaded Photo", use_column_width=True)

submit = st.button("Get Critique")

# Sentences per critique section. The original snippet uses feedback_length
# without defining it, so a sidebar slider is assumed here.
feedback_length = st.sidebar.slider("Sentences per section", min_value=1, max_value=5, value=2)

# Construct the input prompt based on selected aspects
if submit:
    if len(aspects) == 3:
        try:
            image_data = get_image_content(uploaded_file)

            # Create a formatted list of aspects
            aspects_list = "\n".join([f"- {aspect}" for aspect in aspects])

            # Instruction for feedback length
            feedback_instruction = f"Provide concise and actionable feedback for each selected aspect. Limit each section to {feedback_length} sentences."

            # Construct the prompt (its lines stay flush left so no code indentation leaks into the text)
            input_prompt = f"""
You are an expert professional photographer. Please critique the uploaded photo focusing on the following aspects:
{aspects_list}

{feedback_instruction}

Provide three critique areas and three areas for improvement based on the selected aspects.
Format the response as follows:

**Critique Areas:**
1.
2.
3.

**Areas for Improvement:**
1.
2.
3.
"""

            # Get the response from Gemini
            response = get_gemini_response(input_prompt, image_data)

            # Display the response with formatting
            st.subheader("Photo Critique")
            st.write(response)

        except FileNotFoundError as e:
            st.error(str(e))
        except Exception as e:
            st.error(f"An error occurred: {e}")
    else:
        st.error("Please select exactly 3 aspects for the critique.")
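If you would like to check the prompt wording without clicking through the UI, one option is to pull the prompt construction into a plain function. The sketch below is one possible refactor, not the layout used in the app above: build_prompt and its default of 2 sentences are assumptions introduced for illustration, while the prompt text mirrors the one in the Streamlit code.

```python
def build_prompt(aspects, feedback_length=2):
    """Assemble the critique prompt from exactly three aspects (no Streamlit needed)."""
    if len(aspects) != 3:
        raise ValueError("Please select exactly 3 aspects for the critique.")

    aspects_list = "\n".join(f"- {aspect}" for aspect in aspects)
    feedback_instruction = (
        "Provide concise and actionable feedback for each selected aspect. "
        f"Limit each section to {feedback_length} sentences."
    )
    numbered = "1.\n2.\n3."
    return (
        "You are an expert professional photographer. "
        "Please critique the uploaded photo focusing on the following aspects:\n"
        f"{aspects_list}\n\n"
        f"{feedback_instruction}\n\n"
        "Provide three critique areas and three areas for improvement "
        "based on the selected aspects.\n"
        "Format the response as follows:\n\n"
        f"**Critique Areas:**\n{numbered}\n\n"
        f"**Areas for Improvement:**\n{numbered}\n"
    )

prompt = build_prompt(["Composition", "Lighting", "Exposure"], feedback_length=2)
print(prompt)
```

Keeping the prompt in a pure function makes it easy to tweak the wording and inspect the result in a terminal before spending any API calls.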

Step 5: Run your app!

To run the app locally, save the code from the steps above in a file named app.py and type the following command in your terminal:

streamlit run app.py

This will open a new tab in your browser where you can upload a photo and receive AI-generated feedback.

Submit your pic and click the ‘Get Critique’ button
Detailed feedback and areas for improvement from Gemini model on my pup photo! 🙂

Conclusion

The Photo Critique app offers more than just feedback; it guides you on a journey of improvement! I hope it has also taught you a thing or two about Gemini model inferencing and Streamlit along the way! 😉

Stay tuned! Follow my Medium page for more tutorials on AI and Cloud.

🌟🌟🌟 Shine bright like a diamond 💎💎💎
