Transform Image Data into Insights with VisualInsight’s AI Automation

Last Updated on January 7, 2025 by Editorial Team

Author(s): Yotam Braun

Originally published on Towards AI.

Extracting insights from images can often feel challenging. Whether you’re a researcher, an analyst, or simply curious, efficiently analyzing and understanding images is crucial but not always straightforward. This is where VisualInsight comes in.

Figure 1: https://developers.googleblog.com/en/gemini-15-flash-updates-google-ai-studio-gemini-api/

GitHub – yotambraun/VisualInsight


Challenges with Traditional Image Analysis Methods

Manual Effort: Finding the right tools, writing custom scripts, and working with large datasets often involves significant manual work.
Complexity: Navigating advanced algorithms, ML frameworks, or open-source projects can be overwhelming, especially for smaller teams.
Storage and Security: Ensuring data is securely stored and easily retrievable adds another layer of complexity.
Scaling: Handling larger datasets requires scalable infrastructure, which often involves high overhead.

VisualInsight addresses these challenges with a seamless and automated solution for image analysis.

Figure 2: Example of the user interface where you can upload images

As you can see, the UI helps to simplify the process. You just drag and drop your image — no complicated scripts required.

Introducing VisualInsight

Core Idea

VisualInsight is a Streamlit-based web application that simplifies image analysis using Google Generative AI (Gemini). It incorporates AWS S3 for secure storage of original images and results.

Figure 3: Analysis results displayed in the Streamlit application

By automating much of the heavy lifting, VisualInsight ensures you spend less time on configuration and more time on innovation.

Key Components

Streamlit UI: A user-friendly interface for uploading, viewing, and analyzing images.
LLM Service (Google Gemini): Advanced text-based insights derived from images.
AWS S3 Storage: Secure storage for files and AI-generated analyses.
Docker & Terraform: Infrastructure for quick deployments and reproducibility.
CI/CD via GitHub Actions: Automated builds, tests, and deployments for reliability.

How VisualInsight Works

1. Upload an Image
Drag and drop a JPG or PNG file onto the application.

2. AI Analysis with Google Gemini
The uploaded image is passed to the LLMService class, which uses Google’s Generative AI (Gemini) to generate descriptive insights about the image content.

Figure 4: Further analysis details being displayed to the user

3. Storage in AWS S3
Once analyzed, the application uploads both the original image and any analysis results to an S3 bucket for safekeeping.

4. Display Results
Insights are displayed in the application interface for immediate feedback.

Figure 5: Another view of the analysis interface

Code Highlights

Below are some of the core services that power VisualInsight.

1. LLM Service (app/services/llm_service.py)

Handles the interaction with Google Gemini for image analysis.

import google.generativeai as genai
import os
from datetime import datetime
from PIL import Image
from utils.logger import setup_logger

logger = setup_logger()

class LLMService:
    def __init__(self):
        genai.configure(api_key=os.getenv('GOOGLE_API_KEY'))
        self.model = genai.GenerativeModel('gemini-1.5-flash-002')

        self.prompt = """
        Analyze this Image and provide:
        1. Image type
        2. Key information
        3. Important details
        4. Notable observations
        """

    def analyze_document(self, image: Image.Image) -> dict:
        try:
            logger.info("Sending request to LLM")
            # Generate content directly with the PIL image
            response = self.model.generate_content([
                self.prompt,
                image
            ])

            return {
                "analysis": response.text,
                "timestamp": datetime.now().isoformat()
            }

        except Exception as e:
            logger.error(f"LLM analysis failed: {str(e)}")
            raise Exception(f"Failed to analyze document: {str(e)}")

What’s Happening Here?

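The constructor configures the Google Generative AI (Gemini) client with the API key read from the GOOGLE_API_KEY environment variable.
A default prompt outlines the kind of analysis we want.
The analyze_document method sends the prompt and the image to Gemini and returns the text-based analysis together with a timestamp.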
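
Because analyze_document only needs a model object with a generate_content method and a response with a text attribute, the same flow can be tried offline with a stand-in model. The following is a minimal sketch, not code from the repository: FakeModel, FakeResponse, and analyze_with_model are hypothetical names introduced purely for illustration, and the returned text is a placeholder rather than real Gemini output.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class FakeResponse:
    """Hypothetical stand-in for the Gemini response; only `.text` is used."""
    text: str


class FakeModel:
    """Hypothetical stand-in for genai.GenerativeModel, for offline runs."""

    def __init__(self):
        self.calls = []

    def generate_content(self, parts):
        # Record what was sent so the caller can inspect it afterwards.
        self.calls.append(parts)
        return FakeResponse(text="1. Image type: placeholder")


def analyze_with_model(model, prompt, image) -> dict:
    """Mirror the shape of analyze_document, with the model passed in."""
    response = model.generate_content([prompt, image])
    return {
        "analysis": response.text,
        "timestamp": datetime.now().isoformat(),
    }


fake = FakeModel()
# A bare object() stands in for the PIL image; the fake model never reads it.
result = analyze_with_model(fake, "Analyze this Image", image=object())
print(sorted(result))
```

Passing the model in as an argument is one possible design choice; it keeps the request logic testable without an API key or network access.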

2. S3 Service (app/services/s3_service.py)

Uploads files to AWS S3 with timestamped keys and generates presigned URLs for private access.

import boto3
import os
from datetime import datetime
from utils.logger import setup_logger

logger = setup_logger()

class S3Service:
    def __init__(self):
        self.s3_client = boto3.client(
            's3',
            aws_access_key_id=os.getenv('AWS_ACCESS_KEY_ID'),
            aws_secret_access_key=os.getenv('AWS_SECRET_ACCESS_KEY'),
            region_name=os.getenv('AWS_REGION', 'us-east-1')
        )
        self.bucket_name = os.getenv('S3_BUCKET_NAME')

    def upload_file(self, file):
        """Upload file to S3 and return the URL"""
        try:
            # Generate unique filename
            timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
            file_key = f"uploads/{timestamp}_{file.name}"

            # Upload to S3
            self.s3_client.upload_fileobj(
                file,
                self.bucket_name,
                file_key
            )

            # Generate presigned URL that expires in 1 hour
            url = self.s3_client.generate_presigned_url(
                'get_object',
                Params={
                    'Bucket': self.bucket_name,
                    'Key': file_key
                },
                ExpiresIn=3600
            )

            # Log the key rather than the presigned URL, since anyone
            # holding the URL can read the object until it expires
            logger.info(f"File uploaded successfully: {file_key}")
            return url

        except Exception as e:
            logger.error(f"S3 upload failed: {str(e)}")
            raise Exception(f"Failed to upload file to S3: {str(e)}")

Figure 6: The AWS S3 bucket that stores uploaded images and analysis results

Core Features:

Uses boto3 to interact with AWS S3.
Generates a time-stamped key for each file.
Creates a presigned URL for private file access without requiring you to open up the entire bucket.
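
One limitation of a key built only from a second-resolution timestamp and the file name is that two uploads of the same file name within the same second would map to the same key. The sketch below shows one possible way to reduce that risk by adding a short random suffix. It is an illustration, not the repository's code: build_object_key and its parameters are hypothetical, and the character filtering is a conservative choice rather than an S3 requirement.

```python
import re
import uuid
from datetime import datetime
from typing import Optional


def build_object_key(
    filename: str,
    now: Optional[datetime] = None,
    prefix: str = "uploads",
) -> str:
    """Return an S3 key of the form '<prefix>/<timestamp>_<random>_<name>'."""
    now = now or datetime.now()
    timestamp = now.strftime("%Y%m%d_%H%M%S")
    # Short random suffix so same-second uploads of one file name differ.
    suffix = uuid.uuid4().hex[:8]
    # Keep only the base name, then replace characters outside a
    # conservative set (an illustrative choice, not an S3 rule).
    base = filename.replace("\\", "/").split("/")[-1]
    safe_name = re.sub(r"[^A-Za-z0-9._-]", "_", base)
    return f"{prefix}/{timestamp}_{suffix}_{safe_name}"


key = build_object_key("my photo.png", now=datetime(2025, 1, 7, 12, 0, 0))
print(key)
```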

3. The Streamlit Application (app/main.py)

Provides the user interface for file uploads, analysis initiation, and displaying results.

import streamlit as st
import os
from dotenv import load_dotenv
from services.s3_service import S3Service
from services.llm_service import LLMService
from utils.logger import setup_logger
from PIL import Image

# Load environment variables
load_dotenv()

# Setup logging
logger = setup_logger()

# Initialize services
s3_service = S3Service()
llm_service = LLMService()

def main():
    st.title("Document Analyzer")

    uploaded_file = st.file_uploader("Upload a document", type=['png', 'jpg', 'jpeg'])

    if uploaded_file:
        # Display image
        image = Image.open(uploaded_file)
        st.image(image, caption='Uploaded Document', use_column_width=True)

        if st.button('Analyze Document'):
            with st.spinner('Processing...'):
                try:
                    # Analyze with LLM directly
                    logger.info("Starting document analysis")
                    analysis = llm_service.analyze_document(image)

                    # Upload to S3 for storage
                    logger.info(f"Uploading file: {uploaded_file.name}")
                    # Rewind the buffer first: PIL has already read from it,
                    # and upload_fileobj reads from the current position
                    uploaded_file.seek(0)
                    s3_url = s3_service.upload_file(uploaded_file)

                    # Display results
                    st.success("Analysis Complete!")
                    st.json(analysis)

                except Exception as e:
                    logger.error(f"Error processing document: {str(e)}")
                    st.error(f"Error: {str(e)}")

if __name__ == "__main__":
    main()

Streamlit handles the UI: file upload, display, button triggers.
LLMService and S3Service are orchestrated together to handle the AI query and file upload.
Real-time logs inform you of the status and highlight any issues.
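
In the excerpt above, only the uploaded image is sent to S3, while the analysis is shown on screen. If the analysis should be stored alongside the image as well, one way to do it is to serialize the dictionary to JSON and write it with the S3 client's put_object call. The sketch below is an assumption about how that could look, not the repository's implementation: store_analysis, analysis_key_for, and the ".analysis.json" suffix are hypothetical, and it assumes the image key is available to the caller (the upload_file method shown earlier returns only the presigned URL). RecordingClient is a stand-in so the example runs without AWS credentials.

```python
import json


def analysis_key_for(image_key: str) -> str:
    """Derive a sibling key for the analysis of a given image key."""
    return f"{image_key}.analysis.json"


def store_analysis(s3_client, bucket: str, image_key: str, analysis: dict) -> str:
    """Write the analysis dict as JSON next to the image and return its key."""
    key = analysis_key_for(image_key)
    s3_client.put_object(
        Bucket=bucket,
        Key=key,
        Body=json.dumps(analysis, indent=2).encode("utf-8"),
        ContentType="application/json",
    )
    return key


class RecordingClient:
    """Hypothetical stand-in for a boto3 S3 client, used for a dry run."""

    def __init__(self):
        self.objects = {}

    def put_object(self, Bucket, Key, Body, ContentType):
        # Keep the payload in memory instead of sending it to AWS.
        self.objects[(Bucket, Key)] = (Body, ContentType)


client = RecordingClient()
stored_key = store_analysis(
    client,
    "example-bucket",
    "uploads/20250107_120000_cat.png",
    {"analysis": "placeholder text", "timestamp": "2025-01-07T12:00:00"},
)
print(stored_key)
```

With a real client, the first argument would be something like boto3.client("s3"), and the bucket name would come from S3_BUCKET_NAME.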

Running VisualInsight Locally

1. Clone the Repository

git clone https://github.com/yotambraun/VisualInsight.git
cd VisualInsight

2. Environment Setup

Create a .env file at the project root:

AWS_ACCESS_KEY_ID=YOUR_AWS_KEY
AWS_SECRET_ACCESS_KEY=YOUR_AWS_SECRET
AWS_REGION=us-east-1
S3_BUCKET_NAME=YOUR_BUCKET_NAME
GOOGLE_API_KEY=YOUR_GOOGLE_GENAI_KEY

3. Install Dependencies

pip install -r requirements.txt

4. Run the App

streamlit run app/main.py

Navigate to http://localhost:8501 in your browser to start using VisualInsight!
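
A missing entry in the .env file typically shows up only when the first request fails. A small startup check can report the problem earlier. The sketch below is a suggestion rather than part of the project: missing_env_vars and REQUIRED_VARS are hypothetical names, and treating the AWS keys as required is an assumption, since credentials can also be supplied to boto3 in other ways.

```python
import os
from typing import Iterable, List, Mapping

# Variable names taken from the .env example above. AWS_REGION is left out
# because the S3 service code falls back to 'us-east-1' when it is unset.
REQUIRED_VARS = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "S3_BUCKET_NAME",
    "GOOGLE_API_KEY",
)


def missing_env_vars(
    required: Iterable[str] = REQUIRED_VARS,
    env: Mapping[str, str] = os.environ,
) -> List[str]:
    """Return the names in `required` that are unset or blank in `env`."""
    return [name for name in required if not env.get(name, "").strip()]


# Example with a partially filled environment (values are placeholders).
example_env = {"AWS_ACCESS_KEY_ID": "x", "GOOGLE_API_KEY": " "}
print(missing_env_vars(env=example_env))
# → ['AWS_SECRET_ACCESS_KEY', 'S3_BUCKET_NAME', 'GOOGLE_API_KEY']
```

Called with no arguments after load_dotenv(), the function inspects the real process environment; a non-empty result could be shown with st.error before any service is created.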

Containerization with Docker

Use Docker for consistent application performance across environments.

Figure 7: AWS ECS used for container orchestration

Dockerfile (excerpt):

FROM python:3.9-slim

WORKDIR /app

# Install dependencies
COPY requirements.txt .
RUN pip install -r requirements.txt

# Copy application code
COPY app/ .

EXPOSE 8501

ENTRYPOINT ["streamlit", "run", "main.py", "--server.port=8501", "--server.address=0.0.0.0"]

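Steps:

1. Build the image:

docker build -t visualinsight:latest .

2. Run the container, passing in the variables from your .env file (the image copies only app/, so the .env file is not inside the container):

docker run --env-file .env -p 8501:8501 visualinsight:latest

Visit http://localhost:8501 to use the app.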

Infrastructure as Code with Terraform

Figure 8: AWS ECR, storing Docker images for the application

Terraform creates and manages the AWS resources needed to deploy the application: S3, ECR, ECS, and more.

Why Terraform?

Terraform allows you to define your cloud infrastructure as code. Rather than manually creating AWS resources via the console or CLI, you simply write a configuration file. This ensures that your infrastructure is consistent, version-controlled, and easily replicable across multiple environments.

Key Advantages of Using Terraform:

Reproducibility: The same configurations can be deployed multiple times without drift.
Collaboration: Teams can review Terraform files in Git, allowing for better code reviews and fewer mistakes.
Scalability: Quick spin-up of additional resources if your usage grows.

1. Example Variables (infrastructure/terraform/variables.tf)

variable "aws_region" {
  description = "AWS region"
  type        = string
  default     = "us-east-1"
}

variable "bucket_name" {
  description = "Name of the S3 bucket"
  type        = string
}

2. Main Configuration (infrastructure/terraform/main.tf)

terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 4.0"
    }
  }
}

provider "aws" {
  region = var.aws_region
}

resource "aws_s3_bucket" "documents" {
  bucket = var.bucket_name
}

resource "aws_ecr_repository" "app" {
  name = "document-analyzer"
}

resource "aws_ecs_cluster" "main" {
  name = "document-analyzer-cluster"
}
# ... ECS Service, Security Groups, Task Definition, etc.

Why ECR and ECS?

Amazon ECR (Elastic Container Registry): A private registry for storing your Docker images. Instead of relying on Docker Hub or other third parties, ECR keeps your images secure within your AWS account.
Amazon ECS (Elastic Container Service): An AWS-native container orchestration service. It manages the scaling and deployment of your containerized application automatically. With Fargate (serverless compute engine for containers), you don’t have to worry about provisioning or managing EC2 instances; it abstracts away all the heavy lifting.

In Short:

ECR stores your built Docker images.
ECS pulls those images from ECR and runs them as containers in a scalable manner.

3. Deploying via Terraform

cd infrastructure/terraform
terraform init
terraform plan -var="bucket_name=my-visualinsight-bucket"
terraform apply -var="bucket_name=my-visualinsight-bucket"

Terraform will:

Create an S3 bucket.
Create an ECR repository.
Set up an ECS cluster, tasks, services, IAM roles, and more.
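
Since bucket_name has no default and is passed on the command line, an invalid value is only rejected once Terraform talks to AWS. The sketch below checks a candidate name locally against a subset of the S3 naming rules (3 to 63 characters; lowercase letters, digits, dots, and hyphens; starting and ending with a letter or digit; no adjacent dots; not shaped like an IPv4 address). It is a convenience check under those stated assumptions, not a complete validator: looks_like_valid_bucket_name is a hypothetical helper, and it cannot tell whether a name is already taken.

```python
import re

# First and last characters must be a lowercase letter or digit;
# total length is 3 to 63 characters.
_BUCKET_RE = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")
_IPV4_RE = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


def looks_like_valid_bucket_name(name: str) -> bool:
    """Check a subset of the S3 bucket naming rules (not exhaustive)."""
    if not _BUCKET_RE.fullmatch(name):
        return False
    if ".." in name:
        return False
    if _IPV4_RE.fullmatch(name):
        return False
    return True


for candidate in ("my-visualinsight-bucket", "My_Bucket", "ab"):
    print(candidate, looks_like_valid_bucket_name(candidate))
```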

Automated CI/CD with GitHub Actions

Automate the build, test, and deployment process to ensure consistent updates.

Your .github/workflows/deploy.yml takes care of:

AWS Login: Authenticates with your AWS account using secrets.
Docker Build & Push: Builds the Docker image and pushes it to Amazon ECR.
ECS Update: Forces a new deployment on ECS to pull the latest image.

Figure 9: GitHub Actions

Figure 10: GitHub Actions pipeline for CI/CD

Sample Deploy Workflow:

name: Deploy to AWS

on:
  push:
    branches: [ main ]

jobs:
  deploy:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v2

      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v1
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: us-east-1

      - name: Login to Amazon ECR
        id: login-ecr
        uses: aws-actions/amazon-ecr-login@v1

      - name: Build and push Docker image
        env:
          ECR_REGISTRY: ${{ steps.login-ecr.outputs.registry }}
          ECR_REPOSITORY: document-analyzer
          IMAGE_TAG: ${{ github.sha }}
        run: |
          docker build -t $ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG .
          docker push $ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG

      - name: Deploy to ECS
        run: |
          aws ecs update-service --cluster document-analyzer-cluster --service document-analyzer --force-new-deployment

Whenever you push to main, GitHub Actions will build and deploy your latest changes automatically.
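
The final workflow step uses the AWS CLI, but the same redeploy can be triggered from Python through the boto3 ECS client's update_service call with forceNewDeployment=True. The sketch below is one way this might look and is not taken from the repository: force_new_deployment is a hypothetical helper, and RecordingEcsClient is a stand-in so the example runs without AWS access. The cluster and service names are the ones used in the workflow above.

```python
def force_new_deployment(ecs_client, cluster: str, service: str) -> dict:
    """Ask ECS to start a fresh deployment of `service` in `cluster`."""
    return ecs_client.update_service(
        cluster=cluster,
        service=service,
        forceNewDeployment=True,
    )


class RecordingEcsClient:
    """Hypothetical stand-in for a boto3 ECS client, used for a dry run."""

    def __init__(self):
        self.calls = []

    def update_service(self, **kwargs):
        # Record the request instead of sending it to AWS.
        self.calls.append(kwargs)
        return {"service": {"serviceName": kwargs["service"]}}


ecs = RecordingEcsClient()
response = force_new_deployment(
    ecs, "document-analyzer-cluster", "document-analyzer"
)
print(ecs.calls[0])
```

With real credentials, the first argument would be something like boto3.client("ecs", region_name="us-east-1").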

Real-World Impact

Time Efficiency: With AI-driven analysis, there’s no need for manual labeling or advanced ML pipeline setup.
Scalability: AWS S3 + ECS means you can handle ever-growing image datasets and traffic without re-architecting.
Reliability: Docker ensures consistent environments, Terraform standardizes infrastructure, and GitHub Actions automates testing and deployment.
User-Friendly: Streamlit’s intuitive UI means non-developers can upload images and see insights in real time.

Conclusion

VisualInsight takes the guesswork out of image analysis. By combining Streamlit, Google Generative AI (Gemini), AWS S3, Terraform and CI/CD, it delivers a robust, scalable solution that’s easy to use and maintain. VisualInsight streamlines the entire workflow — so you can focus on making discoveries, not wrestling with infrastructure.

Key Takeaways

Automation reduces manual work and simplifies processes.
Infrastructure as Code promotes collaboration and reproducibility.
Docker ensures consistency across development and production environments.
CI/CD enables fast and reliable updates.

Feel free to clone the GitHub Repository and customize it for your own project needs. If you enjoyed this, consider clapping on Medium, sharing with others, or following me for more deep dives into AI and cloud solutions!


