Transform Image Data into Insights with VisualInsight’s AI Automation
Last Updated on January 7, 2025 by Editorial Team
Author(s): Yotam Braun
Originally published on Towards AI.
Extracting insights from images can often feel challenging. Whether you’re a researcher, an analyst, or simply curious, efficiently analyzing and understanding images is crucial but not always straightforward. This is where VisualInsight comes in.
Figure 1: https://developers.googleblog.com/en/gemini-15-flash-updates-google-ai-studio-gemini-api/
GitHub repository: https://github.com/yotambraun/VisualInsight
Challenges with Traditional Image Analysis Methods
- Manual Effort: Finding the right tools, writing custom scripts, and working with large datasets often involves significant manual work.
- Complexity: Navigating advanced algorithms, ML frameworks, or open-source projects can be overwhelming, especially for smaller teams.
- Storage and Security: Ensuring data is securely stored and easily retrievable adds another layer of complexity.
- Scaling: Handling larger datasets requires scalable infrastructure, which often involves high overhead.
VisualInsight addresses these challenges with a seamless and automated solution for image analysis.
Figure 2: Example of the user interface where you can upload images
As you can see, the UI helps to simplify the process. You just drag and drop your image — no complicated scripts required.
Introducing VisualInsight
Core Idea
VisualInsight is a Streamlit-based web application that simplifies image analysis using Google Generative AI (Gemini). It incorporates AWS S3 for secure storage of original images and results.
Figure 3: Analysis results displayed in the Streamlit application
By automating much of the heavy lifting, VisualInsight ensures you spend less time on configuration and more time on innovation.
Key Components
- Streamlit UI: A user-friendly interface for uploading, viewing, and analyzing images.
- LLM Service (Google Gemini): Advanced text-based insights derived from images.
- AWS S3 Storage: Secure storage for files and AI-generated analyses.
- Docker & Terraform: Infrastructure for quick deployments and reproducibility.
- CI/CD via GitHub Actions: Automated builds, tests, and deployments for reliability.
How VisualInsight Works
1. Upload an Image
Drag and drop a JPG or PNG file onto the application.
2. AI Analysis with Google Gemini
The uploaded image is then passed to the LLMService class, which uses Google’s Generative AI (Gemini) to generate descriptive insights about the image content.
Figure 4: Further analysis details being displayed to the user
3. Storage in AWS S3
Once analyzed, the application uploads both the original image and any analysis results to an S3 bucket for safekeeping.
4. Display Results
Insights are displayed in the application interface for immediate feedback.
Figure 5: Another view of the analysis interface
Code Highlights
Below are some of the core services that power VisualInsight.
1. LLM Service (app/services/llm_service.py)
Handles the interaction with Google Gemini for image analysis.
import google.generativeai as genai
import os
from datetime import datetime
from PIL import Image
from utils.logger import setup_logger

logger = setup_logger()


class LLMService:
    def __init__(self):
        genai.configure(api_key=os.getenv('GOOGLE_API_KEY'))
        self.model = genai.GenerativeModel('gemini-1.5-flash-002')
        self.prompt = """
        Analyze this Image and provide:
        1. Image type
        2. Key information
        3. Important details
        4. Notable observations
        """

    def analyze_document(self, image: Image.Image) -> dict:
        try:
            logger.info("Sending request to LLM")
            # Generate content directly with the PIL image
            response = self.model.generate_content([
                self.prompt,
                image
            ])
            return {
                "analysis": response.text,
                "timestamp": datetime.now().isoformat()
            }
        except Exception as e:
            logger.error(f"LLM analysis failed: {str(e)}")
            raise Exception(f"Failed to analyze document: {str(e)}")
What’s Happening Here?
- I configure Google Generative AI (Gemini) with an API key read from the GOOGLE_API_KEY environment variable.
- A default prompt outlines the kind of analysis we want.
- The analyze_document method sends the prompt and the image to Gemini and returns the text-based analysis together with a timestamp.
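To make the return shape concrete without calling the Gemini API, here is a minimal sketch that mirrors analyze_document with a stand-in model. FakeModel, FakeResponse, and the analyze helper are hypothetical names used only for illustration; they are not part of the repository or of the google.generativeai package. The only things carried over from the code above are that the model exposes generate_content(...) and that the response has a .text attribute.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class FakeResponse:
    """Hypothetical stand-in for the object returned by generate_content."""
    text: str


class FakeModel:
    """Hypothetical stand-in for the Gemini model; makes no network calls."""

    def generate_content(self, parts):
        prompt, image = parts
        return FakeResponse(text=f"(stub analysis of {image!r})")


def analyze(model, prompt, image) -> dict:
    """Return the same two keys that LLMService.analyze_document returns."""
    response = model.generate_content([prompt, image])
    return {
        "analysis": response.text,
        "timestamp": datetime.now().isoformat(),
    }


result = analyze(FakeModel(), "Analyze this image", "sample.png")
print(sorted(result))  # the keys the Streamlit app later renders with st.json
```

Passing the model in as an argument is a design choice of this sketch, not of the original class; it can make the analysis step easier to exercise in tests without an API key.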
2. S3 Service (app/services/s3_service.py)
Uploads files to AWS S3 with timestamped keys and generates presigned URLs for private access.
import boto3
import os
from datetime import datetime
from utils.logger import setup_logger

logger = setup_logger()


class S3Service:
    def __init__(self):
        self.s3_client = boto3.client(
            's3',
            aws_access_key_id=os.getenv('AWS_ACCESS_KEY_ID'),
            aws_secret_access_key=os.getenv('AWS_SECRET_ACCESS_KEY'),
            region_name=os.getenv('AWS_REGION', 'us-east-1')
        )
        self.bucket_name = os.getenv('S3_BUCKET_NAME')

    def upload_file(self, file):
        """Upload file to S3 and return a presigned URL"""
        try:
            # Build a timestamped key (one-second resolution)
            timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
            file_key = f"uploads/{timestamp}_{file.name}"
            # Upload to S3
            self.s3_client.upload_fileobj(
                file,
                self.bucket_name,
                file_key
            )
            # Generate presigned URL that expires in 1 hour
            url = self.s3_client.generate_presigned_url(
                'get_object',
                Params={
                    'Bucket': self.bucket_name,
                    'Key': file_key
                },
                ExpiresIn=3600
            )
            # Log the key rather than the presigned URL, which grants access
            logger.info(f"File uploaded successfully: {file_key}")
            return url
        except Exception as e:
            logger.error(f"S3 upload failed: {str(e)}")
            raise Exception(f"Failed to upload file to S3: {str(e)}")
Figure 6: The AWS S3 bucket that stores uploaded images and analysis results
Core Features:
- Uses boto3 to interact with AWS S3.
- Generates a time-stamped key for each file.
- Creates a presigned URL for private file access without requiring you to open up the entire bucket.
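The key format above uses a timestamp with one-second resolution, so two uploads of the same filename within the same second would map to the same key. One possible variation, shown here as a sketch rather than as the repository's approach, adds a short random suffix. The function name make_upload_key, the use of UTC, and the 8-character suffix are all assumptions made for this example.

```python
import uuid
from datetime import datetime, timezone
from pathlib import PurePosixPath


def make_upload_key(filename, now=None, prefix="uploads"):
    """Build an S3 object key: <prefix>/<timestamp>_<random suffix>_<filename>."""
    now = now or datetime.now(timezone.utc)      # assumption: UTC timestamps
    timestamp = now.strftime("%Y%m%d_%H%M%S")
    safe_name = PurePosixPath(filename).name     # drop any directory components
    suffix = uuid.uuid4().hex[:8]                # reduces same-second collisions
    return f"{prefix}/{timestamp}_{suffix}_{safe_name}"


key = make_upload_key("cat.png", datetime(2025, 1, 7, 12, 0, 0))
print(key)  # the random suffix differs on every run
```

The trade-off is that keys are no longer predictable from the upload time alone, so the returned key (or URL) has to be stored if the object needs to be found again.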
3. The Streamlit Application (app/main.py)
Provides the user interface for file uploads, analysis initiation, and displaying results.
import streamlit as st
import os
from dotenv import load_dotenv
from services.s3_service import S3Service
from services.llm_service import LLMService
from utils.logger import setup_logger
from PIL import Image

# Load environment variables
load_dotenv()

# Setup logging
logger = setup_logger()

# Initialize services
s3_service = S3Service()
llm_service = LLMService()


def main():
    st.title("Document Analyzer")
    uploaded_file = st.file_uploader("Upload a document", type=['png', 'jpg', 'jpeg'])
    if uploaded_file:
        # Display image
        image = Image.open(uploaded_file)
        st.image(image, caption='Uploaded Document', use_column_width=True)
        if st.button('Analyze Document'):
            with st.spinner('Processing...'):
                try:
                    # Analyze with LLM directly
                    logger.info("Starting document analysis")
                    analysis = llm_service.analyze_document(image)
                    # Upload to S3 for storage
                    logger.info(f"Uploading file: {uploaded_file.name}")
                    # Rewind first: PIL has already read from this file object,
                    # and upload_fileobj sends bytes from the current position
                    uploaded_file.seek(0)
                    s3_url = s3_service.upload_file(uploaded_file)
                    # Display results
                    st.success("Analysis Complete!")
                    st.json(analysis)
                except Exception as e:
                    logger.error(f"Error processing document: {str(e)}")
                    st.error(f"Error: {str(e)}")


if __name__ == "__main__":
    main()
- Streamlit handles the UI: file upload, display, button triggers.
- LLMService and S3Service are orchestrated together to handle the AI query and file upload.
- Real-time logs inform you of the status and highlight any issues.
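A detail worth keeping in mind for the flow above: Streamlit's uploaded file is a file-like object with a read position, and the same object is first read to build the image and then handed to the S3 upload. The sketch below uses io.BytesIO as a stand-in for the uploaded file to show what a second reader sees with and without rewinding. The helper name read_all_then_upload is hypothetical, and how far PIL actually advances the position depends on the image, so treat this as an illustration of the general behaviour only.

```python
import io


def read_all_then_upload(file_obj, rewind: bool) -> bytes:
    """Simulate 'analyze, then upload' on a single file-like object."""
    file_obj.read()          # stands in for any consumer that reads the stream
    if rewind:
        file_obj.seek(0)     # move the read position back to the start
    return file_obj.read()   # stands in for the bytes an upload would send


data = b"fake image bytes"
print(len(read_all_then_upload(io.BytesIO(data), rewind=False)))  # 0
print(len(read_all_then_upload(io.BytesIO(data), rewind=True)))   # 16
```

Calling seek(0) immediately before the upload is a cheap way to make sure the stored object is the complete file.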
Running VisualInsight Locally
1. Clone the Repository
git clone https://github.com/yotambraun/VisualInsight.git
cd VisualInsight
2. Environment Setup
Create a .env file at the project root:
AWS_ACCESS_KEY_ID=YOUR_AWS_KEY
AWS_SECRET_ACCESS_KEY=YOUR_AWS_SECRET
AWS_REGION=us-east-1
S3_BUCKET_NAME=YOUR_BUCKET_NAME
GOOGLE_API_KEY=YOUR_GOOGLE_GENAI_KEY
3. Install Dependencies
pip install -r requirements.txt
4. Run the App
streamlit run app/main.py
Navigate to http://localhost:8501 in your browser to start using VisualInsight!
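If the app fails at startup, a missing entry in the .env file is a likely cause. The following sketch checks for the variables listed in the setup step before any service is created. The helper missing_env_vars and the REQUIRED_VARS tuple are hypothetical additions, not part of the repository; AWS_REGION is left out because the S3 service code falls back to us-east-1 when it is unset.

```python
import os

# Variables the services read without a fallback value
REQUIRED_VARS = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "S3_BUCKET_NAME",
    "GOOGLE_API_KEY",
)


def missing_env_vars(env=None, required=REQUIRED_VARS):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


# Example with an explicit mapping instead of the real environment
example_env = {"AWS_ACCESS_KEY_ID": "x", "GOOGLE_API_KEY": "y"}
print(missing_env_vars(example_env))
# ['AWS_SECRET_ACCESS_KEY', 'S3_BUCKET_NAME']
```

In the app, a check like this could run right after load_dotenv() and report the missing names in one message instead of failing later inside a service call.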
Containerization with Docker
Use Docker for consistent application performance across environments.
Figure 7: AWS ECS used for container orchestration
Dockerfile (excerpt):
FROM python:3.9-slim
WORKDIR /app
# Install dependencies
COPY requirements.txt .
RUN pip install -r requirements.txt
# Copy application code
COPY app/ .
EXPOSE 8501
ENTRYPOINT ["streamlit", "run", "main.py", "--server.port=8501", "--server.address=0.0.0.0"]
Steps:
- Build the image:
docker build -t visualinsight:latest .
- Run the container:
docker run -p 8501:8501 visualinsight:latest
Visit http://localhost:8501 to use the app.
Infrastructure as Code with Terraform
Figure 8: AWS ECR, storing Docker images for the application
I use Terraform to create and manage AWS resources: S3, ECR, ECS, and more for deploying the application.
Why Terraform?
Terraform allows you to define your cloud infrastructure as code. Rather than manually creating AWS resources via the console or CLI, you simply write a configuration file. This ensures that your infrastructure is consistent, version-controlled, and easily replicable across multiple environments.
Key Advantages of Using Terraform:
- Reproducibility: The same configurations can be deployed multiple times without drift.
- Collaboration: Teams can review Terraform files in Git, allowing for better code reviews and fewer mistakes.
- Scalability: Quick spin-up of additional resources if your usage grows.
1. Example Variables (infrastructure/terraform/variables.tf)
variable "aws_region" {
  description = "AWS region"
  type        = string
  default     = "us-east-1"
}

variable "bucket_name" {
  description = "Name of the S3 bucket"
  type        = string
}
2. Main Configuration (infrastructure/terraform/main.tf)
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 4.0"
    }
  }
}

provider "aws" {
  region = var.aws_region
}

resource "aws_s3_bucket" "documents" {
  bucket = var.bucket_name
}

resource "aws_ecr_repository" "app" {
  name = "document-analyzer"
}

resource "aws_ecs_cluster" "main" {
  name = "document-analyzer-cluster"
}

# ... ECS Service, Security Groups, Task Definition, etc.
Why ECR and ECS?
- Amazon ECR (Elastic Container Registry): A private registry for storing your Docker images. Instead of relying on Docker Hub or other third parties, ECR keeps your images secure within your AWS account.
- Amazon ECS (Elastic Container Service): An AWS-native container orchestration service. It manages the scaling and deployment of your containerized application automatically. With Fargate (serverless compute engine for containers), you don’t have to worry about provisioning or managing EC2 instances; it abstracts away all the heavy lifting.
In Short:
- ECR stores your built Docker images.
- ECS pulls those images from ECR and runs them as containers in a scalable manner.
3. Deploying via Terraform
cd infrastructure/terraform
terraform init
terraform plan -var="bucket_name=my-visualinsight-bucket"
terraform apply -var="bucket_name=my-visualinsight-bucket"
Terraform will:
- Create an S3 bucket.
- Create an ECR repository.
- Set up an ECS cluster, tasks, services, IAM roles, and more.
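Because bucket_name has no default and S3 bucket names are globally unique, terraform apply can fail late if the value is malformed or already taken. The sketch below checks a subset of the S3 general-purpose bucket naming rules before Terraform runs: 3 to 63 characters; lowercase letters, digits, dots, and hyphens; a letter or digit at each end; no adjacent dots; not shaped like an IP address. The function name is hypothetical, the rule set is deliberately incomplete, and it cannot tell whether a name is already in use.

```python
import re

# 1 leading + 1..61 middle + 1 trailing character = 3..63 characters in total
_BUCKET_RE = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")
_IP_RE = re.compile(r"\d{1,3}(\.\d{1,3}){3}")


def is_plausible_bucket_name(name: str) -> bool:
    """Check a subset of S3 bucket naming rules (not the full specification)."""
    if not _BUCKET_RE.fullmatch(name):
        return False
    if ".." in name:              # adjacent dots are not allowed
        return False
    if _IP_RE.fullmatch(name):    # names formatted like IP addresses are rejected
        return False
    return True


print(is_plausible_bucket_name("my-visualinsight-bucket"))  # True
print(is_plausible_bucket_name("My_Bucket"))                # False
```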
Automated CI/CD with GitHub Actions
Automate the build, test, and deployment process to ensure consistent updates.
Your .github/workflows/deploy.yml takes care of:
- AWS Login: Authenticates with your AWS account using secrets.
- Docker Build & Push: Builds the Docker image and pushes it to Amazon ECR.
- ECS Update: Forces a new deployment on ECS to pull the latest image.
Figure 9: GitHub Actions
Figure 10: GitHub Actions pipeline for CI/CD
Sample Deploy Workflow:
name: Deploy to AWS
on:
  push:
    branches: [ main ]
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v1
        with:
          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
          aws-region: us-east-1
      - name: Login to Amazon ECR
        id: login-ecr
        uses: aws-actions/amazon-ecr-login@v1
      - name: Build and push Docker image
        env:
          ECR_REGISTRY: ${{ steps.login-ecr.outputs.registry }}
          ECR_REPOSITORY: document-analyzer
          IMAGE_TAG: ${{ github.sha }}
        run: |
          docker build -t $ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG .
          docker push $ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG
      - name: Deploy to ECS
        run: |
          aws ecs update-service --cluster document-analyzer-cluster --service document-analyzer --force-new-deployment
Whenever you push to main, GitHub Actions will build and deploy your latest changes automatically.
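The final workflow step can also be triggered from Python, for example from a local script. The sketch below wraps the boto3 ECS client's update_service call with forceNewDeployment=True, which corresponds to the --force-new-deployment flag used in the workflow. The wrapper force_new_deployment and the RecordingClient stand-in are hypothetical; the stand-in exists only so the example runs without AWS credentials, and in real use you would pass boto3.client("ecs") instead.

```python
def force_new_deployment(ecs_client, cluster: str, service: str) -> dict:
    """Ask ECS to start a new deployment of an existing service."""
    return ecs_client.update_service(
        cluster=cluster,
        service=service,
        forceNewDeployment=True,
    )


class RecordingClient:
    """Hypothetical stand-in for boto3.client('ecs'); records calls only."""

    def __init__(self):
        self.calls = []

    def update_service(self, **kwargs):
        self.calls.append(kwargs)
        return {"service": {"serviceName": kwargs["service"]}}


client = RecordingClient()
response = force_new_deployment(
    client, "document-analyzer-cluster", "document-analyzer"
)
print(client.calls[0]["forceNewDeployment"])  # True
```

Note that forcing a new deployment restarts tasks from the existing task definition, so whether the new image is picked up depends on how that task definition references the image tag.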
Real-World Impact
- Time Efficiency
With AI-driven analysis, there’s no need for manual labeling or advanced ML pipeline setup.
- Scalability
AWS S3 + ECS means you can handle ever-growing image datasets and traffic without re-architecting.
- Reliability
Docker ensures consistent environments; Terraform standardizes infrastructure, and GitHub Actions automates testing and deployment.
- User-Friendly
Streamlit’s intuitive UI means non-developers can upload images and see insights in real time.
Conclusion
VisualInsight takes the guesswork out of image analysis. By combining Streamlit, Google Generative AI (Gemini), AWS S3, Terraform and CI/CD, it delivers a robust, scalable solution that’s easy to use and maintain. VisualInsight streamlines the entire workflow — so you can focus on making discoveries, not wrestling with infrastructure.
Key Takeaways
- Automation reduces manual work and simplifies processes.
- Infrastructure as Code promotes collaboration and reproducibility.
- Docker ensures consistency across development and production environments.
- CI/CD enables fast and reliable updates.
Feel free to clone the GitHub Repository and customize it for your own project needs. If you enjoyed this, consider clapping on Medium, sharing with others, or following me for more deep dives into AI and cloud solutions!
References
Google Gemini: Google’s advanced AI model designed for multimodal data processing, including text, images, and audio.
Streamlit: An open-source app framework for creating and sharing data applications using Python.
AWS S3: Amazon Simple Storage Service (S3) is an object storage service offering scalability, data availability, security, and performance.
Docker: A platform for developing, shipping, and running applications inside containers, ensuring consistency across multiple development and release cycles.
Terraform: An open-source infrastructure as code software tool that enables you to safely and predictably create, change, and improve infrastructure.
GitHub Actions: A CI/CD platform that allows you to automate your build, test, and deployment pipeline.
AWS ECR (Elastic Container Registry): A fully managed container registry that makes it easy for developers to store, manage, and deploy Docker container images.
AWS ECS (Elastic Container Service): A highly scalable, high-performance container orchestration service that supports Docker containers and allows you to easily run and scale containerized applications on AWS.
These references provide detailed information about each component used in the VisualInsight application.
Published via Towards AI