Organise Photo Dumps With AI: Face Recognition & Reverse Image Search
Last Updated on January 3, 2025 by Editorial Team
Author(s): Tapan Babbar
Originally published on Towards AI.
Have you ever been handed a party photo dump so massive that scrolling through it feels like running an endless marathon of blurry dance moves, awkward smiles, and random shoes? It leaves you wondering, "Am I even in these pictures?" Luckily, face recognition technology is here to rescue you from the endless swipe-fest. With just a touch of AI magic, you can effortlessly find yourself (and your equally fabulous friends) in a sea of chaotic snapshots. Let's dive into how you can create a face recognition system that doesn't just find faces but also organizes them faster than you can say "photo bomb"!
The Concept: Automated Face Recognition for Party Photos
This project uses advanced computer vision techniques to detect and recognize faces in a collection of photos automatically. The goal is to streamline finding and organizing pictures by person, eliminating the need for manual sorting.
Concept Breakdown:
At its core, this project revolves around three main steps:
- Detecting Faces: Identify individual faces within images using a specialized face detection model.
- Generating Embeddings: Convert each detected face into a unique numerical representation (embedding) using a deep learning model.
- Indexing and Searching: Use FAISS (Facebook AI Similarity Search) to quickly find similar faces based on their embeddings.
For this experiment, we are using a dataset of photos from the Academy Awards 2024. The images in this dataset capture various scenes from the event, including red-carpet moments, backstage candid shots, and the ceremony itself. Each photo contains multiple faces, including celebrities, guests, and attendees. The dataset is diverse in terms of lighting, angles, and facial expressions, which makes it an ideal challenge for face detection and recognition models. By leveraging this dataset, we can explore the effectiveness of automated face recognition techniques in real-world, high-stakes environments like the Oscars.
Let's dive deeper into each of these components:
Step 1: Face Detection and Cropping
Face detection is the process of locating human faces within an image. For this task, we use the YuNet model, which is a state-of-the-art face detection algorithm. YuNet is fast and accurate, capable of detecting faces with high confidence even in challenging environments (like varied lighting or complex backgrounds).
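import os
import cv2

# Load the YuNet model for face detection
yunet = cv2.FaceDetectorYN.create(
    model="face_detection_yunet_2023mar.onnx",  # Pre-trained ONNX model path
    config="",
    input_size=(320, 320),  # Placeholder; reset per image below
    score_threshold=0.9,    # Minimum confidence for a detection
    nms_threshold=0.3,      # Overlap threshold for non-maximum suppression
    top_k=5000              # Candidates kept before NMS
)

def detect_and_crop_faces(image_path, output_folder=None, return_boxes=False):
    # Read the input image
    img = cv2.imread(image_path)
    if img is None:
        print(f"Could not read {image_path}")
        return [] if not return_boxes else ([], [])

    # YuNet must be told the size of each image before detection
    height, width = img.shape[:2]
    yunet.setInputSize((width, height))

    # Detect faces; `faces` is None when nothing is found
    _, faces = yunet.detect(img)
    cropped_faces = []
    face_boxes = []
    if faces is not None:
        base_name = os.path.splitext(os.path.basename(image_path))[0]
        for idx, face in enumerate(faces):
            x, y, w, h = face[:4].astype(int)
            # Clamp the box to the image: detections near the border can
            # extend past the frame, and negative indices would mis-slice
            x1, y1 = max(x, 0), max(y, 0)
            x2, y2 = min(x + w, width), min(y + h, height)
            cropped_face = img[y1:y2, x1:x2]
            if cropped_face.size == 0:
                continue
            cropped_faces.append(cropped_face)
            face_boxes.append((x1, y1, x2 - x1, y2 - y1))  # Store the bounding box
            # Optionally save the crop; the naming mirrors the file names
            # in the results further down (e.g. 9.jpg -> 9_face_2.jpg)
            if output_folder is not None:
                os.makedirs(output_folder, exist_ok=True)
                face_path = os.path.join(output_folder, f"{base_name}_face_{idx}.jpg")
                cv2.imwrite(face_path, cropped_face)
    if return_boxes:
        return cropped_faces, face_boxes
    return cropped_faces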
This function detects faces, crops them, and optionally saves them into a specified folder. It's perfect for breaking down crowded photos into identifiable chunks.
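For reference, each row returned by YuNet's `detect` holds 15 values: the bounding box (x, y, w, h), five facial landmarks as (x, y) pairs, and a confidence score. The sketch below unpacks one such row and clamps the box to the image bounds. The helper name `parse_detection` and the sample row are made up for illustration.

```python
import numpy as np

def parse_detection(row, img_width, img_height):
    """Unpack one YuNet detection row into a clamped box, landmarks, and score."""
    x, y, w, h = (int(v) for v in row[:4])
    # Clamp the box so later slicing never uses out-of-range indices
    x1, y1 = max(x, 0), max(y, 0)
    x2, y2 = min(x + w, img_width), min(y + h, img_height)
    landmarks = row[4:14].reshape(5, 2)  # five (x, y) landmark points
    score = float(row[14])               # detection confidence
    return {"box": (x1, y1, x2 - x1, y2 - y1),
            "landmarks": landmarks,
            "score": score}

# Hypothetical detection that starts slightly left of the image border
sample_row = np.array([-4.0, 10.0, 50.0, 60.0,
                       10, 30, 30, 30, 20, 40, 12, 55, 28, 55,
                       0.97], dtype=np.float32)
parsed = parse_detection(sample_row, img_width=100, img_height=100)
print(parsed["box"], round(parsed["score"], 2))
```

The landmarks are not used in this pipeline, but they can be handy for aligning faces before computing embeddings.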
Step 2: Generating Embeddings
Once faces are detected, the next step is to convert each face into a numerical representation known as an embedding. These embeddings serve as unique "fingerprints" of the face, capturing the essence of the face in a fixed-size vector format.
To generate these embeddings, we use a pre-trained model such as FaceNet or VGG-Face; the code below uses VGG-Face through the DeepFace library. These models produce embeddings that can be compared directly to measure how similar two faces are.
from deepface import DeepFace
import numpy as np

def get_embeddings(face_images):
    embeddings = []
    for face_img in face_images:
        # DeepFace documents NumPy inputs as BGR, which is the order OpenCV
        # produces, so the crop is passed through without a color conversion
        embedding = DeepFace.represent(
            img_path=face_img,
            model_name="VGG-Face",
            enforce_detection=False  # The input is already a face crop
        )[0]['embedding']
        embeddings.append(np.array(embedding, dtype=np.float32))
    return embeddings
Using these embeddings, we can mathematically compare how similar one face is to another.
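As a minimal sketch of what "comparing" means here, the snippet below computes two common measures, cosine similarity and squared Euclidean (L2) distance, on toy 4-dimensional vectors. The vectors and function names are illustrative assumptions; real embeddings have far more dimensions.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def squared_l2_distance(a, b):
    """Squared Euclidean distance (0.0 means identical vectors)."""
    diff = a - b
    return float(np.dot(diff, diff))

# Toy vectors standing in for real face embeddings
face_a = np.array([0.9, 0.1, 0.0, 0.4], dtype=np.float32)
face_a_again = np.array([0.8, 0.2, 0.1, 0.4], dtype=np.float32)  # same person, new photo
face_b = np.array([0.0, 0.9, 0.4, 0.1], dtype=np.float32)        # different person

print("same person:     ", cosine_similarity(face_a, face_a_again),
      squared_l2_distance(face_a, face_a_again))
print("different person:", cosine_similarity(face_a, face_b),
      squared_l2_distance(face_a, face_b))
```

A higher cosine similarity and a lower distance both indicate a closer match. For unit-length vectors the two are linked by ||a - b||² = 2 - 2·cos(a, b), so they produce the same ranking.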
Step 3: Indexing with FAISS
With embeddings ready, the next step is to use a search system to quickly find similar faces. FAISS (Facebook AI Similarity Search) is an efficient library for large-scale similarity search. It indexes the embeddings and supports fast nearest-neighbor lookups, ranking candidates by L2 distance or by inner product, which equals cosine similarity when the embeddings are normalized to unit length.
import os
import faiss
import numpy as np

def process_and_store_images(image_folder, face_folder, index, faiss_index_path):
    os.makedirs(face_folder, exist_ok=True)
    for image_file in os.listdir(image_folder):
        image_path = os.path.join(image_folder, image_file)
        # Detect faces and save the crops into face_folder
        faces = detect_and_crop_faces(image_path, face_folder)
        if faces:
            embeddings = get_embeddings(faces)
            # FAISS expects a 2D float32 array of shape (n_vectors, dim)
            index.add(np.stack(embeddings))
    # Persist the index once all images have been processed
    faiss.write_index(index, faiss_index_path)
    print(f"FAISS index saved to {faiss_index_path}")
This function processes all images in a folder, detects faces, extracts embeddings, and saves them in a FAISS index.
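The function above receives an `index` that has already been constructed. Assuming an exact (flat) L2 index, a search amounts to computing the squared Euclidean distance from the query to every stored vector and returning the `k` smallest. The NumPy sketch below mimics that behavior on toy data; `brute_force_search` is a hypothetical stand-in for illustration, not a FAISS function.

```python
import numpy as np

def brute_force_search(stored, query, top_k=5):
    """Return (distances, indices) of the top_k nearest stored vectors.

    Distances are squared L2, the metric an exact flat L2 index reports.
    """
    diffs = stored - query                       # broadcast query against every row
    dists = np.einsum("ij,ij->i", diffs, diffs)  # row-wise squared norms
    order = np.argsort(dists)[:top_k]            # positions of the closest rows
    return dists[order], order

rng = np.random.default_rng(0)
stored = rng.normal(size=(100, 8)).astype(np.float32)  # toy "face embeddings"
query = stored[42] + 0.01                               # near-duplicate of row 42

distances, indices = brute_force_search(stored, query, top_k=3)
print(indices, distances)
```

Because the distances are squared and the vectors are not normalized, values above 1 are perfectly normal; only the ordering matters for ranking.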
Sample Query With Results
Once the FAISS index is built, you can query it to find similar faces in seconds. This is perfect for identifying yourself in a batch of party photos.
def search_similar_faces(query_image, faiss_index_path, top_k=5):
    # Load the saved index from disk
    index = faiss.read_index(faiss_index_path)
    faces = detect_and_crop_faces(query_image)
    if not faces:
        print("No face detected in query image.")
        return
    query_embeddings = get_embeddings(faces)
    # Run one search per face found in the query image
    for query_embedding in query_embeddings:
        distances, indices = index.search(np.expand_dims(query_embedding, axis=0), top_k)
        print("Top Matches:")
        for dist, idx in zip(distances[0], indices[0]):
            print(f"Index: {idx}, Distance: {dist}")
This allows you to upload a single image and find its closest matches in the dataset. Imagine locating a candid photo of yourself laughing in the background of someone else's selfie!
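The function above prints only positions in the index, while the sample output further down lists file names. One possible way to bridge the two, sketched here, is a metadata list stored as JSON, with one record appended per face in the same order the embeddings are added to the index. The record fields and helper names are assumptions for illustration, not the notebook's actual schema.

```python
import json
import os
import tempfile

def add_record(metadata, original_image, face_number):
    """Append a record; its list position mirrors the row added to the index."""
    stem = os.path.splitext(original_image)[0]
    metadata.append({
        "original_image": original_image,
        "cropped_face": f"{stem}_face_{face_number}.jpg",
    })

def describe_matches(metadata, indices, distances):
    """Turn raw search output (positions and distances) into readable lines."""
    lines = []
    for idx, dist in zip(indices, distances):
        if idx < 0:  # FAISS pads with -1 when fewer than top_k vectors exist
            continue
        record = metadata[idx]
        lines.append(f"Original Image: {record['original_image']}, "
                     f"Cropped Face: {record['cropped_face']}, Distance: {dist:.4f}")
    return lines

metadata = []
add_record(metadata, "1.jpg", 0)
add_record(metadata, "9.jpg", 2)

# Persist alongside the index so positions stay in sync between runs
metadata_path = os.path.join(tempfile.mkdtemp(), "metadata.json")
with open(metadata_path, "w") as f:
    json.dump(metadata, f)
with open(metadata_path) as f:
    loaded = json.load(f)

for line in describe_matches(loaded, indices=[1, 0], distances=[0.0221, 0.9996]):
    print(line)
```

The key design constraint is ordering: the metadata list and the index must be appended to in lockstep, otherwise positions returned by a search point at the wrong file.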
In this section, we explore the outcomes of applying face recognition and clustering techniques to images of Bradley Cooper from the Academy Awards 2024 dataset. The model performed impressively, accurately identifying multiple images of the actor through the face detection and embedding pipeline.
The embeddings generated for Bradley Cooper's photos were remarkably consistent, enabling precise clustering of his images within the dataset. Here's a breakdown of the results:
Input:
# Query image
query_image = "oscar/cropped_faces/9_face_2.jpg"
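# Search for similar faces (this call also passes a metadata path,
# which the abridged function shown earlier does not accept)
search_similar_faces(query_image, faiss_index_path, metadata_path, top_k=5)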
Top Matches:
Original Image: 9.jpg, Cropped Face: 9_face_2.jpg, Distance: 0.022086620330810547
Original Image: 2.jpg, Cropped Face: 2_face_0.jpg, Distance: 0.9272879958152771
Original Image: 3.jpg, Cropped Face: 3_face_0.jpg, Distance: 0.9644865393638611
Original Image: 1.jpg, Cropped Face: 1_face_0.jpg, Distance: 0.9995566606521606
Original Image: 10.jpg, Cropped Face: 10_face_2.jpg, Distance: 1.19015371799469
An example image of Bradley Cooper was used as the query for the face recognition pipeline.
Output (Similar Faces in the Dataset):
A collection of face thumbnails grouped based on similar facial embeddings, showcasing different moments of Bradley Cooper captured in the dataset.
Output (Original Images Matching the Query):
The exact matches of the input image within the dataset, verifying the model's ability to locate specific instances with high accuracy.
These results highlight the efficiency of the face recognition system in identifying and organizing images of a specific individual, even in a large and diverse dataset.
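To move from a single query to organizing a whole album by person, one option is to group faces whose embeddings lie within a distance threshold of each other. The sketch below does this with a small union-find over toy 2D vectors. The threshold of 0.5, the data, and the function name are illustrative assumptions; a real threshold would need tuning for the embedding model in use.

```python
import numpy as np

def group_faces(embeddings, threshold):
    """Group row indices whose squared L2 distance is below threshold (single-link)."""
    n = len(embeddings)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the group's representative
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            diff = embeddings[i] - embeddings[j]
            if float(np.dot(diff, diff)) < threshold:
                parent[find(j)] = find(i)  # merge the two groups

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

toy = np.array([
    [0.0, 0.0], [0.1, 0.0],   # person A, two photos
    [5.0, 5.0], [5.0, 5.2],   # person B, two photos
    [-4.0, 3.0],              # person C, appears once
], dtype=np.float32)

print(group_faces(toy, threshold=0.5))
```

The pairwise loop is quadratic in the number of faces, which is fine for a few hundred photos; for larger collections the neighbor lookups could be delegated to the FAISS index instead.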
Gender Bias in the Model
One interesting observation during the face clustering process was a potential gender bias in the model. While the model was effective at clustering images of men, it seemed to group most women in a single cluster, even when they appeared in different settings or lighting conditions. For instance, it often grouped images of Ellen DeGeneres and Angelina Jolie together despite clear differences between them. Although each individual's face was detected accurately, the model did not differentiate women as reliably as men.
This observation could indicate some form of gender bias in the model. One possible reason is that the model was trained on a dataset with an imbalance between male and female faces, which could have led it to focus more on distinguishing features common in male faces. Alternatively, the model might rely on more subtle or shared features in women's faces, causing it to group them together.
However, it is important to note that this is just a personal observation, and I plan to conduct further tests to determine if this trend holds across a wider range of images and individuals.
Conclusion
Gone are the days of manually sifting through photos to find yourself. With this face recognition pipeline, you can organize and search party photos like a pro. So next time you're at an event, let AI handle the chaos, and spend more time enjoying the memories instead of managing them!
The full source code and the Jupyter Notebook are available in the GitHub repository. Feel free to reach out if you have ideas for improvements or any other observations.
Published via Towards AI