
Building an Agentic RAG Pipeline: Gemini + ChromaDB + Kaggle → ResuMeme AI
Author(s): Anupama Garani
Originally published on Towards AI.

It started during a WiDS Career Catalyst volunteering session.
This wasn’t a client. This wasn’t a test case. This was someone trying to rewrite her story — and she asked me for help.
A young woman just beginning her journey in data science booked a time with me to discuss career strategy.
She was curious and committed. She’d done the courses, the projects, the hard work.
She sent me her resume ahead of time.
And while it listed everything she’d done — the tools, the internships, the certifications —
Something was missing.
Not in her skills, but in the story.
There were no metrics. No voice. No way to capture who she really was or what made her stand out.
And that’s when it hit me:
It’s not that people aren’t qualified.
It’s that we’ve never been taught how to show it — especially on a PDF scanned by algorithms.
If I could build something that helped people find their voice, reframe their impact, and yes, even laugh through the process…
I would.
So I built ResuMeme.AI
A full-stack GenAI resume assistant, powered by Gemini and orchestrated entirely inside a Kaggle notebook.
It doesn’t just tell you what’s wrong — it shows you what’s possible.
Here’s what it does:
You Give:
- A resume feedback dictionary (parsed or LLM-generated)
You Get:
- ✅ A full resume evaluation (score, strengths, gaps)
- 💼 Real job matches based on RAG + embeddings
- 🧼 Cleaned, formatted, ATS-ready resume
- 📊 Simulated ATS layout score
- 📝 PDF output
- 🖼 A meme that roasts your job-hunting aura
Sample Meme Output
“Your resume says ‘team player.’ But your font choice says ‘menace.’”
— ResuMeme.AI
All outputs are grounded in real resume structure, actual job data, and prompt control — no hallucinated fluff.
It’s not a chatbot. It’s a career glow-up pipeline.
And it all runs inside a single notebook.
Let’s dive in.
As the input for the Kaggle notebook, I used a badly formatted, outdated two-page resume.
Input


The Workflow

This system uses:
- Gemini JSON Mode
- Zero-shot prompting
- Chain-of-Thought + few-shot prompting
- LangGraph-style agentic routing
- SERP API to pull live job listings
- ChromaDB for embeddings
- Function calling to extract skills
- WeasyPrint for HTML-to-PDF export
- Gemini’s image API for memes
All of it lives inside one Kaggle notebook — no frontend needed.
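Before diving into the individual agents, here is a minimal sketch of what "LangGraph-style agentic routing" means in practice: each agent is a function over a shared state dict, and a router decides which agent runs next. The agent names, state keys, and score threshold below are hypothetical simplifications, not the notebook's actual implementation.

```python
# Minimal LangGraph-style routing sketch (hypothetical names and logic):
# each agent mutates a shared state dict and names the next node to run.

def critic_agent(state):
    # A real agent would call Gemini here; we hard-code a score for the sketch.
    state["score"] = 6.5
    state["next"] = "formatter" if state["score"] < 8 else "matcher"
    return state

def formatter_agent(state):
    state["formatted"] = True
    state["next"] = "matcher"
    return state

def matcher_agent(state):
    state["matches"] = ["Data Scientist"]
    state["next"] = None  # end of the pipeline
    return state

AGENTS = {"critic": critic_agent, "formatter": formatter_agent, "matcher": matcher_agent}

def run_pipeline(state, start="critic"):
    node = start
    while node is not None:
        state = AGENTS[node](state)
        node = state["next"]
    return state

final = run_pipeline({"resume": "..."})
```

The point is that routing is just data: swapping the threshold or adding an agent changes the graph without touching the other agents.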
Agents Involved:
- 🧠 ResumeCriticAgent → scores your resume like a hater
- 🧼 FormatterAgent → fixes formatting messes
- 👀 ATSVisionAgent → gives layout feedback
- 🧭 RAGMatcherAgent → finds real job matches
- 🎭 MemeGeneratorAgent → creates a custom meme
Real Outputs (What You See):
- Resume tone + score + gaps
- ATS layout feedback
- Top job matches + fit %
- What skills you’re missing
- An improved version of your resume
- A meme to keep you humble
🧠 Inside the Agent: ResumeCriticAgent
Purpose:
This agent performs three types of LLM-based reasoning to generate a comprehensive resume evaluation.

Steps:
- We first read the PDF using the PyPDF2 package
```python
# Import required libraries
import os
import re
import PyPDF2
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import json
from IPython.display import display, HTML, Image
# import google.generativeai as genai
print("Libraries imported successfully!")

def read_pdf_text(file_path):
    num_pages = 0
    full_path = os.path.join("/kaggle/input", file_path)
    # Let's store the contents in a text string
    text = ""
    with open(full_path, 'rb') as file:
        pdf_reader = PyPDF2.PdfReader(file)
        num_pages = len(pdf_reader.pages)
        for page in pdf_reader.pages:
            text += page.extract_text()
    if not text.strip():
        print("Warning: No text extracted from PDF!")
    else:
        print(f"Successfully read PDF - extracted {len(text)} characters")
    # Collapse newlines and runs of whitespace into single spaces
    cleaned = re.sub(r'\s+', ' ', text)
    return cleaned.strip(), num_pages

# Call the function to read the pdf
resume_text, pno = read_pdf_text("anu-resume/Anupama Garani Sheshagiri Resume.docx.pdf")
resume_text[:2000]
pno
```
After cleaning the extra spaces and carriage returns, we pass the input to a zero-shot prompt to get a high-level understanding of the issues with the resume.
- Zero-Shot Prompting: Runs a no-context Gemini prompt for raw scoring
```python
def zero_shot_prompt(resume_text_cleaned):
    resume_prompt = f"""
    You are ResumeCritic, a world-class GenAI career agent. Show all the errors with the resume.
    Also fix each section
    Resume:
    \"\"\"
    {resume_text_cleaned}
    \"\"\"
    """
    response = client.models.generate_content(
        model="gemini-2.0-flash",
        contents=resume_prompt)
    return response.text

zero_shot_response = zero_shot_prompt(resume_text)
career_scorecard['zero_shot_feedback'] = zero_shot_response
Markdown(zero_shot_response)
```
- Chain-of-Thought Prompting: Prompts Gemini to reason step-by-step before returning a judgment. Chain of thought pushes the model to reason more deeply by telling it to “think step by step” and by providing a few scenarios. In my case, I provided the following positive and negative examples:
```
You are ResumeCritic, a GenAI resume evaluation agent trained to think step-by-step.
You will evaluate a candidate's resume, structured into sections:
- Summary
- Bullet Points
- Skills (based on target job title)
- Tools (based on target job title)
- Projects
- Job Type
- Target Title
---
🧪 For each section, think step by step and follow this process:
1. Carefully read the provided content
2. Apply the scoring rules and best practices
3. Compare against the positive and negative examples
4. Provide a final score (1-10) for the section
5. Give specific feedback - identify exact lines needing improvement and explain why
6. Suggest concrete updates for each issue identified
Be thorough in your analysis and detailed in your feedback.
---
🎯 Scoring Rules:
Based on the best practices for the summary,
- **Summary**
  - +2 if under 3 lines with 5+ job-aligned keywords
  - -1 for buzzwords like "passionate", "dynamic"
  - Best Practice: "Keep it factual, job-targeted, and metric-aligned"
  - Keep it to a 1-line summary
✅ GOOD EXAMPLES:
- "ML engineer with 5+ years experience building recommendation systems that increased revenue by 23% across SaaS platforms."
- "Data scientist specializing in NLP, predictive modeling, and A/B testing with proven 35% accuracy improvements for Fortune 500 clients."
- "Machine learning developer who reduced customer churn by 18% using clustering, Python, and AWS at scale for fintech applications."
- "Results-driven data engineer leveraging Spark, Airflow, and SQL to process 5TB daily data, enabling 40% faster decision-making."
- "Analytics specialist with expertise in TensorFlow, scikit-learn, and dashboard creation that increased marketing ROI by 27%."
❌ BAD EXAMPLES:
- "Passionate data scientist with a dynamic approach to solving complex business problems and a proven track record of success."
- "Hardworking and detail-oriented data professional seeking to leverage my skills and knowledge in a challenging role with opportunity for growth."
- "Innovative and creative thinker with extensive experience in data analysis and a strong background in mathematics and statistics, eager to make an impact."
- "Team player with excellent communication skills who is passionate about data science and machine learning, looking for new opportunities."
- "Detail-oriented data scientist with experience in Python, R, and data analysis with strong analytical and problem-solving skills."
evaluate {summary_text}
Output:
- Score (1–10)
- Tone analysis
- Strengths
- Gaps
- Suggestions
```
To store the output in a structured format, I chose TypedDict and constrained the model to use this schema:
```python
import typing
from typing import List, Literal

class SectionScore(typing.TypedDict):
    score: float
    feedback: List[str]
    updates: List[str]

class ResumeEvaluation(typing.TypedDict):
    summary: SectionScore
    experience: SectionScore
    skills: SectionScore
    tools: SectionScore
    projects: SectionScore
    overall_score: float
    seniority_match: Literal["Under-leveled", "Proper", "Over-leveled"]
```
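When the model is told to fill a schema like this, a lightweight sanity check on the returned payload can catch drift before downstream agents consume it. The helper below is a hypothetical addition (not part of the original notebook), assuming the response has already been parsed into a dict:

```python
import typing
from typing import List

class SectionScore(typing.TypedDict):
    score: float
    feedback: List[str]
    updates: List[str]

def check_section(payload: dict) -> bool:
    # Compare the payload's keys against the fields SectionScore declares
    expected = set(SectionScore.__annotations__)
    return set(payload) == expected

# Hand-written sample standing in for one section of a parsed Gemini response
sample = {"score": 7.5, "feedback": ["Summary too long"], "updates": ["Cut to one line"]}
ok = check_section(sample)
```

TypedDict does no runtime validation on its own, which is why an explicit key check like this is worth the three lines.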
Result:
All three evaluation outputs are merged and transformed into a structured JSON Result ScoreCard
used by downstream agents.
🧼 Inside the Agent: ResumeFormatterAgent
🧾 Purpose:
Clean up and reformat your resume to be ATS-compliant and easier for hiring managers to scan.

💡 Prompt Techniques Used:
- Few-shot prompting using examples of “bad” and “good” resume blocks
- HTML or markdown-style formatting for structured output
📤 Output:
- Cleaned resume (text or HTML)
- Removed fluff
- Rewritten headers
- Unified layout
🔥 Why it’s cool:
No drag-and-drop builder, no template dependency. This agent restructures your resume like a savvy AI resume consultant.
👀 Inside the Agent: ATSVisionAgent
🧠 Purpose:
Simulates layout scoring of your resume as if processed by a real ATS system.

💡 Prompt Techniques Used:
- Gemini prompt that scores based on layout (headings, spacing, section order)
- Future-ready for Gemini Vision API
```python
ats_vision_response = image_client.models.generate_content(
    model="gemini-2.0-flash-exp",
    config=types.GenerateContentConfig(
        temperature=0,
        response_mime_type="application/json",
        response_schema=ATSVisionLayoutFeedback
    ),
    contents=[ats_vision_prompt, img])
```
📤 Output:
- ATS Layout Score (1–10)
- Specific layout issues (spacing, ordering, readability)
🔥 Why it’s cool:
No actual vision model is needed yet, but the system is future-proofed: it simulates how an ATS views your visual layout.
🧭 Inside the Agent: RAGMatcherAgent
🧲 Purpose:
Find real jobs, analyze your fit, extract skill gaps, and generate custom job search filters.

💡 Tech & Prompt Techniques:
- Function calling: Extract job titles, companies, skills
- Google SERP API to get job listings
- SentenceTransformers for resume & job embeddings
- Cosine similarity scoring
- Gemini prompt to suggest improvements
I first used an embedding approach to match data science job titles against the resume; the notebook renders the resulting similarity scores as a heat map.

“Data Scientist” had the highest similarity to the resume (0.84), so I used that role to find jobs through the SERP API.
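The heat map comes down to pairwise cosine similarity between the resume embedding and each job-title embedding. Here is a toy sketch with invented 4-dimensional vectors standing in for real SentenceTransformer embeddings (the titles and numbers are illustrative, not the notebook's actual values):

```python
import numpy as np

# Toy vectors standing in for real sentence embeddings (illustrative only)
resume_vec = np.array([0.9, 0.1, 0.3, 0.2])
job_vecs = {
    "Data Scientist": np.array([0.8, 0.2, 0.4, 0.1]),
    "Data Engineer":  np.array([0.2, 0.9, 0.1, 0.3]),
}

def cosine_sim(a, b):
    # Cosine similarity: dot product normalized by vector magnitudes
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

scores = {title: cosine_sim(resume_vec, v) for title, v in job_vecs.items()}
best = max(scores, key=scores.get)
```

With real embeddings the same `max` over similarity scores is what picks the target role fed into the SERP API search.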
```python
import os
from kaggle_secrets import UserSecretsClient
from serpapi import GoogleSearch

user_secrets = UserSecretsClient()
Serp_api_key = user_secrets.get_secret("SERPAPI_KEY")

def search_real_job_links(role="Data Scientist", location="Austin, TX"):
    params = {
        "engine": "google",
        "q": f'site:greenhouse.io OR site:lever.co "{role}" "{location}"',
        "location": location,
        "hl": "en",
        "gl": "us",
        "api_key": Serp_api_key
    }
    search = GoogleSearch(params)
    results = search.get_dict()
    jobs = []
    for result in results.get("organic_results", []):
        if "title" in result and "link" in result:
            jobs.append({
                "title": result["title"],
                "link": result["link"],
                "snippet": result.get("snippet", "")
            })
    return jobs

jobs = search_real_job_links(target_role, career_scorecard['resume_entity_mapped']['location'])
job_links = []
for job in jobs[:5]:
    # Organic search results don't carry company_name, so this prints None
    print("✅", job.get("title"), "-", job.get("company_name"))
    print("Link:", job.get("link", "N/A"))
    print("=" * 80)
    job_links.append(job)
```
```
✅ Senior Data Scientist - Austin, Texas, United States - None
Link: https://boards.greenhouse.io/roku/jobs/6726915
================================================================================
✅ Job Application for Data Scientist at Base Power Company - None
Link: https://job-boards.greenhouse.io/basepowercompany/jobs/4551163008
================================================================================
✅ Jobs at IntegraFEC - None
Link: https://boards.greenhouse.io/integra
================================================================================
✅ Job Application for Data Scientist at YETI Test Events Job Board - None
Link: https://job-boards.greenhouse.io/yetitestevents/jobs/4001110004
================================================================================
✅ Jobs at IntegraFEC - Internships - None
Link: https://job-boards.greenhouse.io/integrainterns/jobs/4522535008
================================================================================
```
I then used Gemini with BeautifulSoup to retrieve details about each job, and stored all of it in ChromaDB for RAG.
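The scraping step itself isn't shown in the notebook. As a rough stand-in, here is a stdlib-only sketch of stripping a job page down to plain text with `html.parser` (BeautifulSoup's `get_text()` does roughly this, more robustly); the sample HTML is invented for illustration:

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    # Collects visible text nodes, skipping <script> and <style> contents
    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = False

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip = True

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip = False

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())

def extract_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    return " ".join(parser.parts)

# Invented sample job page for illustration
page = "<html><body><h1>Data Scientist</h1><p>Build ML models.</p><script>x=1</script></body></html>"
job_text = extract_text(page)
```

The cleaned text is what would then be embedded and added to the ChromaDB collection.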
```python
import chromadb
from chromadb import Documents, EmbeddingFunction, Embeddings
from google.api_core import retry

# Define a helper to retry when the per-minute quota is reached.
is_retriable = lambda e: (isinstance(e, genai.errors.APIError) and e.code in {429, 503})

class GeminiEmbeddingFunction(EmbeddingFunction):
    # Specify whether to generate embeddings for documents or queries
    document_mode = True

    @retry.Retry(predicate=is_retriable)
    def __call__(self, input: Documents) -> Embeddings:
        if self.document_mode:
            embedding_task = "retrieval_document"
        else:
            embedding_task = "retrieval_query"
        response = client.models.embed_content(
            model="models/text-embedding-004",
            contents=input,
            config=types.EmbedContentConfig(
                task_type=embedding_task,
            ),
        )
        return [e.values for e in response.embeddings]

DB_NAME = "googlejobsdb"
embed_fn = GeminiEmbeddingFunction()
embed_fn.document_mode = True

chroma_client = chromadb.Client()
db = chroma_client.get_or_create_collection(name=DB_NAME, embedding_function=embed_fn)

# Store all the retrieved jobs in Chroma DB
documents = all_jobs
db.add(documents=documents, ids=[str(i) for i in range(len(documents))])
db.count()
```
I then implemented RAG over the stored job documents and the resume with the following code:
```python
import typing

# Define the schema
class JobEntry(typing.TypedDict):
    title: str
    match_percentage: int
    skills_present: list[str]
    skills_missing: list[str]
    match_analysis: str
    improvement_suggestions: list[str]

class JobMatchResponse(typing.TypedDict):
    jobs: list[JobEntry]
    overall_recommendation: str

# Switch the embedding function to query mode before searching
embed_fn.document_mode = False

# Search the Chroma DB using the specified query.
query = "Show me the closest job matches for my resume"
result = db.query(query_texts=[query], n_results=5)
[all_passages] = result["documents"]
all_passages

query_oneline = query.replace("\n", " ")

job_match_prompt = f"""
Given a question by job seeker : {query_oneline} and the resume {resume_text}
Generate a detailed analysis of how well this candidate's resume matches each Google job. For each job:
1. Calculate a match percentage (0-100%) based on alignment of skills, experience, location, and education
2. Identify skills from the job that are present in the resume
3. Identify important skills from the job that are missing in the resume
4. Explain why this role might be a good or poor fit
5. Suggest specific improvements to make the resume more competitive
Format your response as JSON:
{{
  "jobs": [
    {{
      "title": "Job title",
      "match_percentage": number,
      "skills_present": ["skill1", "skill2"...],
      "skills_missing": ["skill1", "skill2"...],
      "match_analysis": "analysis of fit",
      "improvement_suggestions": ["suggestion1", "suggestion2"...]
    }},
    ...
  ],
  "overall_recommendation": "overall career advice"
}}
"""

# Add the retrieved documents to the prompt.
for passage in all_passages:
    passage_oneline = passage.replace("\n", " ")
    job_match_prompt += f"PASSAGE: {passage_oneline}\n"

# Now call the model
answer = client.models.generate_content(
    model="gemini-2.0-flash",
    contents=job_match_prompt,
    config=types.GenerateContentConfig(
        response_mime_type="application/json",
        response_schema=JobMatchResponse
    )
)

Markdown(answer.text)
data = answer.text
print(data)
```
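Because `response_schema` plus the JSON mime type force the model to emit valid JSON, `answer.text` can be parsed straight into Python objects. A sketch with a hand-written sample response (the real payload comes from Gemini and will have more jobs):

```python
import json

# Hand-written sample standing in for answer.text (illustrative values)
sample_response = '''
{
  "jobs": [
    {"title": "Data Scientist", "match_percentage": 82,
     "skills_present": ["Python", "SQL"], "skills_missing": ["Spark"],
     "match_analysis": "Strong overlap on core skills.",
     "improvement_suggestions": ["Add a Spark project"]}
  ],
  "overall_recommendation": "Target mid-level data science roles."
}
'''

parsed = json.loads(sample_response)
# Pick the job with the highest match percentage for the scorecard
top = max(parsed["jobs"], key=lambda j: j["match_percentage"])
```

Downstream agents (the formatter, the meme generator) read fields like `skills_missing` from this parsed dict rather than re-prompting the model.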

📤 Output:
- Top 5 job matches
- Resume-job fit %
- Strengths aligned
- Missing skills
- Suggested improvements
- Job search URL
🔥 Why it’s cool:
It’s a personalized job-hunting RAG agent that works without a proprietary job board. It adapts to your resume, not the other way around.
🎭 Inside the Agent: MemeGeneratorAgent
🖼 Purpose:
Generate a custom meme that reflects your resume’s tone, score, and career vibe.

💡 Prompt Techniques Used:
- Emotion-aware meme prompt with embedded tone
- Style tag (savage, wholesome, motivational)
- Optional retry logic for image generation
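The "optional retry logic" can be as simple as a wrapper that re-invokes a flaky generation call with exponential backoff. The function and the simulated image call below are hypothetical sketches, not the notebook's code:

```python
import time

def with_retries(fn, attempts=3, base_delay=1.0):
    # Retry fn up to `attempts` times, doubling the delay after each failure
    for i in range(attempts):
        try:
            return fn()
        except Exception:
            if i == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** i))

# Simulated flaky image call: fails twice, then returns a filename
calls = {"n": 0}
def flaky_generate():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient image API error")
    return "meme.png"

result = with_retries(flaky_generate, base_delay=0.01)
```

In the real pipeline, `fn` would be the Gemini image-generation call, and the final exception would fall back to a caption-only meme.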
📤 Output:
- Meme image (.png/.jpeg)
- Meme caption (in bold markdown)
- Style tag (used in PDF report)
💡 View the AI-Generated Resume Scorecard
You can view the full structured HTML output generated by ResuMeme.AI here:



This scorecard includes:
– Section-wise evaluation of your resume
– Embedded metrics, formatting feedback, and keywords
– Generated by Gemini + custom evaluation logic
And. The. Meme


🔥 Why it’s cool:
It’s not just for laughs — the meme actually reflects your evaluation. The humor softens the critique and makes it memorable.
🔑 Resume best practices that work
These came from sitting with resumes that were close… but just not getting callbacks. Some were mine. Some were friends’. Some belonged to people like Tina, who showed up ready and just needed the right nudge:
1. Quantify Everything
📊 Don’t say “Worked on dashboards” —
✅ Say “Built 3 dashboards used weekly by execs, saving 10+ hours/month.”
2. Use the “X by Y that resulted in Z” Structure
🧱 Accomplished X by doing Y, which led to Z
✅ “Reduced churn by 15% by personalizing onboarding emails, resulting in $120K ARR boost.”
3. Keep Your Summary to 2–3 Lines
📏 No adjectives. No fluff. No life story.
✅ “Data Scientist with 3 years’ experience in fraud detection and cloud deployments. Skilled in Python, AWS, and LLM fine-tuning.”
4. Action Verbs Only
⚙️ Optimized, launched, deployed, scaled.
❌ Avoid: helped, participated, involved in
5. Core Competencies = Sentences, Not Multi-Column Lists
🛠️ Group your skills by how you use them.
✅ “Skilled in cloud orchestration using AWS Lambda and API Gateway for real-time data pipelines.”
6. Cut Weak Experience Sections
✂️ Tutor roles? Internships from 5 years ago? Remove or consolidate into one bullet.
7. Don’t Let One Role Eat Half the Page
🧹 Substitute teacher? 2 bullets max. Focus on transferable skills.
8. Projects Should Be 2 Lines Max
🔍 One line for the what, one for the impact.
✅ “Built RAG-based chatbot for housing search. Increased match relevance by 25% using Llama3 + FAISS vector store.”
9. Education Should Be One Line
🎓 Clean and tight:
University of X — B.S., CS, 2022
10. Aim for One Page
🧼 Unless you’re a CTO or PhD with patents, 1 page > 2 pages.
ResuMeme.AI will judge your whitespace choices. And so will recruiters.
Why I Built It
I was tired of boring AI projects.
And I was tired of watching people blindly apply to jobs with no idea if their resume even made sense.
This project gave me everything I wanted:
- Structure
- Motivation
- Vibes
- Feedback
- A laugh
Ready to Run It?
Let the AI judge you.
Fix your resume.
Find a job.
And laugh a little.
→ Launch the Notebook on Kaggle
Built in Kaggle.
Powered by Gemini.
Styled by your chaos.
This project was created as part of the Google GenAI Hackathon on Kaggle, where over 5,600 participants worldwide explored the cutting edge of agentic workflows, Gemini APIs, and RAG systems.
ResuMeme.AI emerged as a full-stack, multi-agent system that doesn’t just analyze resumes — it turns them into insight, action… and laughter. Built entirely within a Kaggle notebook, using the techniques taught across Days 1–4 of the bootcamp (from JSON mode to function calling and LangGraph), this system proves that practical AI can be both technically sharp and surprisingly human.
Author Bio
Anupama Garani is an AI Product Strategist and Builder obsessed with agentic workflows, retrieval, and building AI products that actually feel good to use.
She loves mixing serious AI pipelines with a little edge, a little chaos, and a lot of practical value.
Published via Towards AI