
Building an Agentic RAG Pipeline: Gemini + ChromaDB + Kaggle → ResuMeme AI
Author(s): Anupama Garani
Originally published on Towards AI.

It started during a WiDS Career Catalyst volunteering session.
This wasn’t a client. This wasn’t a test case. This was someone trying to rewrite her story — and she asked me for help.
A young woman just beginning her journey in data science booked a time with me to discuss career strategy.
She was curious and committed. She’d done the courses, the projects, the hard work.
She sent me her resume ahead of time.
And while it listed everything she’d done — the tools, the internships, the certifications —
Something was missing.
Not in her skills, but in the story.
There were no metrics. No voice. No way to capture who she really was or what made her stand out.
And that’s when it hit me:
It’s not that people aren’t qualified.
It’s that we’ve never been taught how to show it — especially on a PDF scanned by algorithms.
If I could build something that helped people find their voice, reframe their impact, and yes, even laugh through the process…
I would.
So I built ResuMeme.AI
A full-stack GenAI resume assistant, powered by Gemini and orchestrated entirely inside a Kaggle notebook.
It doesn’t just tell you what’s wrong — it shows you what’s possible.
Here’s what it does:
You Give:
- A resume feedback dictionary (parsed or LLM-generated)
You Get:
- ✅ A full resume evaluation (score, strengths, gaps)
- 💼 Real job matches based on RAG + embeddings
- 🧼 Cleaned, formatted, ATS-ready resume
- 📊 Simulated ATS layout score
- 📝 PDF output
- 🖼 A meme that roasts your job-hunting aura
Sample Meme Output
“Your resume says ‘team player.’ But your font choice says ‘menace.’”
— ResuMeme.AI
All outputs are grounded in real resume structure, actual job data, and prompt control — no hallucinated fluff.
It’s not a chatbot. It’s a career glow-up pipeline.
And it all runs inside a single notebook.
Let’s dive in.
As the input for the Kaggle notebook, I used a badly formatted, outdated two-page resume.
Input


The Workflow

This system uses:
- Gemini JSON Mode
- Zero-shot prompting
- Chain-of-Thought + few-shot prompting
- LangGraph-style agentic routing
- SERP API to pull live job listings
- ChromaDB for embeddings
- Function calling to extract skills
- WeasyPrint for HTML-to-PDF export
- Gemini’s image API for memes
All of it lives inside one Kaggle notebook — no frontend needed.
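Before diving into the individual agents, here is a minimal sketch of what "LangGraph-style agentic routing" means in practice: each agent is a function over a shared state dict, and a router decides which agent runs next. The agent names, state keys, and score threshold below are hypothetical simplifications, not the notebook's actual implementation.

```python
# Minimal LangGraph-style routing sketch (hypothetical names and logic):
# each agent mutates a shared state dict and names the next node to run.

def critic_agent(state):
    # A real agent would call Gemini here; we hard-code a score for the sketch.
    state["score"] = 6.5
    state["next"] = "formatter" if state["score"] < 8 else "matcher"
    return state

def formatter_agent(state):
    state["formatted"] = True
    state["next"] = "matcher"
    return state

def matcher_agent(state):
    state["matches"] = ["Data Scientist"]
    state["next"] = None  # end of the pipeline
    return state

AGENTS = {"critic": critic_agent, "formatter": formatter_agent, "matcher": matcher_agent}

def run_pipeline(state, start="critic"):
    node = start
    while node is not None:
        state = AGENTS[node](state)
        node = state["next"]
    return state

final = run_pipeline({"resume": "..."})
```

The point is that routing is just data: swapping the threshold or adding an agent changes the graph without touching the other agents.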
Agents Involved:
- 🧠 ResumeCriticAgent → scores your resume like a hater
- 🧼 FormatterAgent → fixes formatting messes
- 👀 ATSVisionAgent → gives layout feedback
- 🧭 RAGMatcherAgent → finds real job matches
- 🎭 MemeGeneratorAgent → creates a custom meme
Real Outputs (What You See):
- Resume tone + score + gaps
- ATS layout feedback
- Top job matches + fit %
- What skills you’re missing
- An improved version of your resume
- A meme to keep you humble
🧠 Inside the Agent: ResumeCriticAgent
Purpose:
This agent performs three types of LLM-based reasoning to generate a comprehensive resume evaluation.

Steps:
- We first read the PDF using the PyPDF2 package
```python
# Import required libraries
import os
import re
import PyPDF2
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import json
from IPython.display import display, HTML, Image
# import google.generativeai as genai
print("Libraries imported successfully!")

def read_pdf_text(file_path):
    num_pages = 0
    full_path = os.path.join("/kaggle/input", file_path)
    # Let's store the contents in a text string
    text = ""
    with open(full_path, 'rb') as file:
        pdf_reader = PyPDF2.PdfReader(file)
        num_pages = len(pdf_reader.pages)
        for page in pdf_reader.pages:
            text += page.extract_text()
    if not text.strip():
        print("Warning: No text extracted from PDF!")
    else:
        print(f"Successfully read PDF - extracted {len(text)} characters")
    # Collapse newlines and runs of whitespace into single spaces
    cleaned = re.sub(r'\s+', ' ', text)
    return cleaned.strip(), num_pages

# Call the function to read the pdf
resume_text, pno = read_pdf_text("anu-resume/Anupama Garani Sheshagiri Resume.docx.pdf")
resume_text[:2000]
pno
```
After cleaning the extra spaces and carriage returns, we pass the input to a zero-shot prompt to get a high-level understanding of the issues with the resume.
- Zero-Shot Prompting: Runs a no-context Gemini prompt for raw scoring
```python
def zero_shot_prompt(resume_text_cleaned):
    resume_prompt = f"""
    You are ResumeCritic, a world-class GenAI career agent. Show all the errors with the resume.
    Also fix each section
    Resume:
    \"\"\"
    {resume_text_cleaned}
    \"\"\"
    """
    response = client.models.generate_content(
        model="gemini-2.0-flash",
        contents=resume_prompt)
    return response.text

zero_shot_response = zero_shot_prompt(resume_text)
career_scorecard['zero_shot_feedback'] = zero_shot_response
Markdown(zero_shot_response)
```
- Chain-of-Thought Prompting: Prompts Gemini to reason step-by-step before returning a judgment. Chain of thought pushes the model to reason more deeply by telling it to “think step by step” and by providing a few scenarios. In my case, I provided the following positive and negative examples:
```
You are ResumeCritic, a GenAI resume evaluation agent trained to think step-by-step.
You will evaluate a candidate's resume, structured into sections:
- Summary
- Bullet Points
- Skills (based on target job title)
- Tools (based on target job title)
- Projects
- Job Type
- Target Title
---
🧪 For each section, think step by step and follow this process:
1. Carefully read the provided content
2. Apply the scoring rules and best practices
3. Compare against the positive and negative examples
4. Provide a final score (1-10) for the section
5. Give specific feedback - identify exact lines needing improvement and explain why
6. Suggest concrete updates for each issue identified
Be thorough in your analysis and detailed in your feedback.
---
🎯 Scoring Rules:
Based on the best practices for the summary,
- **Summary**
  - +2 if under 3 lines with 5+ job-aligned keywords
  - -1 for buzzwords like "passionate", "dynamic"
  - Best Practice: "Keep it factual, job-targeted, and metric-aligned"
  - Keep it to a 1-line summary
✅ GOOD EXAMPLES:
- "ML engineer with 5+ years experience building recommendation systems that increased revenue by 23% across SaaS platforms."
- "Data scientist specializing in NLP, predictive modeling, and A/B testing with proven 35% accuracy improvements for Fortune 500 clients."
- "Machine learning developer who reduced customer churn by 18% using clustering, Python, and AWS at scale for fintech applications."
- "Results-driven data engineer leveraging Spark, Airflow, and SQL to process 5TB daily data, enabling 40% faster decision-making."
- "Analytics specialist with expertise in TensorFlow, scikit-learn, and dashboard creation that increased marketing ROI by 27%."
❌ BAD EXAMPLES:
- "Passionate data scientist with a dynamic approach to solving complex business problems and a proven track record of success."
- "Hardworking and detail-oriented data professional seeking to leverage my skills and knowledge in a challenging role with opportunity for growth."
- "Innovative and creative thinker with extensive experience in data analysis and a strong background in mathematics and statistics, eager to make an impact."
- "Team player with excellent communication skills who is passionate about data science and machine learning, looking for new opportunities."
- "Detail-oriented data scientist with experience in Python, R, and data analysis with strong analytical and problem-solving skills."
evaluate {summary_text}
Output:
- Score (1–10)
- Tone analysis
- Strengths
- Gaps
- Suggestions
```
To store the output in a structured format, I chose TypedDict and constrained the model to use this schema:
```python
import typing
from typing import List, Literal

class SectionScore(typing.TypedDict):
    score: float
    feedback: List[str]
    updates: List[str]

class ResumeEvaluation(typing.TypedDict):
    summary: SectionScore
    experience: SectionScore
    skills: SectionScore
    tools: SectionScore
    projects: SectionScore
    overall_score: float
    seniority_match: Literal["Under-leveled", "Proper", "Over-leveled"]
```
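When the model is told to fill a schema like this, a lightweight sanity check on the returned payload can catch drift before downstream agents consume it. The helper below is a hypothetical addition (not part of the original notebook), assuming the response has already been parsed into a dict:

```python
import typing
from typing import List

class SectionScore(typing.TypedDict):
    score: float
    feedback: List[str]
    updates: List[str]

def check_section(payload: dict) -> bool:
    # Compare the payload's keys against the fields SectionScore declares
    expected = set(SectionScore.__annotations__)
    return set(payload) == expected

# Hand-written sample standing in for one section of a parsed Gemini response
sample = {"score": 7.5, "feedback": ["Summary too long"], "updates": ["Cut to one line"]}
ok = check_section(sample)
```

TypedDict does no runtime validation on its own, which is why an explicit key check like this is worth the three lines.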
Result:
All three evaluation outputs are merged and transformed into a structured JSON Result ScoreCard
used by downstream agents.
🧼 Inside the Agent: ResumeFormatterAgent
🧾 Purpose:
Clean up and reformat your resume to be ATS-compliant and easier for hiring managers to scan.

💡 Prompt Techniques Used:
- Few-shot prompting using examples of “bad” and “good” resume blocks
- HTML or markdown-style formatting for structured output
📤 Output:
- Cleaned resume (text or HTML)
- Removed fluff
- Rewritten headers
- Unified layout
🔥 Why it’s cool:
No drag-and-drop builder, no template dependency. This agent restructures your resume like a savvy AI resume consultant.
👀 Inside the Agent: ATSVisionAgent
🧠 Purpose:
Simulates layout scoring of your resume as if processed by a real ATS system.

💡 Prompt Techniques Used:
- Gemini prompt that scores based on layout (headings, spacing, section order)
- Future-ready for Gemini Vision API
```python
ats_vision_response = image_client.models.generate_content(
    model="gemini-2.0-flash-exp",
    config=types.GenerateContentConfig(
        temperature=0,
        response_mime_type="application/json",
        response_schema=ATSVisionLayoutFeedback
    ),
    contents=[ats_vision_prompt, img])
```
📤 Output:
- ATS Layout Score (1–10)
- Specific layout issues (spacing, ordering, readability)
🔥 Why it’s cool:
No actual vision model is needed yet, but the system is future-proofed: it simulates how an ATS views your visual layout.
🧭 Inside the Agent: RAGMatcherAgent
🧲 Purpose:
Find real jobs, analyze your fit, extract skill gaps, and generate custom job search filters.

💡 Tech & Prompt Techniques:
- Function calling: Extract job titles, companies, skills
- Google SERP API to get job listings
- SentenceTransformers for resume & job embeddings
- Cosine similarity scoring
- Gemini prompt to suggest improvements
I first used an embedding approach to match data science job titles against the resume; the notebook renders the resulting similarity scores as a heat map.

“Data Scientist” had the highest similarity to the resume (0.84), so I used that role to find jobs through the SERP API.
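The heat map comes down to pairwise cosine similarity between the resume embedding and each job-title embedding. Here is a toy sketch with invented 4-dimensional vectors standing in for real SentenceTransformer embeddings (the titles and numbers are illustrative, not the notebook's actual values):

```python
import numpy as np

# Toy vectors standing in for real sentence embeddings (illustrative only)
resume_vec = np.array([0.9, 0.1, 0.3, 0.2])
job_vecs = {
    "Data Scientist": np.array([0.8, 0.2, 0.4, 0.1]),
    "Data Engineer":  np.array([0.2, 0.9, 0.1, 0.3]),
}

def cosine_sim(a, b):
    # Cosine similarity: dot product normalized by vector magnitudes
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

scores = {title: cosine_sim(resume_vec, v) for title, v in job_vecs.items()}
best = max(scores, key=scores.get)
```

With real embeddings the same `max` over similarity scores is what picks the target role fed into the SERP API search.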
```python
import os
from kaggle_secrets import UserSecretsClient
from serpapi import GoogleSearch

user_secrets = UserSecretsClient()
Serp_api_key = user_secrets.get_secret("SERPAPI_KEY")

def search_real_job_links(role="Data Scientist", location="Austin, TX"):
    params = {
        "engine": "google",
        "q": f'site:greenhouse.io OR site:lever.co "{role}" "{location}"',
        "location": location,
        "hl": "en",
        "gl": "us",
        "api_key": Serp_api_key
    }
    search = GoogleSearch(params)
    results = search.get_dict()
    jobs = []
    for result in results.get("organic_results", []):
        if "title" in result and "link" in result:
            jobs.append({
                "title": result["title"],
                "link": result["link"],
                "snippet": result.get("snippet", "")
            })
    return jobs

jobs = search_real_job_links(target_role, career_scorecard['resume_entity_mapped']['location'])
job_links = []
for job in jobs[:5]:
    # Organic search results don't carry company_name, so this prints None
    print("✅", job.get("title"), "-", job.get("company_name"))
    print("Link:", job.get("link", "N/A"))
    print("=" * 80)
    job_links.append(job)
```
```
✅ Senior Data Scientist - Austin, Texas, United States - None
Link: https://boards.greenhouse.io/roku/jobs/6726915
================================================================================
✅ Job Application for Data Scientist at Base Power Company - None
Link: https://job-boards.greenhouse.io/basepowercompany/jobs/4551163008
================================================================================
✅ Jobs at IntegraFEC - None
Link: https://boards.greenhouse.io/integra
================================================================================
✅ Job Application for Data Scientist at YETI Test Events Job Board - None
Link: https://job-boards.greenhouse.io/yetitestevents/jobs/4001110004
================================================================================
✅ Jobs at IntegraFEC - Internships - None
Link: https://job-boards.greenhouse.io/integrainterns/jobs/4522535008
================================================================================
```
I then used Gemini with BeautifulSoup to retrieve details about each job, and stored all of it in ChromaDB for RAG.
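The scraping step itself isn't shown in the notebook. As a rough stand-in, here is a stdlib-only sketch of stripping a job page down to plain text with `html.parser` (BeautifulSoup's `get_text()` does roughly this, more robustly); the sample HTML is invented for illustration:

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    # Collects visible text nodes, skipping <script> and <style> contents
    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = False

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip = True

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip = False

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())

def extract_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    return " ".join(parser.parts)

# Invented sample job page for illustration
page = "<html><body><h1>Data Scientist</h1><p>Build ML models.</p><script>x=1</script></body></html>"
job_text = extract_text(page)
```

The cleaned text is what would then be embedded and added to the ChromaDB collection.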
```python
import chromadb
from chromadb import Documents, EmbeddingFunction, Embeddings
from google.api_core import retry

# Define a helper to retry when the per-minute quota is reached.
is_retriable = lambda e: (isinstance(e, genai.errors.APIError) and e.code in {429, 503})

class GeminiEmbeddingFunction(EmbeddingFunction):
    # Specify whether to generate embeddings for documents or queries
    document_mode = True

    @retry.Retry(predicate=is_retriable)
    def __call__(self, input: Documents) -> Embeddings:
        if self.document_mode:
            embedding_task = "retrieval_document"
        else:
            embedding_task = "retrieval_query"
        response = client.models.embed_content(
            model="models/text-embedding-004",
            contents=input,
            config=types.EmbedContentConfig(
                task_type=embedding_task,
            ),
        )
        return [e.values for e in response.embeddings]

DB_NAME = "googlejobsdb"
embed_fn = GeminiEmbeddingFunction()
embed_fn.document_mode = True

chroma_client = chromadb.Client()
db = chroma_client.get_or_create_collection(name=DB_NAME, embedding_function=embed_fn)

# Store all the retrieved jobs in Chroma DB
documents = all_jobs
db.add(documents=documents, ids=[str(i) for i in range(len(documents))])
db.count()
```
I then implemented RAG over the stored job documents and the resume with the following code:
```python
import typing

# Define the schema
class JobEntry(typing.TypedDict):
    title: str
    match_percentage: int
    skills_present: list[str]
    skills_missing: list[str]
    match_analysis: str
    improvement_suggestions: list[str]

class JobMatchResponse(typing.TypedDict):
    jobs: list[JobEntry]
    overall_recommendation: str

# Switch the embedding function to query mode before searching
embed_fn.document_mode = False

# Search the Chroma DB using the specified query.
query = "Show me the closest job matches for my resume"
result = db.query(query_texts=[query], n_results=5)
[all_passages] = result["documents"]
all_passages

query_oneline = query.replace("\n", " ")

job_match_prompt = f"""
Given a question by job seeker : {query_oneline} and the resume {resume_text}
Generate a detailed analysis of how well this candidate's resume matches each Google job. For each job:
1. Calculate a match percentage (0-100%) based on alignment of skills, experience, location, and education
2. Identify skills from the job that are present in the resume
3. Identify important skills from the job that are missing in the resume
4. Explain why this role might be a good or poor fit
5. Suggest specific improvements to make the resume more competitive
Format your response as JSON:
{{
  "jobs": [
    {{
      "title": "Job title",
      "match_percentage": number,
      "skills_present": ["skill1", "skill2"...],
      "skills_missing": ["skill1", "skill2"...],
      "match_analysis": "analysis of fit",
      "improvement_suggestions": ["suggestion1", "suggestion2"...]
    }},
    ...
  ],
  "overall_recommendation": "overall career advice"
}}
"""

# Add the retrieved documents to the prompt.
for passage in all_passages:
    passage_oneline = passage.replace("\n", " ")
    job_match_prompt += f"PASSAGE: {passage_oneline}\n"

# Now call the model
answer = client.models.generate_content(
    model="gemini-2.0-flash",
    contents=job_match_prompt,
    config=types.GenerateContentConfig(
        response_mime_type="application/json",
        response_schema=JobMatchResponse
    )
)

Markdown(answer.text)
data = answer.text
print(data)
```
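Because `response_schema` plus the JSON mime type force the model to emit valid JSON, `answer.text` can be parsed straight into Python objects. A sketch with a hand-written sample response (the real payload comes from Gemini and will have more jobs):

```python
import json

# Hand-written sample standing in for answer.text (illustrative values)
sample_response = '''
{
  "jobs": [
    {"title": "Data Scientist", "match_percentage": 82,
     "skills_present": ["Python", "SQL"], "skills_missing": ["Spark"],
     "match_analysis": "Strong overlap on core skills.",
     "improvement_suggestions": ["Add a Spark project"]}
  ],
  "overall_recommendation": "Target mid-level data science roles."
}
'''

parsed = json.loads(sample_response)
# Pick the job with the highest match percentage for the scorecard
top = max(parsed["jobs"], key=lambda j: j["match_percentage"])
```

Downstream agents (the formatter, the meme generator) read fields like `skills_missing` from this parsed dict rather than re-prompting the model.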

📤 Output:
- Top 5 job matches
- Resume-job fit %
- Strengths aligned
- Missing skills
- Suggested improvements
- Job search URL
🔥 Why it’s cool:
It’s a personalized job-hunting RAG agent that works without a proprietary job board. It adapts to your resume, not the other way around.
🎭 Inside the Agent: MemeGeneratorAgent
🖼 Purpose:
Generate a custom meme that reflects your resume’s tone, score, and career vibe.

💡 Prompt Techniques Used:
- Emotion-aware meme prompt with embedded tone
- Style tag (savage, wholesome, motivational)
- Optional retry logic for image generation
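The "optional retry logic" can be as simple as a wrapper that re-invokes a flaky generation call with exponential backoff. The function and the simulated image call below are hypothetical sketches, not the notebook's code:

```python
import time

def with_retries(fn, attempts=3, base_delay=1.0):
    # Retry fn up to `attempts` times, doubling the delay after each failure
    for i in range(attempts):
        try:
            return fn()
        except Exception:
            if i == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** i))

# Simulated flaky image call: fails twice, then returns a filename
calls = {"n": 0}
def flaky_generate():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient image API error")
    return "meme.png"

result = with_retries(flaky_generate, base_delay=0.01)
```

In the real pipeline, `fn` would be the Gemini image-generation call, and the final exception would fall back to a caption-only meme.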
📤 Output:
- Meme image (.png/.jpeg)
- Meme caption (in bold markdown)
- Style tag (used in PDF report)
💡 View the AI-Generated Resume Scorecard
You can view the full structured HTML output generated by ResuMeme.AI here:



This scorecard includes:
– Section-wise evaluation of your resume
– Embedded metrics, formatting feedback, and keywords
– Generated by Gemini + custom evaluation logic
And. The. Meme


🔥 Why it’s cool:
It’s not just for laughs — the meme actually reflects your evaluation. The humor softens the critique and makes it memorable.
🔑 Resume best practices that work
These came from sitting with resumes that were close… but just not getting callbacks. Some were mine. Some were friends’. Some belonged to people like Tina, who showed up ready and just needed the right nudge:
1. Quantify Everything
📊 Don’t say “Worked on dashboards” —
✅ Say “Built 3 dashboards used weekly by execs, saving 10+ hours/month.”
2. Use the “X by Y that resulted in Z” Structure
🧱 Accomplished X by doing Y, which led to Z
✅ “Reduced churn by 15% by personalizing onboarding emails, resulting in $120K ARR boost.”
3. Keep Your Summary to 2–3 Lines
📏 No adjectives. No fluff. No life story.
✅ “Data Scientist with 3 years’ experience in fraud detection and cloud deployments. Skilled in Python, AWS, and LLM fine-tuning.”
4. Action Verbs Only
⚙️ Optimized, launched, deployed, scaled.
❌ Avoid: helped, participated, involved in
5. Core Competencies = Sentences, Not Multi-Column Lists
🛠️ Group your skills by how you use them.
✅ “Skilled in cloud orchestration using AWS Lambda and API Gateway for real-time data pipelines.”
6. Cut Weak Experience Sections
✂️ Tutor roles? Internships from 5 years ago? Remove or consolidate into one bullet.
7. Don’t Let One Role Eat Half the Page
🧹 Substitute teacher? 2 bullets max. Focus on transferable skills.
8. Projects Should Be 2 Lines Max
🔍 One line for the what, one for the impact.
✅ “Built RAG-based chatbot for housing search. Increased match relevance by 25% using Llama3 + FAISS vector store.”
9. Education Should Be One Line
🎓 Clean and tight:
University of X — B.S., CS, 2022
10. Aim for One Page
🧼 Unless you’re a CTO or PhD with patents, 1 page > 2 pages.
ResuMeme.AI will judge your whitespace choices. And so will recruiters.
Why I Built It
I was tired of boring AI projects.
And I was tired of watching people blindly apply to jobs with no idea if their resume even made sense.
This project gave me everything I wanted:
- Structure
- Motivation
- Vibes
- Feedback
- A laugh
Ready to Run It?
Let the AI judge you.
Fix your resume.
Find a job.
And laugh a little.
→ Launch the Notebook on Kaggle
Built in Kaggle.
Powered by Gemini.
Styled by your chaos.
This project was created as part of the Google GenAI Hackathon on Kaggle, where over 5,600 participants worldwide explored the cutting edge of agentic workflows, Gemini APIs, and RAG systems.
ResuMeme.AI emerged as a full-stack, multi-agent system that doesn’t just analyze resumes — it turns them into insight, action… and laughter. Built entirely within a Kaggle notebook, using the techniques taught across Days 1–4 of the bootcamp (from JSON mode to function calling and LangGraph), this system proves that practical AI can be both technically sharp and surprisingly human.
Author Bio
Anupama Garani is an AI Product Strategist and Builder obsessed with agentic workflows, retrieval, and building AI products that actually feel good to use.
She loves mixing serious AI pipelines with a little edge, a little chaos, and a lot of practical value.
Published via Towards AI