Building a Self-Updating Knowledge Graph From Meeting Notes With LLM Extraction and Neo4j
Last Updated on January 20, 2026 by Editorial Team
Author(s): Cocoindex
Originally published on Towards AI.

Transform unstructured meeting notes into a queryable knowledge graph with incremental updates — no full reprocessing required.
Meeting notes are goldmines of organizational intelligence. They capture decisions, action items, participant information, and the relationships between people and tasks. Yet most organizations treat them as static documents — searchable only through basic text search.
Imagine instead being able to query your meetings like a database:
• “Who attended meetings where the topic was ‘budget planning’?”
• “What tasks did Sarah get assigned across all meetings?”
• “Show me all decisions made in Q4 involving the engineering team.”
This is where knowledge graphs shine. By extracting structured information from unstructured meeting notes and building a graph representation, you unlock powerful relationship-based queries that would be impossible with traditional document storage.
In this article, we’ll build a practical CocoIndex pipeline that:
1. Reads Markdown meeting notes from Google Drive
2. Extracts structured entities (meetings, participants, tasks) using LLMs
3. Persists everything to Neo4j as a knowledge graph
4. Automatically updates only when source documents change
The full source code is available on GitHub.

Architecture Overview
The pipeline follows a clear data flow with incremental processing built in at every stage:
Google Drive (Documents - with change tracking)
→ Identify changed documents
→ Split into meetings
→ Extract structured data with LLM (only for changed documents)
→ Collect nodes and relationships
→ Export to Neo4j (with upsert logic)
Prerequisites
- Install Neo4j and start it locally
  - Default local browser: http://localhost:7474
  - Default credentials used in this example: username neo4j, password cocoindex
- Configure your OpenAI API key
- Prepare Google Drive:
  - Create a Google Cloud service account and download its JSON credential
  - Share the source folders with the service account email
  - Collect the root folder IDs you want to ingest
  - See Setup for Google Drive for details
Environment
Set the following environment variables:
export OPENAI_API_KEY=sk-...
export GOOGLE_SERVICE_ACCOUNT_CREDENTIAL=/absolute/path/to/service_account.json
export GOOGLE_DRIVE_ROOT_FOLDER_IDS=folderId1,folderId2
Notes:
- GOOGLE_DRIVE_ROOT_FOLDER_IDS accepts a comma-separated list of folder IDs
- The flow polls recent changes and refreshes periodically
Let’s break down each component:
Flow Definition
Overview

Add source and collector
@cocoindex.flow_def(name="MeetingNotesGraph")
def meeting_notes_graph_flow(
    flow_builder: cocoindex.FlowBuilder, data_scope: cocoindex.DataScope
) -> None:
    """
    Define an example flow that extracts triples from files and builds a knowledge graph.
    """
    credential_path = os.environ["GOOGLE_SERVICE_ACCOUNT_CREDENTIAL"]
    root_folder_ids = os.environ["GOOGLE_DRIVE_ROOT_FOLDER_IDS"].split(",")

    data_scope["documents"] = flow_builder.add_source(
        cocoindex.sources.GoogleDrive(
            service_account_credential_path=credential_path,
            root_folder_ids=root_folder_ids,
            recent_changes_poll_interval=datetime.timedelta(seconds=10),
        ),
        refresh_interval=datetime.timedelta(minutes=1),
    )
The pipeline starts by connecting to Google Drive using a service account. CocoIndex’s built-in source connector handles authentication and provides incremental change detection. The recent_changes_poll_interval parameter means the source checks for new or modified files every 10 seconds, while the refresh_interval determines when the entire flow re-runs (every minute).

This is one of CocoIndex’s superpowers: incremental processing with automatic change tracking. Instead of reprocessing all documents on every run, the framework:
- Lists files from Google Drive with last modified time
- Identifies only the files that have been added or modified since the last successful run
- Skips unchanged files entirely
- Passes only changed documents downstream
The result? In an enterprise with 1% daily churn, only 1% of documents trigger downstream processing. Unchanged files never hit your LLM API, never generate Neo4j queries, and never consume compute resources.
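The bookkeeping behind this can be sketched in a few lines of plain Python. This is a simplified illustration of timestamp-based change detection, not CocoIndex's actual implementation; the file names and timestamps are made up:

```python
import datetime

def changed_files(current: dict[str, datetime.datetime],
                  last_seen: dict[str, datetime.datetime]) -> list[str]:
    """Return files that are new or modified since the last successful run."""
    return [
        name for name, mtime in current.items()
        if name not in last_seen or mtime > last_seen[name]
    ]

t0 = datetime.datetime(2026, 1, 1)
t1 = datetime.datetime(2026, 1, 2)
last_seen = {"notes-a.md": t0, "notes-b.md": t0}
current = {"notes-a.md": t0, "notes-b.md": t1, "notes-c.md": t1}

# Only the edited file and the new file flow downstream.
print(changed_files(current, last_seen))  # ['notes-b.md', 'notes-c.md']
```

Only the names returned here would trigger LLM extraction and Neo4j writes; everything else is skipped.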
Add collector
meeting_nodes = data_scope.add_collector()
attended_rels = data_scope.add_collector()
decided_tasks_rels = data_scope.add_collector()
assigned_rels = data_scope.add_collector()
The pipeline then collects data into specialized collectors for different entity types and relationships:
- Meeting Nodes — Store the meeting itself with its date and notes
- Attendance Relationships — Capture who attended meetings and whether they were the organizer
- Task Decision Relationships — Link meetings to decisions (tasks that were decided upon)
- Task Assignment Relationships — Assign specific tasks to people
Process each document
Extract meetings
with data_scope["documents"].row() as document:
    document["meetings"] = document["content"].transform(
        cocoindex.functions.SplitBySeparators(
            separators_regex=[r"\n\n##?\ "], keep_separator="RIGHT"
        )
    )
Meeting documents often contain multiple meetings in a single file. This step splits documents on Markdown headers (## or #) preceded by blank lines, treating each section as a separate meeting. The keep_separator="RIGHT" means the separator (header) is kept with the right segment, preserving context.
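To see what this split does, here is a rough stand-in using Python's re module. The lookahead mimics the keep_separator="RIGHT" behavior by leaving the header attached to the segment that follows it; CocoIndex's own implementation may differ, and the sample notes are invented:

```python
import re

notes = (
    "# Team Sync 2026-01-05\nDiscussed roadmap.\n\n"
    "## Budget Planning 2026-01-12\nApproved Q1 spend."
)

# Split before each Markdown header (# or ##) that follows a blank line,
# keeping the header with the segment to its right.
meetings = re.split(r"\n\n(?=##?\ )", notes)

for m in meetings:
    print(m.splitlines()[0])
# # Team Sync 2026-01-05
# ## Budget Planning 2026-01-12
```

Each resulting chunk still starts with its own header, so the downstream LLM extraction sees the meeting title and date in context.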

Extract meeting
Define Meeting schema
@dataclass
class Person:
    name: str

@dataclass
class Task:
    description: str
    assigned_to: list[Person]

@dataclass
class Meeting:
    time: datetime.date
    note: str
    organizer: Person
    participants: list[Person]
    tasks: list[Task]
This gives the LLM direct guidance about what information to extract and how it should be structured. It is far more reliable than asking the LLM for free-form output, which cannot reliably be turned into the structured records a knowledge graph needs.
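As a concrete illustration, here is the kind of structured value a successful extraction of one meeting section might produce. The schema is repeated so the snippet runs standalone, and the meeting contents are invented for the example:

```python
import datetime
from dataclasses import dataclass

@dataclass
class Person:
    name: str

@dataclass
class Task:
    description: str
    assigned_to: list[Person]

@dataclass
class Meeting:
    time: datetime.date
    note: str
    organizer: Person
    participants: list[Person]
    tasks: list[Task]

# A hand-constructed example of what the LLM's typed output could look like:
meeting = Meeting(
    time=datetime.date(2026, 1, 12),
    note="Approved the Q1 budget; engineering to draft a hiring plan.",
    organizer=Person(name="Sarah"),
    participants=[Person(name="Alice"), Person(name="Bob")],
    tasks=[Task(description="Draft hiring plan",
                assigned_to=[Person(name="Alice")])],
)

print(meeting.tasks[0].assigned_to[0].name)  # Alice
```

Because the output is a typed object rather than prose, every field maps directly onto a node or relationship in the graph.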
Extract and collect relationship
with document["meetings"].row() as meeting:
    parsed = meeting["parsed"] = meeting["text"].transform(
        cocoindex.functions.ExtractByLlm(
            llm_spec=cocoindex.LlmSpec(
                api_type=cocoindex.LlmApiType.OPENAI, model="gpt-5"
            ),
            output_type=Meeting,
        )
    )
Importantly, this step also benefits from incremental processing. Since ExtractByLlm is an expensive step, its output is cached; as long as the inputs (the input text, the model, and the output type definition) are unchanged, the cached output is reused without re-running the LLM.
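The caching idea is essentially memoization keyed on a hash of all inputs. A minimal sketch, not CocoIndex's actual cache, with fake_llm standing in for the real LLM call:

```python
import hashlib
import json

_cache: dict[str, dict] = {}

def extract_with_cache(text: str, model: str, schema: str, run_llm) -> dict:
    """Re-run the expensive LLM call only when any input changed."""
    key = hashlib.sha256(json.dumps([text, model, schema]).encode()).hexdigest()
    if key not in _cache:
        _cache[key] = run_llm(text)
    return _cache[key]

calls = []
def fake_llm(text: str) -> dict:
    calls.append(text)          # record each "real" LLM invocation
    return {"note": text.upper()}

extract_with_cache("standup notes", "gpt-5", "Meeting", fake_llm)
extract_with_cache("standup notes", "gpt-5", "Meeting", fake_llm)  # cache hit

print(len(calls))  # 1
```

A change to any of the three inputs produces a new key, so stale results are never served.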

Collect relationship
meeting_key = {"note_file": document["filename"], "time": parsed["time"]}
meeting_nodes.collect(**meeting_key, note=parsed["note"])

attended_rels.collect(
    id=cocoindex.GeneratedField.UUID,
    **meeting_key,
    person=parsed["organizer"]["name"],
    is_organizer=True,
)
with parsed["participants"].row() as participant:
    attended_rels.collect(
        id=cocoindex.GeneratedField.UUID,
        **meeting_key,
        person=participant["name"],
    )

with parsed["tasks"].row() as task:
    decided_tasks_rels.collect(
        id=cocoindex.GeneratedField.UUID,
        **meeting_key,
        description=task["description"],
    )
    with task["assigned_to"].row() as assigned_to:
        assigned_rels.collect(
            id=cocoindex.GeneratedField.UUID,
            **meeting_key,
            task=task["description"],
            person=assigned_to["name"],
        )
Collectors in CocoIndex act like in‑memory buffers: you declare collectors for different categories (meeting nodes, attendance, tasks, assignments), then as you process each document you “collect” relevant entries.
This block collects nodes and relationships from parsed meeting notes to build a knowledge graph in Neo4j using CocoIndex:
- Person → Meeting (ATTENDED) Links participants (including organizers) to the meetings they attended.
- Meeting → Task (DECIDED) Links meetings to tasks or decisions that were made.
- Person → Task (ASSIGNED_TO) Links tasks back to the people responsible for them.
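Conceptually, a collector behaves like this toy version (a hand-rolled sketch for intuition, not the CocoIndex API):

```python
import uuid

class Collector:
    """Minimal stand-in for a collector: an append-only buffer of rows."""
    def __init__(self):
        self.rows: list[dict] = []

    def collect(self, **fields):
        # Mimic GeneratedField.UUID by minting an id when one is requested.
        if fields.get("id") == "UUID":
            fields["id"] = str(uuid.uuid4())
        self.rows.append(fields)

attended = Collector()
attended.collect(id="UUID", note_file="q1.md", person="Sarah", is_organizer=True)
attended.collect(id="UUID", note_file="q1.md", person="Alice")

print(len(attended.rows))  # 2
```

At export time the buffered rows are flushed to the target as nodes or relationships, which is what the export calls below configure.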
Map to graph database
Overview
We will create a property graph with the following nodes and relationships. To learn more about property graphs, see CocoIndex's Property Graph Targets documentation.
Map Meeting Nodes

meeting_nodes.export(
    "meeting_nodes",
    cocoindex.targets.Neo4j(
        connection=conn_spec, mapping=cocoindex.targets.Nodes(label="Meeting")
    ),
    primary_key_fields=["note_file", "time"],
)
Declare Person and Task Nodes
flow_builder.declare(
    cocoindex.targets.Neo4jDeclaration(
        connection=conn_spec,
        nodes_label="Person",
        primary_key_fields=["name"],
    )
)
flow_builder.declare(
    cocoindex.targets.Neo4jDeclaration(
        connection=conn_spec,
        nodes_label="Task",
        primary_key_fields=["description"],
    )
)
Map ATTENDED Relationship
ATTENDED relationships
attended_rels.export(
    "attended_rels",
    cocoindex.targets.Neo4j(
        connection=conn_spec,
        mapping=cocoindex.targets.Relationships(
            rel_type="ATTENDED",
            source=cocoindex.targets.NodeFromFields(
                label="Person",
                fields=[
                    cocoindex.targets.TargetFieldMapping(
                        source="person", target="name"
                    )
                ],
            ),
            target=cocoindex.targets.NodeFromFields(
                label="Meeting",
                fields=[
                    cocoindex.targets.TargetFieldMapping("note_file"),
                    cocoindex.targets.TargetFieldMapping("time"),
                ],
            ),
        ),
    ),
    primary_key_fields=["id"],
)
- This call encodes ATTENDED relationships, i.e. "Person → Meeting" (organizer or participant → the meeting), as explicit edges in the Neo4j graph.
- It links Person nodes to Meeting nodes via ATTENDED relationships, enabling queries like "which meetings did Alice attend?" or "who attended meeting X?".
- By mapping Person and Meeting nodes consistently (using unique keys), it keeps the graph clean, with no duplicate persons or meetings.
- Because relationships get unique IDs and are exported with consistent keys, the graph remains stable across incremental updates: re-runs won't duplicate edges or nodes.
Map DECIDED Relationship
DECIDED relationships
decided_tasks_rels.export(
    "decided_tasks_rels",
    cocoindex.targets.Neo4j(
        connection=conn_spec,
        mapping=cocoindex.targets.Relationships(
            rel_type="DECIDED",
            source=cocoindex.targets.NodeFromFields(
                label="Meeting",
                fields=[
                    cocoindex.targets.TargetFieldMapping("note_file"),
                    cocoindex.targets.TargetFieldMapping("time"),
                ],
            ),
            target=cocoindex.targets.NodeFromFields(
                label="Task",
                fields=[
                    cocoindex.targets.TargetFieldMapping("description"),
                ],
            ),
        ),
    ),
    primary_key_fields=["id"],
)
- This call encodes DECIDED relationships, i.e. "Meeting → Task", as explicit edges in the Neo4j graph.
- It links Meeting nodes to Task nodes via DECIDED relationships, enabling queries like:
  - "Which tasks were decided in Meeting X?"
  - "From which meeting did Task Y originate?"
- By mapping Meeting and Task nodes consistently (using note_file + time for meetings and description for tasks), it prevents duplicate task or meeting nodes in the graph.
- Because relationships have unique IDs and are exported with consistent keys, the graph remains stable across incremental updates: re-running the pipeline won't create duplicate edges or nodes.
Map ASSIGNED_TO Relationship
ASSIGNED_TO relationships
assigned_rels.export(
    "assigned_rels",
    cocoindex.targets.Neo4j(
        connection=conn_spec,
        mapping=cocoindex.targets.Relationships(
            rel_type="ASSIGNED_TO",
            source=cocoindex.targets.NodeFromFields(
                label="Person",
                fields=[
                    cocoindex.targets.TargetFieldMapping(
                        source="person", target="name"
                    ),
                ],
            ),
            target=cocoindex.targets.NodeFromFields(
                label="Task",
                fields=[
                    cocoindex.targets.TargetFieldMapping(
                        source="task", target="description"
                    ),
                ],
            ),
        ),
    ),
    primary_key_fields=["id"],
)
The Resulting Graph
After running this pipeline, your Neo4j database contains a rich, queryable graph:
Nodes:
- Meeting – Represents individual meetings with properties like date and notes
- Person – Represents individuals involved in meetings
- Task – Represents actionable items decided in meetings
Relationships:
- ATTENDED – Connects people to meetings they attended
- DECIDED – Connects meetings to tasks that were decided
- ASSIGNED_TO – Connects people to tasks they're responsible for
Importantly, the final export to the knowledge graph is incremental as well. CocoIndex only mutates nodes or relationships that have changed and leaves everything else untouched. This avoids unnecessary churn on the target database and minimizes the cost of write operations.
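This diff-then-write pattern can be sketched as follows (illustrative only; CocoIndex's internal tracking is more involved, and the row contents are invented):

```python
def diff_export(previous: dict[str, dict], current: dict[str, dict]):
    """Compute the minimal set of mutations to bring the target in sync."""
    upserts = {k: v for k, v in current.items() if previous.get(k) != v}
    deletes = [k for k in previous if k not in current]
    return upserts, deletes

prev = {"m1": {"note": "old"}, "m2": {"note": "same"}}
curr = {"m1": {"note": "new"}, "m2": {"note": "same"}, "m3": {"note": "added"}}

upserts, deletes = diff_export(prev, curr)
print(sorted(upserts))  # ['m1', 'm3']
print(deletes)          # []
```

The unchanged row (m2) generates no write at all, which is what keeps target-side load proportional to actual change, not corpus size.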
Run
Build/update the graph
Install dependencies:
pip install -e .
Update the index (run the flow once to build/update the graph):
cocoindex update main
Browse the knowledge graph
Open Neo4j Browser at http://localhost:7474.
Sample Cypher queries:
// All relationships
MATCH p=()-->() RETURN p
// Who attended which meetings (including organizer)
MATCH (p:Person)-[:ATTENDED]->(m:Meeting)
RETURN p, m
// Tasks decided in meetings
MATCH (m:Meeting)-[:DECIDED]->(t:Task)
RETURN m, t
// Task assignments
MATCH (p:Person)-[:ASSIGNED_TO]->(t:Task)
RETURN p, t

Real-World Enterprise Applications
This pattern extends far beyond meeting notes:
- Research Paper Analysis — Extract papers from organizational repositories, build knowledge graphs of concepts and citations across thousands of documents, and track updates to citations and concepts
- Customer Support Tickets — Extract issues, solutions, and relationships between tickets and customers; identify patterns across thousands of tickets while handling frequent edits and status updates
- Email Thread Summarization — Build graphs of communication patterns and decision outcomes across millions of emails; handle the reality that teams forward, edit, and reference previous discussions
- Compliance Documentation — Extract regulatory requirements from policy documents; track changes to policies and cascade impacts through a graph structure; maintain audit trails of document versions
- Competitive Intelligence — Extract data from public documents and news articles; build knowledge graphs of competitor relationships, products, and market positioning while handling constant updates
If this example was helpful, the easiest way to support CocoIndex is to give the project a ⭐ on GitHub.