Building a Self-Updating Knowledge Graph From Meeting Notes With LLM Extraction and Neo4j
Last Updated on January 20, 2026 by Editorial Team
Author(s): Cocoindex
Originally published on Towards AI.

Transform unstructured meeting notes into a queryable knowledge graph with incremental updates — no full reprocessing required.
Meeting notes are goldmines of organizational intelligence. They capture decisions, action items, participant information, and the relationships between people and tasks. Yet most organizations treat them as static documents — searchable only through basic text search.
Imagine instead being able to query your meetings like a database:
• “Who attended meetings where the topic was ‘budget planning’?”
• “What tasks did Sarah get assigned across all meetings?”
• “Show me all decisions made in Q4 involving the engineering team.”
This is where knowledge graphs shine. By extracting structured information from unstructured meeting notes and building a graph representation, you unlock powerful relationship-based queries that would be impossible with traditional document storage.
In this article, we’ll build a practical CocoIndex pipeline that:
1. Reads Markdown meeting notes from Google Drive
2. Extracts structured entities (meetings, participants, tasks) using LLMs
3. Persists everything to Neo4j as a knowledge graph
4. Automatically updates only when source documents change
The full source code is available on GitHub.

Architecture Overview
The pipeline follows a clear data flow with incremental processing built in at every stage:
Google Drive (Documents - with change tracking)
→ Identify changed documents
→ Split into meetings
→ Extract structured data with LLM (only for changed documents)
→ Collect nodes and relationships
→ Export to Neo4j (with upsert logic)
Prerequisites
- Install Neo4j and start it locally
  - Default local browser: http://localhost:7474
  - Default credentials used in this example: username neo4j, password cocoindex
- Configure your OpenAI API key
- Prepare Google Drive:
  - Create a Google Cloud service account and download its JSON credential
  - Share the source folders with the service account email
  - Collect the root folder IDs you want to ingest
  - See Setup for Google Drive for details
Environment
Set the following environment variables:
export OPENAI_API_KEY=sk-...
export GOOGLE_SERVICE_ACCOUNT_CREDENTIAL=/absolute/path/to/service_account.json
export GOOGLE_DRIVE_ROOT_FOLDER_IDS=folderId1,folderId2
Notes:
- GOOGLE_DRIVE_ROOT_FOLDER_IDS accepts a comma-separated list of folder IDs
- The flow polls recent changes and refreshes periodically
Let’s break down each component:
Flow Definition
Overview

Add source and collector
@cocoindex.flow_def(name="MeetingNotesGraph")
def meeting_notes_graph_flow(
    flow_builder: cocoindex.FlowBuilder, data_scope: cocoindex.DataScope
) -> None:
    """
    Define an example flow that extracts triples from files and builds a knowledge graph.
    """
    credential_path = os.environ["GOOGLE_SERVICE_ACCOUNT_CREDENTIAL"]
    root_folder_ids = os.environ["GOOGLE_DRIVE_ROOT_FOLDER_IDS"].split(",")

    data_scope["documents"] = flow_builder.add_source(
        cocoindex.sources.GoogleDrive(
            service_account_credential_path=credential_path,
            root_folder_ids=root_folder_ids,
            recent_changes_poll_interval=datetime.timedelta(seconds=10),
        ),
        refresh_interval=datetime.timedelta(minutes=1),
    )
The pipeline starts by connecting to Google Drive using a service account. CocoIndex’s built-in source connector handles authentication and provides incremental change detection. The recent_changes_poll_interval parameter means the source checks for new or modified files every 10 seconds, while the refresh_interval determines when the entire flow re-runs (every minute).

This is one of CocoIndex’s superpowers: incremental processing with automatic change tracking. Instead of reprocessing all documents on every run, the framework:
- Lists files from Google Drive with last modified time
- Identifies only the files that have been added or modified since the last successful run
- Skips unchanged files entirely
- Passes only changed documents downstream
The result? In an enterprise with 1% daily churn, only 1% of documents trigger downstream processing. Unchanged files never hit your LLM API, never generate Neo4j queries, and never consume compute resources.
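The bookkeeping behind this can be sketched in a few lines of plain Python. This is a simplified illustration of timestamp-based change detection, not CocoIndex's actual implementation; the file names and timestamps are made up:

```python
import datetime

def changed_files(current: dict[str, datetime.datetime],
                  last_seen: dict[str, datetime.datetime]) -> list[str]:
    """Return files that are new or modified since the last successful run."""
    return [
        name for name, mtime in current.items()
        if name not in last_seen or mtime > last_seen[name]
    ]

t0 = datetime.datetime(2026, 1, 1)
t1 = datetime.datetime(2026, 1, 2)
last_seen = {"notes-a.md": t0, "notes-b.md": t0}
current = {"notes-a.md": t0, "notes-b.md": t1, "notes-c.md": t1}

# Only the edited file and the new file flow downstream.
print(changed_files(current, last_seen))  # ['notes-b.md', 'notes-c.md']
```

Only the names returned here would trigger LLM extraction and Neo4j writes; everything else is skipped.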
Add collector
meeting_nodes = data_scope.add_collector()
attended_rels = data_scope.add_collector()
decided_tasks_rels = data_scope.add_collector()
assigned_rels = data_scope.add_collector()
The pipeline then collects data into specialized collectors for different entity types and relationships:
- Meeting Nodes — Store the meeting itself with its date and notes
- Attendance Relationships — Capture who attended meetings and whether they were the organizer
- Task Decision Relationships — Link meetings to decisions (tasks that were decided upon)
- Task Assignment Relationships — Assign specific tasks to people
Process each document
Extract meetings
with data_scope["documents"].row() as document:
    document["meetings"] = document["content"].transform(
        cocoindex.functions.SplitBySeparators(
            separators_regex=[r"\n\n##?\ "], keep_separator="RIGHT"
        )
    )
Meeting documents often contain multiple meetings in a single file. This step splits documents on Markdown headers (## or #) preceded by blank lines, treating each section as a separate meeting. The keep_separator="RIGHT" means the separator (header) is kept with the right segment, preserving context.
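To see what this split does, here is a rough stand-in using Python's re module. The lookahead mimics the keep_separator="RIGHT" behavior by leaving the header attached to the segment that follows it; CocoIndex's own implementation may differ, and the sample notes are invented:

```python
import re

notes = (
    "# Team Sync 2026-01-05\nDiscussed roadmap.\n\n"
    "## Budget Planning 2026-01-12\nApproved Q1 spend."
)

# Split before each Markdown header (# or ##) that follows a blank line,
# keeping the header with the segment to its right.
meetings = re.split(r"\n\n(?=##?\ )", notes)

for m in meetings:
    print(m.splitlines()[0])
# # Team Sync 2026-01-05
# ## Budget Planning 2026-01-12
```

Each resulting chunk still starts with its own header, so the downstream LLM extraction sees the meeting title and date in context.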

Extract meeting
Define Meeting schema
@dataclass
class Person:
    name: str

@dataclass
class Task:
    description: str
    assigned_to: list[Person]

@dataclass
class Meeting:
    time: datetime.date
    note: str
    organizer: Person
    participants: list[Person]
    tasks: list[Task]
This gives the LLM direct guidance about what information to extract and how it should be structured. It is far more reliable than asking the LLM for free-form output, which cannot reliably be turned into the structured records a knowledge graph needs.
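As a concrete illustration, here is the kind of structured value a successful extraction of one meeting section might produce. The schema is repeated so the snippet runs standalone, and the meeting contents are invented for the example:

```python
import datetime
from dataclasses import dataclass

@dataclass
class Person:
    name: str

@dataclass
class Task:
    description: str
    assigned_to: list[Person]

@dataclass
class Meeting:
    time: datetime.date
    note: str
    organizer: Person
    participants: list[Person]
    tasks: list[Task]

# A hand-constructed example of what the LLM's typed output could look like:
meeting = Meeting(
    time=datetime.date(2026, 1, 12),
    note="Approved the Q1 budget; engineering to draft a hiring plan.",
    organizer=Person(name="Sarah"),
    participants=[Person(name="Alice"), Person(name="Bob")],
    tasks=[Task(description="Draft hiring plan",
                assigned_to=[Person(name="Alice")])],
)

print(meeting.tasks[0].assigned_to[0].name)  # Alice
```

Because the output is a typed object rather than prose, every field maps directly onto a node or relationship in the graph.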
Extract and collect relationship
with document["meetings"].row() as meeting:
    parsed = meeting["parsed"] = meeting["text"].transform(
        cocoindex.functions.ExtractByLlm(
            llm_spec=cocoindex.LlmSpec(
                api_type=cocoindex.LlmApiType.OPENAI, model="gpt-5"
            ),
            output_type=Meeting,
        )
    )
Importantly, this step also benefits from incremental processing. Since ExtractByLlm is an expensive step, its output is cached; as long as the inputs (the input text, the model, and the output type definition) are unchanged, the cached output is reused without re-running the LLM.
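The caching idea is essentially memoization keyed on a hash of all inputs. A minimal sketch, not CocoIndex's actual cache, with fake_llm standing in for the real LLM call:

```python
import hashlib
import json

_cache: dict[str, dict] = {}

def extract_with_cache(text: str, model: str, schema: str, run_llm) -> dict:
    """Re-run the expensive LLM call only when any input changed."""
    key = hashlib.sha256(json.dumps([text, model, schema]).encode()).hexdigest()
    if key not in _cache:
        _cache[key] = run_llm(text)
    return _cache[key]

calls = []
def fake_llm(text: str) -> dict:
    calls.append(text)          # record each "real" LLM invocation
    return {"note": text.upper()}

extract_with_cache("standup notes", "gpt-5", "Meeting", fake_llm)
extract_with_cache("standup notes", "gpt-5", "Meeting", fake_llm)  # cache hit

print(len(calls))  # 1
```

A change to any of the three inputs produces a new key, so stale results are never served.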

Collect relationship
meeting_key = {"note_file": document["filename"], "time": parsed["time"]}
meeting_nodes.collect(**meeting_key, note=parsed["note"])

attended_rels.collect(
    id=cocoindex.GeneratedField.UUID,
    **meeting_key,
    person=parsed["organizer"]["name"],
    is_organizer=True,
)
with parsed["participants"].row() as participant:
    attended_rels.collect(
        id=cocoindex.GeneratedField.UUID,
        **meeting_key,
        person=participant["name"],
    )

with parsed["tasks"].row() as task:
    decided_tasks_rels.collect(
        id=cocoindex.GeneratedField.UUID,
        **meeting_key,
        description=task["description"],
    )
    with task["assigned_to"].row() as assigned_to:
        assigned_rels.collect(
            id=cocoindex.GeneratedField.UUID,
            **meeting_key,
            task=task["description"],
            person=assigned_to["name"],
        )
Collectors in CocoIndex act like in‑memory buffers: you declare collectors for different categories (meeting nodes, attendance, tasks, assignments), then as you process each document you “collect” relevant entries.
This block collects nodes and relationships from parsed meeting notes to build a knowledge graph in Neo4j using CocoIndex:
- Person → Meeting (ATTENDED) Links participants (including organizers) to the meetings they attended.
- Meeting → Task (DECIDED) Links meetings to tasks or decisions that were made.
- Person → Task (ASSIGNED_TO) Links tasks back to the people responsible for them.
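Conceptually, a collector behaves like this toy version (a hand-rolled sketch for intuition, not the CocoIndex API):

```python
import uuid

class Collector:
    """Minimal stand-in for a collector: an append-only buffer of rows."""
    def __init__(self):
        self.rows: list[dict] = []

    def collect(self, **fields):
        # Mimic GeneratedField.UUID by minting an id when one is requested.
        if fields.get("id") == "UUID":
            fields["id"] = str(uuid.uuid4())
        self.rows.append(fields)

attended = Collector()
attended.collect(id="UUID", note_file="q1.md", person="Sarah", is_organizer=True)
attended.collect(id="UUID", note_file="q1.md", person="Alice")

print(len(attended.rows))  # 2
```

At export time the buffered rows are flushed to the target as nodes or relationships, which is what the export calls below configure.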
Map to graph database
Overview
We will create a property graph with the following nodes and relationships. To learn more about property graphs, see CocoIndex's Property Graph Targets documentation.
Map Meeting Nodes

meeting_nodes.export(
    "meeting_nodes",
    cocoindex.targets.Neo4j(
        connection=conn_spec, mapping=cocoindex.targets.Nodes(label="Meeting")
    ),
    primary_key_fields=["note_file", "time"],
)
Declare Person and Task Nodes
flow_builder.declare(
    cocoindex.targets.Neo4jDeclaration(
        connection=conn_spec,
        nodes_label="Person",
        primary_key_fields=["name"],
    )
)
flow_builder.declare(
    cocoindex.targets.Neo4jDeclaration(
        connection=conn_spec,
        nodes_label="Task",
        primary_key_fields=["description"],
    )
)
Map ATTENDED Relationship
ATTENDED relationships
attended_rels.export(
    "attended_rels",
    cocoindex.targets.Neo4j(
        connection=conn_spec,
        mapping=cocoindex.targets.Relationships(
            rel_type="ATTENDED",
            source=cocoindex.targets.NodeFromFields(
                label="Person",
                fields=[
                    cocoindex.targets.TargetFieldMapping(
                        source="person", target="name"
                    )
                ],
            ),
            target=cocoindex.targets.NodeFromFields(
                label="Meeting",
                fields=[
                    cocoindex.targets.TargetFieldMapping("note_file"),
                    cocoindex.targets.TargetFieldMapping("time"),
                ],
            ),
        ),
    ),
    primary_key_fields=["id"],
)
- This call encodes ATTENDED relationships, i.e. "Person → Meeting" (organizer or participant → the meeting), as explicit edges in the Neo4j graph.
- It links Person nodes to Meeting nodes via ATTENDED relationships, enabling queries like "which meetings did Alice attend?" or "who attended meeting X?".
- By mapping Person and Meeting nodes consistently (using unique keys), it keeps the graph clean, with no duplicate persons or meetings.
- Because relationships get unique IDs and are exported with consistent keys, the graph remains stable across incremental updates: re-runs won't duplicate edges or nodes.
Map DECIDED Relationship
DECIDED relationships
decided_tasks_rels.export(
    "decided_tasks_rels",
    cocoindex.targets.Neo4j(
        connection=conn_spec,
        mapping=cocoindex.targets.Relationships(
            rel_type="DECIDED",
            source=cocoindex.targets.NodeFromFields(
                label="Meeting",
                fields=[
                    cocoindex.targets.TargetFieldMapping("note_file"),
                    cocoindex.targets.TargetFieldMapping("time"),
                ],
            ),
            target=cocoindex.targets.NodeFromFields(
                label="Task",
                fields=[
                    cocoindex.targets.TargetFieldMapping("description"),
                ],
            ),
        ),
    ),
    primary_key_fields=["id"],
)
- This call encodes DECIDED relationships, i.e. "Meeting → Task", as explicit edges in the Neo4j graph.
- It links Meeting nodes to Task nodes via DECIDED relationships, enabling queries like:
  - "Which tasks were decided in Meeting X?"
  - "From which meeting did Task Y originate?"
- By mapping Meeting and Task nodes consistently (using note_file + time for meetings and description for tasks), it prevents duplicate task or meeting nodes in the graph.
- Because relationships have unique IDs and are exported with consistent keys, the graph remains stable across incremental updates: re-running the pipeline won't create duplicate edges or nodes.
Map ASSIGNED_TO Relationship
ASSIGNED_TO relationships
assigned_rels.export(
    "assigned_rels",
    cocoindex.targets.Neo4j(
        connection=conn_spec,
        mapping=cocoindex.targets.Relationships(
            rel_type="ASSIGNED_TO",
            source=cocoindex.targets.NodeFromFields(
                label="Person",
                fields=[
                    cocoindex.targets.TargetFieldMapping(
                        source="person", target="name"
                    ),
                ],
            ),
            target=cocoindex.targets.NodeFromFields(
                label="Task",
                fields=[
                    cocoindex.targets.TargetFieldMapping(
                        source="task", target="description"
                    ),
                ],
            ),
        ),
    ),
    primary_key_fields=["id"],
)
The Resulting Graph
After running this pipeline, your Neo4j database contains a rich, queryable graph:
Nodes:
- Meeting – Represents individual meetings with properties like date and notes
- Person – Represents individuals involved in meetings
- Task – Represents actionable items decided in meetings
Relationships:
- ATTENDED – Connects people to meetings they attended
- DECIDED – Connects meetings to tasks that were decided
- ASSIGNED_TO – Connects people to tasks they're responsible for
Importantly, the final export to the knowledge graph is incremental as well. CocoIndex only mutates nodes or relationships that have changed and leaves everything else untouched. This avoids unnecessary churn on the target database and minimizes the cost of write operations.
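This diff-then-write pattern can be sketched as follows (illustrative only; CocoIndex's internal tracking is more involved, and the row contents are invented):

```python
def diff_export(previous: dict[str, dict], current: dict[str, dict]):
    """Compute the minimal set of mutations to bring the target in sync."""
    upserts = {k: v for k, v in current.items() if previous.get(k) != v}
    deletes = [k for k in previous if k not in current]
    return upserts, deletes

prev = {"m1": {"note": "old"}, "m2": {"note": "same"}}
curr = {"m1": {"note": "new"}, "m2": {"note": "same"}, "m3": {"note": "added"}}

upserts, deletes = diff_export(prev, curr)
print(sorted(upserts))  # ['m1', 'm3']
print(deletes)          # []
```

The unchanged row (m2) generates no write at all, which is what keeps target-side load proportional to actual change, not corpus size.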
Run
Build/update the graph
Install dependencies:
pip install -e .
Update the index (run the flow once to build/update the graph):
cocoindex update main
Browse the knowledge graph
Open Neo4j Browser at http://localhost:7474.
Sample Cypher queries:
// All relationships
MATCH p=()-->() RETURN p
// Who attended which meetings (including organizer)
MATCH (p:Person)-[:ATTENDED]->(m:Meeting)
RETURN p, m
// Tasks decided in meetings
MATCH (m:Meeting)-[:DECIDED]->(t:Task)
RETURN m, t
// Task assignments
MATCH (p:Person)-[:ASSIGNED_TO]->(t:Task)
RETURN p, t

Real-World Enterprise Applications
This pattern extends far beyond meeting notes:
- Research Paper Analysis — Extract papers from organizational repositories, build knowledge graphs of concepts and citations across thousands of documents, and track updates to citations and concepts
- Customer Support Tickets — Extract issues, solutions, and relationships between tickets and customers; identify patterns across thousands of tickets while handling frequent edits and status updates
- Email Thread Summarization — Build graphs of communication patterns and decision outcomes across millions of emails; handle the reality that teams forward, edit, and reference previous discussions
- Compliance Documentation — Extract regulatory requirements from policy documents; track changes to policies and cascade impacts through a graph structure; maintain audit trails of document versions
- Competitive Intelligence — Extract data from public documents and news articles; build knowledge graphs of competitor relationships, products, and market positioning while handling constant updates
If this example was helpful, the easiest way to support CocoIndex is to give the project a ⭐ on GitHub.