
Building a Self-Updating Knowledge Graph From Meeting Notes With LLM Extraction and Neo4j

Last Updated on January 20, 2026 by Editorial Team

Author(s): Cocoindex

Originally published on Towards AI.

Transform unstructured meeting notes into a queryable knowledge graph with incremental updates — no full reprocessing required.

Meeting notes are goldmines of organizational intelligence. They capture decisions, action items, participant information, and the relationships between people and tasks. Yet most organizations treat them as static documents — searchable only through basic text search.

Imagine instead being able to query your meetings like a database:

• “Who attended meetings where the topic was ‘budget planning’?”
• “What tasks did Sarah get assigned across all meetings?”
• “Show me all decisions made in Q4 involving the engineering team.”

This is where knowledge graphs shine. By extracting structured information from unstructured meeting notes and building a graph representation, you unlock powerful relationship-based queries that would be impossible with traditional document storage.

In this article, we’ll build a practical CocoIndex pipeline that:

1. Reads Markdown meeting notes from Google Drive
2. Extracts structured entities (meetings, participants, tasks) using LLMs
3. Persists everything to Neo4j as a knowledge graph
4. Automatically updates only when source documents change

The full source code is available on GitHub.

Architecture Overview

The pipeline follows a clear data flow with incremental processing built in at every stage:

Google Drive (Documents - with change tracking)
→ Identify changed documents
→ Split into meetings
→ Extract structured data with LLM (only for changed documents)
→ Collect nodes and relationships
→ Export to Neo4j (with upsert logic)

Prerequisites

  • Install Neo4j and start it locally
    Default local browser: http://localhost:7474
    Default credentials used in this example: username neo4j, password cocoindex
  • Configure your OpenAI API key
  • Prepare Google Drive:
    Create a Google Cloud service account and download its JSON credential
    Share the source folders with the service account email
    Collect the root folder IDs you want to ingest
  • See Setup for Google Drive for details

Environment

Set the following environment variables:

export OPENAI_API_KEY=sk-...
export GOOGLE_SERVICE_ACCOUNT_CREDENTIAL=/absolute/path/to/service_account.json
export GOOGLE_DRIVE_ROOT_FOLDER_IDS=folderId1,folderId2

Notes:

  • GOOGLE_DRIVE_ROOT_FOLDER_IDS accepts a comma-separated list of folder IDs
  • The flow polls recent changes and refreshes periodically
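For clarity, here is how that configuration is consumed in plain Python (the dict below stands in for real environment values; the variable names mirror the flow definition later in the article):

```python
import os

# GOOGLE_DRIVE_ROOT_FOLDER_IDS is a comma-separated list, so it must be
# split into individual folder IDs before being passed to the source.
def load_drive_config(env=os.environ):
    credential_path = env["GOOGLE_SERVICE_ACCOUNT_CREDENTIAL"]
    root_folder_ids = [
        f.strip() for f in env["GOOGLE_DRIVE_ROOT_FOLDER_IDS"].split(",") if f.strip()
    ]
    return credential_path, root_folder_ids

# Illustrative values standing in for the real environment:
path, ids = load_drive_config({
    "GOOGLE_SERVICE_ACCOUNT_CREDENTIAL": "/absolute/path/to/service_account.json",
    "GOOGLE_DRIVE_ROOT_FOLDER_IDS": "folderId1, folderId2",
})
print(ids)  # ['folderId1', 'folderId2']
```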

Let’s break down each component:

Flow Definition

Overview

Add source and collector

@cocoindex.flow_def(name="MeetingNotesGraph")
def meeting_notes_graph_flow(
    flow_builder: cocoindex.FlowBuilder, data_scope: cocoindex.DataScope
) -> None:
    """
    Define an example flow that extracts triples from files and builds a knowledge graph.
    """
    credential_path = os.environ["GOOGLE_SERVICE_ACCOUNT_CREDENTIAL"]
    root_folder_ids = os.environ["GOOGLE_DRIVE_ROOT_FOLDER_IDS"].split(",")
    data_scope["documents"] = flow_builder.add_source(
        cocoindex.sources.GoogleDrive(
            service_account_credential_path=credential_path,
            root_folder_ids=root_folder_ids,
            recent_changes_poll_interval=datetime.timedelta(seconds=10),
        ),
        refresh_interval=datetime.timedelta(minutes=1),
    )

The pipeline starts by connecting to Google Drive using a service account. CocoIndex’s built-in source connector handles authentication and provides incremental change detection. The recent_changes_poll_interval parameter means the source checks for new or modified files every 10 seconds, while the refresh_interval determines when the entire flow re-runs (every minute).

This is one of CocoIndex’s superpowers: incremental processing with automatic change tracking. Instead of reprocessing all documents on every run, the framework:

  1. Lists files from Google Drive with last modified time
  2. Identifies only the files that have been added or modified since the last successful run
  3. Skips unchanged files entirely
  4. Passes only changed documents downstream

The result? In an enterprise with 1% daily churn, only 1% of documents trigger downstream processing. Unchanged files never hit your LLM API, never generate Neo4j queries, and never consume compute resources.
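The skip logic described above can be sketched in a few lines. This is an illustration of the idea only, not CocoIndex's actual implementation, which tracks Google Drive modification times rather than content hashes:

```python
import hashlib

# Remember a fingerprint per file; only files whose fingerprint changed
# since the last run are reported for downstream processing.
def changed_files(files, seen):
    """files: {name: content}; seen: {name: fingerprint} from the last run."""
    changed = []
    for name, content in files.items():
        fp = hashlib.sha256(content.encode()).hexdigest()
        if seen.get(name) != fp:
            changed.append(name)
            seen[name] = fp
    return changed

seen = {}
docs = {"standup.md": "## Meeting 1", "retro.md": "## Meeting 2"}
first = changed_files(docs, seen)   # first run: everything is new
docs["retro.md"] = "## Meeting 2 (edited)"
second = changed_files(docs, seen)  # second run: only the edited file
print(second)  # ['retro.md']
```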

Add collector

meeting_nodes = data_scope.add_collector()
attended_rels = data_scope.add_collector()
decided_tasks_rels = data_scope.add_collector()
assigned_rels = data_scope.add_collector()

The pipeline then collects data into specialized collectors for different entity types and relationships:

  • Meeting Nodes — Store the meeting itself with its date and notes
  • Attendance Relationships — Capture who attended meetings and whether they were the organizer
  • Task Decision Relationships — Link meetings to decisions (tasks that were decided upon)
  • Task Assignment Relationships — Assign specific tasks to people

Process each document

Extract meetings

with data_scope["documents"].row() as document:
    document["meetings"] = document["content"].transform(
        cocoindex.functions.SplitBySeparators(
            separators_regex=[r"\n\n##?\ "], keep_separator="RIGHT"
        )
    )

Meeting documents often contain multiple meetings in a single file. This step splits documents on Markdown headers (## or #) preceded by blank lines, treating each section as a separate meeting. The keep_separator="RIGHT" means the separator (header) is kept with the right segment, preserving context.
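A rough plain-Python equivalent of this split uses a lookahead so each header stays attached to the segment on its right (unlike SplitBySeparators, the blank-line separator itself is discarded here):

```python
import re

# Split before each "# " or "## " header that follows a blank line,
# keeping the header with the text that comes after it.
def split_meetings(content):
    parts = re.split(r"\n\n(?=##? )", content)
    return [p.strip() for p in parts if p.strip()]

doc = "# Standup 2024-01-10\nNotes A\n\n## Planning 2024-01-11\nNotes B"
meetings = split_meetings(doc)
print(len(meetings))  # 2
```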

Extract meeting

Define Meeting schema

@dataclass
class Person:
    name: str

@dataclass
class Task:
    description: str
    assigned_to: list[Person]

@dataclass
class Meeting:
    time: datetime.date
    note: str
    organizer: Person
    participants: list[Person]
    tasks: list[Task]

Declaring the schema as dataclasses gives the LLM direct guidance on exactly what information to extract and in what shape. This is far more reliable than asking the LLM for free-form output, which cannot be turned into the structured records a knowledge graph requires.
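To see what the extraction target looks like as data, here is the same schema populated with illustrative values, the kind of object a successful structured extraction yields (the schema is repeated so the snippet is self-contained):

```python
import datetime
from dataclasses import dataclass

@dataclass
class Person:
    name: str

@dataclass
class Task:
    description: str
    assigned_to: list[Person]

@dataclass
class Meeting:
    time: datetime.date
    note: str
    organizer: Person
    participants: list[Person]
    tasks: list[Task]

# One extracted meeting with made-up example values:
meeting = Meeting(
    time=datetime.date(2024, 1, 10),
    note="Weekly sync",
    organizer=Person("Alice"),
    participants=[Person("Bob"), Person("Carol")],
    tasks=[Task("Draft budget", assigned_to=[Person("Bob")])],
)
print(meeting.tasks[0].assigned_to[0].name)  # Bob
```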

Extract and collect relationship

with document["meetings"].row() as meeting:
    parsed = meeting["parsed"] = meeting["text"].transform(
        cocoindex.functions.ExtractByLlm(
            llm_spec=cocoindex.LlmSpec(
                api_type=cocoindex.LlmApiType.OPENAI, model="gpt-5"
            ),
            output_type=Meeting,
        )
    )

Importantly, this step also benefits from incremental processing. Since ExtractByLlm is an expensive step, its output is cached: as long as the inputs (the source text, the model, and the output type definition) are unchanged, the cached result is reused and the LLM is never re-invoked.

Collect relationship

meeting_key = {"note_file": document["filename"], "time": parsed["time"]}
meeting_nodes.collect(**meeting_key, note=parsed["note"])
attended_rels.collect(
    id=cocoindex.GeneratedField.UUID,
    **meeting_key,
    person=parsed["organizer"]["name"],
    is_organizer=True,
)
with parsed["participants"].row() as participant:
    attended_rels.collect(
        id=cocoindex.GeneratedField.UUID,
        **meeting_key,
        person=participant["name"],
    )
with parsed["tasks"].row() as task:
    decided_tasks_rels.collect(
        id=cocoindex.GeneratedField.UUID,
        **meeting_key,
        description=task["description"],
    )
    with task["assigned_to"].row() as assigned_to:
        assigned_rels.collect(
            id=cocoindex.GeneratedField.UUID,
            **meeting_key,
            task=task["description"],
            person=assigned_to["name"],
        )

Collectors in CocoIndex act like in‑memory buffers: you declare collectors for different categories (meeting nodes, attendance, tasks, assignments), then as you process each document you “collect” relevant entries.

This block collects nodes and relationships from parsed meeting notes to build a knowledge graph in Neo4j using CocoIndex:

  • Person → Meeting (ATTENDED) Links participants (including organizers) to the meetings they attended.
  • Meeting → Task (DECIDED) Links meetings to tasks or decisions that were made.
  • Person → Task (ASSIGNED_TO) Links tasks back to the people responsible for them.
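The flattening the collectors perform can be mirrored in plain Python; here a parsed meeting is represented as a dict with illustrative values:

```python
# Flatten one parsed meeting into node keys and relationship rows,
# mirroring the collect() calls above (illustrative data structures).
def flatten(filename, parsed):
    key = (filename, parsed["time"])
    # Person -> Meeting (ATTENDED): organizer first, then participants
    attended = [(p, key) for p in [parsed["organizer"]] + parsed["participants"]]
    # Meeting -> Task (DECIDED)
    decided = [(key, t["description"]) for t in parsed["tasks"]]
    # Person -> Task (ASSIGNED_TO)
    assigned = [(p, t["description"]) for t in parsed["tasks"] for p in t["assigned_to"]]
    return attended, decided, assigned

parsed = {
    "time": "2024-01-10",
    "organizer": "Alice",
    "participants": ["Bob"],
    "tasks": [{"description": "Draft budget", "assigned_to": ["Bob"]}],
}
attended, decided, assigned = flatten("notes.md", parsed)
print(assigned)  # [('Bob', 'Draft budget')]
```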

Map to graph database

Overview

We will create a property graph with the nodes and relationships listed below. To learn more about property graphs, please refer to CocoIndex's Property Graph Targets documentation.

Map Meeting Nodes

# conn_spec is the Neo4j connection spec configured elsewhere in the flow
# (see the full source on GitHub for the connection setup).
meeting_nodes.export(
    "meeting_nodes",
    cocoindex.targets.Neo4j(
        connection=conn_spec, mapping=cocoindex.targets.Nodes(label="Meeting")
    ),
    primary_key_fields=["note_file", "time"],
)

Declare Person and Task Nodes

flow_builder.declare(
    cocoindex.targets.Neo4jDeclaration(
        connection=conn_spec,
        nodes_label="Person",
        primary_key_fields=["name"],
    )
)
flow_builder.declare(
    cocoindex.targets.Neo4jDeclaration(
        connection=conn_spec,
        nodes_label="Task",
        primary_key_fields=["description"],
    )
)

Map ATTENDED Relationship


attended_rels.export(
    "attended_rels",
    cocoindex.targets.Neo4j(
        connection=conn_spec,
        mapping=cocoindex.targets.Relationships(
            rel_type="ATTENDED",
            source=cocoindex.targets.NodeFromFields(
                label="Person",
                fields=[
                    cocoindex.targets.TargetFieldMapping(
                        source="person", target="name"
                    )
                ],
            ),
            target=cocoindex.targets.NodeFromFields(
                label="Meeting",
                fields=[
                    cocoindex.targets.TargetFieldMapping("note_file"),
                    cocoindex.targets.TargetFieldMapping("time"),
                ],
            ),
        ),
    ),
    primary_key_fields=["id"],
)

  • This call ensures that ATTENDED relationships — i.e. “Person → Meeting” (organizer or participant → the meeting) — are explicitly encoded as edges in the Neo4j graph.
  • It links Person nodes with Meeting nodes via ATTENDED relationships, enabling queries like “which meetings did Alice attend?” or “who attended meeting X?”.
  • By mapping Person and Meeting nodes correctly and consistently (using unique keys), it ensures a clean graph with no duplicate persons or meetings.
  • Because relationships get unique IDs and are exported with consistent keys, the graph remains stable across incremental updates: re-runs won’t duplicate edges or nodes.
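The idempotency point deserves a small illustration: exporting keyed rows behaves like a map upsert (conceptually similar to Neo4j's MERGE), so re-running with unchanged data cannot create duplicates. This sketch uses the relationship endpoints as the key for simplicity:

```python
# Upsert edges into a store keyed by the relationship's identity;
# writing the same edge twice overwrites rather than duplicates.
def upsert_edges(store, edges):
    for src, rel, dst in edges:
        store[(src, rel, dst)] = {"src": src, "rel": rel, "dst": dst}
    return store

edges = [("Alice", "ATTENDED", ("notes.md", "2024-01-10"))]
store = {}
upsert_edges(store, edges)
upsert_edges(store, edges)  # re-run with the same data
print(len(store))  # 1, not 2
```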

Map DECIDED Relationship


decided_tasks_rels.export(
    "decided_tasks_rels",
    cocoindex.targets.Neo4j(
        connection=conn_spec,
        mapping=cocoindex.targets.Relationships(
            rel_type="DECIDED",
            source=cocoindex.targets.NodeFromFields(
                label="Meeting",
                fields=[
                    cocoindex.targets.TargetFieldMapping("note_file"),
                    cocoindex.targets.TargetFieldMapping("time"),
                ],
            ),
            target=cocoindex.targets.NodeFromFields(
                label="Task",
                fields=[
                    cocoindex.targets.TargetFieldMapping("description"),
                ],
            ),
        ),
    ),
    primary_key_fields=["id"],
)

  • This call ensures that DECIDED relationships — i.e., “Meeting → Task” — are explicitly encoded as edges in the Neo4j graph.
  • It links Meeting nodes with Task nodes via DECIDED relationships, enabling queries like:
  • “Which tasks were decided in Meeting X?”
  • “From which meeting did Task Y originate?”
  • By mapping Meeting and Task nodes consistently (using note_file + time for meetings and description for tasks), it prevents duplicate tasks or meeting nodes in the graph.
  • Because relationships have unique IDs and are exported with consistent keys, the graph remains stable across incremental updates: re-running the pipeline won’t create duplicate edges or nodes.

Map ASSIGNED_TO Relationship


assigned_rels.export(
    "assigned_rels",
    cocoindex.targets.Neo4j(
        connection=conn_spec,
        mapping=cocoindex.targets.Relationships(
            rel_type="ASSIGNED_TO",
            source=cocoindex.targets.NodeFromFields(
                label="Person",
                fields=[
                    cocoindex.targets.TargetFieldMapping(
                        source="person", target="name"
                    ),
                ],
            ),
            target=cocoindex.targets.NodeFromFields(
                label="Task",
                fields=[
                    cocoindex.targets.TargetFieldMapping(
                        source="task", target="description"
                    ),
                ],
            ),
        ),
    ),
    primary_key_fields=["id"],
)

The Resulting Graph

After running this pipeline, your Neo4j database contains a rich, queryable graph:

Nodes:

  • Meeting – Represents individual meetings with properties like date and notes
  • Person – Represents individuals involved in meetings
  • Task – Represents actionable items decided in meetings

Relationships:

  • ATTENDED – Connects people to meetings they attended
  • DECIDED – Connects meetings to tasks that were decided
  • ASSIGNED_TO – Connects people to tasks they're responsible for

Importantly, the final export to the knowledge graph is also incremental. CocoIndex only mutates nodes and relationships that have changed; everything else is a no-op. This avoids unnecessary churn on the target database and minimizes the cost of write operations.
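Conceptually, an incremental export computes a diff between the previous and current state and applies only the difference; a minimal sketch:

```python
# Compare previous vs. current keyed rows: upsert what changed,
# delete what disappeared, and leave the rest untouched.
def plan_mutations(previous, current):
    to_delete = previous.keys() - current.keys()
    to_upsert = {k for k, v in current.items() if previous.get(k) != v}
    unchanged = current.keys() - to_upsert
    return to_upsert, to_delete, unchanged

previous = {"m1": {"note": "Weekly sync"}, "m2": {"note": "Retro"}}
current = {"m1": {"note": "Weekly sync"}, "m2": {"note": "Retro (edited)"}}
to_upsert, to_delete, unchanged = plan_mutations(previous, current)
print(sorted(to_upsert))  # ['m2']
```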

Run

Build/update the graph

Install dependencies:

pip install -e .

Update the index (run the flow once to build/update the graph):

cocoindex update main

Browse the knowledge graph

Open Neo4j Browser at http://localhost:7474.

Sample Cypher queries:

// All relationships
MATCH p=()-->() RETURN p
// Who attended which meetings (including organizer)
MATCH (p:Person)-[:ATTENDED]->(m:Meeting)
RETURN p, m
// Tasks decided in meetings
MATCH (m:Meeting)-[:DECIDED]->(t:Task)
RETURN m, t
// Task assignments
MATCH (p:Person)-[:ASSIGNED_TO]->(t:Task)
RETURN p, t

Real-World Enterprise Applications

This pattern extends far beyond meeting notes:

  • Research Paper Analysis — Extract papers from organizational repositories, build knowledge graphs of concepts and citations across thousands of documents, and track updates to citations and concepts
  • Customer Support Tickets — Extract issues, solutions, and relationships between tickets and customers; identify patterns across thousands of tickets while handling frequent edits and status updates
  • Email Thread Summarization — Build graphs of communication patterns and decision outcomes across millions of emails; handle the reality that teams forward, edit, and reference previous discussions
  • Compliance Documentation — Extract regulatory requirements from policy documents; track changes to policies and cascade impacts through a graph structure; maintain audit trails of document versions
  • Competitive Intelligence — Extract data from public documents and news articles; build knowledge graphs of competitor relationships, products, and market positioning while handling constant updates

If this example was helpful, the easiest way to support CocoIndex is to give the project a ⭐ on GitHub.
