The Data Stack for AI: RDBMS, Graph, and HTAP

Author(s): Vicky’s Notes

Originally published on Towards AI.

Earlier this year, I was working with an insurance client who was eager to adopt generative AI to improve customer engagement, but they kept hitting a wall. The AI models were fine; the real issue was the data infrastructure. They had transactional data stored in legacy systems, analytical workloads scattered across different warehouses, and no real-time pipeline to unify it all. What they needed wasn’t better AI but a better data foundation.

That experience wasn’t unique. I’ve seen the same pattern across projects: everyone’s excited about AI, but very few are prepared to feed it with the right data, in the right format, at the right time.

In my previous articles, I’ve explored how event-driven architectures are reshaping AI/ML pipelines, when to adopt tools like StreamSets for real-time data pipelines, and why continuous data pipeline observability, with a tool like Databand, is key to cutting the cost of bad data and building replicable data products. These topics all pointed to a bigger question I kept hearing: “What data stack should we actually use for AI?”

In the AI era, another foundational shift is underway: the convergence of OLTP (Online Transaction Processing) and OLAP (Online Analytical Processing) into Hybrid Transactional/Analytical Processing (HTAP) systems. This convergence, combined with the rise of graph databases, has evolved from a database trend into a core AI requirement. HTAP unites real-time operational data with large-scale analytics, while graph engines give AI the connected context it needs for reasoning and explainability.

That’s when it clicked for me: the architecture itself is evolving. Real-time operations and large-scale analytics are no longer siloed, and graph databases are becoming critical for AI explainability and reasoning because they capture how data points relate to each other.

So I decided to write this article to demystify the three essential data stacks I keep coming back to in modern AI projects. I’ll break down what each one is, when to use which, and why it matters:

Relational databases (trusted OLTP foundations)
Graph databases (relationship and context engines for AI)
Cloud-native HTAP platforms like Snowflake and Databricks (unifying OLTP + OLAP for real-time AI)

The Data Stack for AI: RDBMS, Graph, and HTAP — Source: Image by the author.

1. Relational Databases: Trusted OLTP Foundations for AI

Relational databases (RDBMS) like IBM Db2, PostgreSQL, and SQL Server remain foundational for structured, transactional workloads. These systems use rows, columns, and foreign keys to represent data and relationships. They are well suited to workloads that need:

ACID compliance
High integrity in financial, ERP, and healthcare systems
Structured, normalized schemas that prevent drift in key entities

Example:

In a school system, if we wanted to answer the question: “What are the names of the students enrolled in Mr. Chen’s class?”, a relational database would go through several steps:

Step 1: Identify the teacher.

Step 2: Find the classes taught by that teacher.

Step 3: Join with the enrollments table to find all students in that class.

Step 4: Join again with the students table to get names.

Final output: the names of the students enrolled in Mr. Chen’s class.

As we can see, answering a relatively simple query requires multiple joins across multiple tables. As the number of relationships and query complexity increases, the computational cost and memory usage also increase. While relational databases are optimized to handle many of these operations efficiently, they can become bottlenecks in systems that require deeply connected, multi-hop queries.
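The four steps above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not the schema of any real school system: the table names, column names, and sample rows are all assumptions made up for the example.

```python
import sqlite3

# Hypothetical schema and data, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE teachers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE classes (id INTEGER PRIMARY KEY, title TEXT,
                      teacher_id INTEGER REFERENCES teachers(id));
CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE enrollments (student_id INTEGER REFERENCES students(id),
                          class_id INTEGER REFERENCES classes(id));

INSERT INTO teachers VALUES (1, 'Mr. Chen'), (2, 'Ms. Lopez');
INSERT INTO classes VALUES (10, 'Algebra', 1), (20, 'History', 2);
INSERT INTO students VALUES (100, 'Ava'), (101, 'Ben'), (102, 'Cara');
INSERT INTO enrollments VALUES (100, 10), (101, 10), (102, 20);
""")

def students_of(teacher_name):
    """Return the names of students enrolled in any class of the given teacher."""
    rows = conn.execute("""
        SELECT DISTINCT s.name
        FROM teachers t                               -- Step 1: identify the teacher
        JOIN classes c ON c.teacher_id = t.id         -- Step 2: classes they teach
        JOIN enrollments e ON e.class_id = c.id       -- Step 3: enrollments in those classes
        JOIN students s ON s.id = e.student_id        -- Step 4: student names
        WHERE t.name = ?
        ORDER BY s.name
    """, (teacher_name,)).fetchall()
    return [name for (name,) in rows]

print(students_of("Mr. Chen"))  # ['Ava', 'Ben']
```

Even in this small case, one question touches four tables and three joins; each extra hop in the relationship chain adds another join.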

2. Graph Databases: Relationship and Context Engines for AI

Graph databases (e.g., Neo4j, AWS Neptune, TigerGraph) store data as nodes and edges, enabling direct representation of real-world networks and powering relationship-aware insights such as knowledge graphs that feed AI models with rich context.

Why graphs matter for AI:

1) Better AI results (grounded and explainable).

Knowledge grounding for LLMs/RAG: Graphs provide explicit entities, relationships, and constraints that LLMs can reference to reduce hallucinations and support multi-hop reasoning.
Hybrid retrieval (vector + graph): Vectors find semantically similar content; graph traversals apply structure and rules (who-connected-to-what, within which context), improving precision and explainability.
Features for ML/GNNs: Centrality, communities, and paths become high-signal features for fraud, recommendations, and root-cause analysis.

2) Real-time relationship analytics alongside OLTP & OLAP.
As OLTP and OLAP converge (HTAP), graph adds a third axis: relationships in motion.

OLTP captures transactions.
OLAP aggregates trends.
Graph exposes how entities are connected right now (e.g., fraud rings spanning accounts/devices; supply-chain ripple effects), supporting low-latency operational decisions and agent workflows.
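To make the hybrid retrieval idea (vector + graph) more concrete, here is a rough sketch in plain Python. It assumes a toy setup: the document names, the two-dimensional "embeddings", and the customer-to-document edges are all invented, and a real system would use an embedding model and a graph database rather than dictionaries.

```python
import math

# Toy data: every name, vector, and edge here is made up for illustration.
DOC_VECTORS = {
    "claim_policy_A": [0.9, 0.1],
    "claim_policy_B": [0.8, 0.2],
    "marketing_note": [0.1, 0.9],
}
# Graph edges: customer -> documents that actually apply to that customer.
EDGES = {
    "customer_1": {"claim_policy_A", "marketing_note"},
    "customer_2": {"claim_policy_B"},
}

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def hybrid_retrieve(query_vec, customer, k=2):
    """Rank by vector similarity, but only among documents linked to the customer."""
    # Graph step: restrict candidates to documents connected to this customer.
    allowed = EDGES.get(customer, set())
    # Vector step: order the allowed documents by semantic similarity to the query.
    ranked = sorted(allowed,
                    key=lambda doc: cosine(query_vec, DOC_VECTORS[doc]),
                    reverse=True)
    return ranked[:k]

print(hybrid_retrieve([1.0, 0.0], "customer_1", k=1))  # ['claim_policy_A']
print(hybrid_retrieve([1.0, 0.0], "customer_2", k=1))  # ['claim_policy_B']
```

A pure vector search would return both claim policies for either customer because they are semantically close; the graph constraint is what keeps each answer tied to the right customer and makes the reason for the result easy to explain.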

Example:

Imagine a movie recommendation system. If you want to answer the question: “What movies have been liked by users who also liked the same movies as Emma?”, a graph database handles this through a traversal of user–movie–user–movie connections.

Step 1: Start at the node for Emma.

Step 2: Traverse from Emma to the movies she likes.

Step 3: From those movies, find other users who also liked them.

Step 4: From those users, find additional movies they liked, excluding ones Emma has already liked.

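Unlike in a relational database, the query does not compute JOINs across tables at query time; the engine follows stored relationships from node to node, so performance typically stays high even when traversing several layers of connections.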
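The traversal above can be sketched in a few lines of Python using adjacency sets in both directions (user to movies, movie to users). The users, movies, and ranking rule are assumptions made up for illustration; a graph database would express the same walk in its own query language rather than in application code.

```python
from collections import Counter

# Toy "likes" graph; users and movies are invented for illustration.
LIKES = {
    "Emma": {"Inception", "Arrival"},
    "Liam": {"Inception", "Interstellar", "Dune"},
    "Noah": {"Arrival", "Dune"},
    "Olivia": {"Titanic"},
}

# Reverse edges (movie -> users who liked it) so we can walk in both directions.
LIKED_BY = {}
for user_name, movies in LIKES.items():
    for movie in movies:
        LIKED_BY.setdefault(movie, set()).add(user_name)

def recommend(user):
    """Walk user -> movies -> other users -> their movies, skipping the user's own."""
    own = LIKES[user]                          # Steps 1-2: start node and its movies
    peers = set()
    for movie in own:                          # Step 3: users who liked the same movies
        peers |= LIKED_BY[movie] - {user}
    counts = Counter()
    for peer in peers:                         # Step 4: their other movies
        counts.update(LIKES[peer] - own)
    # Rank by how many peers liked each movie, then alphabetically for stable output.
    return [m for m, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))]

print(recommend("Emma"))  # ['Dune', 'Interstellar']
```

Each hop is a direct lookup of a node's neighbors, which is the property graph engines rely on for multi-hop queries.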

Agentic AI in practice

Leading vendors are already applying graphs in agentic AI systems. For example, Microsoft’s GraphRAG combines LLMs with knowledge graphs to improve reasoning and traceability. In a security incident triage scenario, a GraphRAG-style agent can query a knowledge graph of entities (users, devices, alerts, incidents) to generate context-aware summaries, identify root causes, and suggest remediation steps, all with explainable paths. Similarly, AWS integrates Neptune with Amazon Bedrock for hybrid vector+graph retrieval, enabling agents to answer complex business questions with both semantic understanding and strict relationship logic.

When to use graphs

You need multi-hop or relationship-heavy queries (fraud, KYC/AML, bill-of-materials, lineage).
You need explanations/provenance (“show the path”).
You’re building RAG/agents that must combine semantic recall with strict business logic.

When not to

Mostly flat, aggregate-heavy analytics → columnar OLAP tables are simpler/faster.
Simple key-value or single-table lookups → a relational/NoSQL store suffices.

Takeaway: Graphs don’t replace your RDBMS or OLAP platform; they complement them, giving AI and operational analytics the relationship context those systems don’t natively provide.

3. Cloud-Native HTAP Platforms: Databricks and Snowflake Embrace OLTP

While Snowflake originated as a managed data warehouse and Databricks as a data lake platform, both are now extending into OLTP territory to better serve HTAP workloads.

The convergence of HTAP and Lakehouse architectures

HTAP is not a new idea. Oracle, SQL Server, and IBM Db2 have supported hybrid workloads for nearly a decade. The term gained renewed industry traction with Google’s AlloyDB and Snowflake’s Unistore, which fueled modern data stack conversations. What’s different now is the convergence of HTAP and Lakehouse architectures, driven by AI. Both share the same goal: to eliminate the need for constant ETL between OLTP and OLAP systems, enabling real-time analytics on fresh, operational data.

This renewed HTAP push is less about reducing architectural complexity and more about feeding LLMs and AI agents the freshest possible data, without waiting on pipeline delays or data movement between systems.
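The freshness gap that ETL introduces can be shown with a small sketch. It uses two in-memory SQLite databases as stand-ins for a separate OLTP system and warehouse. SQLite is not an HTAP engine, and the table, helper names, and amounts are assumptions for illustration only.

```python
import sqlite3

def make_db():
    """Create an in-memory database with a single hypothetical orders table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
    return db

def total(db):
    """Analytical query: total order amount."""
    return db.execute("SELECT COALESCE(SUM(amount), 0) FROM orders").fetchone()[0]

def run_etl(source, target):
    """Batch job: copy a full snapshot of orders from source to target."""
    rows = source.execute("SELECT id, amount FROM orders").fetchall()
    target.execute("DELETE FROM orders")
    target.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    target.commit()

oltp, warehouse = make_db(), make_db()

oltp.execute("INSERT INTO orders (amount) VALUES (100.0)")
oltp.commit()
run_etl(oltp, warehouse)          # warehouse is in sync after the batch run

oltp.execute("INSERT INTO orders (amount) VALUES (50.0)")
oltp.commit()                     # a new transaction arrives after the batch

stale = total(warehouse)          # separate warehouse: still the old snapshot
fresh = total(oltp)               # querying the operational store: sees the new order
run_etl(oltp, warehouse)
caught_up = total(warehouse)      # only correct again after the next ETL run

print(stale, fresh, caught_up)    # 100.0 150.0 150.0
```

An AI agent reading from the warehouse between batch runs would act on the stale total; running analytics against the same store that takes the writes is the behavior HTAP platforms aim to provide at scale.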

Recent Moves by Databricks and Snowflake

In May 2025, Databricks announced its acquisition of Neon (a Postgres-based OLTP system) to run transactional workloads natively in its Lakehouse, unifying the data model for HTAP scenarios.
In June 2025, Snowflake announced enterprise-grade Postgres integration, built on technology from Crunchy Data, designed to run mission-critical transactional and AI workloads inside the Snowflake Data Cloud (Snowflake Blog). The aim is for Postgres OLTP tables to coexist and be queried alongside Snowflake’s analytical columnar tables under a unified governance and security model, effectively turning Snowflake into a multi-modal HTAP platform.

These moves signal a major shift: platforms are no longer just “warehouses” or “lakes”. They’re becoming multi-modal engines. OLTP and OLAP will co-exist in one system, drastically reducing data movement and latency. As a result, AI pipelines can ingest, process, and act on live data, minimizing the time between an event occurring and an AI system responding.

Looking Ahead: Beyond AI-Ready, Toward Data-Intelligent Architectures

As AI matures, so must the architectures that support it. The shift toward HTAP and graph-augmented data systems is not just about faster pipelines or lower latency — it reflects a broader move toward intelligent data architectures where AI and analytics no longer sit in silos.

The most forward-thinking data strategies will center on:

Unified intelligence: where OLTP, OLAP, and graph-native insights power not just reports, but real-time AI agents and decision systems.
Composable data fabrics: where purpose-built engines like vector stores, event streams, and graph DBs operate as part of an interoperable mesh.
Governed velocity: where pipelines are not only fast, but observable, explainable, and secure for building trust in AI decisions.
Data as a reusable asset: enabling teams to create replicable, AI-ready data products that serve many use cases across the enterprise.

HTAP and graph databases aren’t the end state. They’re enabling technologies that open the door to a more context-aware, explainable, and agentic AI future. The next wave of innovation will belong to teams who can bring these architectural building blocks together in service of resilient, intelligent data ecosystems.

References

What’s the Difference Between a Graph Database and a Relational Database? https://aws.amazon.com/compare/the-difference-between-graph-and-relational-database/
HTAP: Still the Dream, a Decade Later https://medium.com/@danthelion/htap-still-the-dream-a-decade-later-9d168f07c759
What is a data lake? https://aws.amazon.com/big-data/datalakes-and-analytics/what-is-a-data-lake/
