The Data Stack for AI: RDBMS, Graph, and HTAP

Author(s): Vicky’s Notes

Originally published on Towards AI.

Earlier this year, I was working with an insurance client who was eager to adopt generative AI to improve customer engagement, but they kept hitting a wall. The AI models were fine; the real issue was the data infrastructure. They had transactional data stored in legacy systems, analytical workloads scattered across different warehouses, and no real-time pipeline to unify it all. What they needed wasn’t better AI but a better data foundation.

That experience wasn’t unique. I’ve seen the same pattern across projects: everyone’s excited about AI, but very few are prepared to feed it with the right data, in the right format, at the right time.

In my previous articles, I’ve explored how event-driven architectures are reshaping AI/ML pipelines, when to adopt tools like StreamSets for real-time data pipelines, and why continuous data pipeline observability with Databand is key to cutting the cost of bad data and building replicable data products. These topics all pointed to a bigger question I kept hearing: “What data stack should we actually use for AI?”

That’s when it clicked for me: the architecture itself is evolving. In the AI era, OLTP (Online Transaction Processing) and OLAP (Online Analytical Processing) are converging into Hybrid Transactional/Analytical Processing (HTAP) systems, where real-time operations and large-scale analytics are no longer siloed. Combined with the rise of graph databases, this convergence has evolved from a database trend into a core AI requirement: HTAP unites real-time operational data with large-scale analytics, while graph engines capture how data points relate to each other, giving AI the connected context it needs for reasoning and explainability.

So I decided to write this article to demystify the three essential data stacks I keep coming back to in modern AI projects. I’ll break down what each one is, when to use which, and why it matters:

  1. Relational databases (trusted OLTP foundations)
  2. Graph databases (relationship and context engines for AI)
  3. Cloud-native HTAP platforms like Snowflake and Databricks (unifying OLTP + OLAP for real-time AI)

Figure: The Data Stack for AI: RDBMS, Graph, and HTAP. Source: image by the author.

1. Relational Databases: Trusted OLTP Foundations for AI

Relational databases (RDBMS) like IBM Db2, PostgreSQL, and SQL Server remain foundational for structured, transactional workloads. These systems use rows, columns, and foreign keys to represent data and relationships. They are ideal when you need:

  • ACID-compliant transactions
  • High integrity in financial, ERP, and healthcare systems
  • Structured, normalized schemas that prevent drift in key entities
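To make the ACID point concrete, here is a minimal sketch of an atomic money transfer using Python’s built-in `sqlite3` module, with SQLite standing in for Db2, PostgreSQL, or SQL Server. The `accounts` table, its `CHECK` constraint, and the `make_db` and `transfer` helpers are hypothetical, chosen only for illustration. If any statement fails, the whole transaction rolls back, so a half-finished transfer is never visible.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Hypothetical accounts table; the CHECK constraint forbids overdrafts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        " id TEXT PRIMARY KEY,"
        " balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
    conn.commit()
    return conn


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> bool:
    """Move `amount` from `src` to `dst` atomically; return True if committed."""
    try:
        # The connection context manager commits if the block succeeds
        # and rolls back every statement in it if an exception escapes.
        with conn:
            debit = conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src)
            )
            credit = conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            # Either account missing: raise so both updates are undone.
            if debit.rowcount != 1 or credit.rowcount != 1:
                raise ValueError("unknown account")
        return True
    except (sqlite3.IntegrityError, ValueError):
        return False


conn = make_db()
print(transfer(conn, "alice", "bob", 30))    # True
print(transfer(conn, "alice", "bob", 500))   # False: CHECK rejects the overdraft
print(transfer(conn, "alice", "carol", 20))  # False: the debit is rolled back
print(dict(conn.execute("SELECT id, balance FROM accounts")))  # {'alice': 70, 'bob': 80}
```

Atomicity and consistency come from the engine here: the code never has to undo the debit by hand.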

Example:

In a school system, if we wanted to answer the question: “What are the names of the students enrolled in Mr. Chen’s class?”, a relational database would go through several steps:

Step 1: Identify the teacher.

Step 2: Find the class (or classes) taught by that teacher.

Step 3: Join with the enrollments table to find all students in that class.

Step 4: Join again with the students table to get names.
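These four steps map onto a single SQL statement with three JOINs. The sketch below assumes a hypothetical four-table schema (`teachers`, `classes`, `enrollments`, `students`) with made-up sample rows and runs it in an in-memory SQLite database. A real school system would use its own schema and engine.

```python
import sqlite3

# Hypothetical schema and sample rows, for illustration only.
SCHEMA = """
CREATE TABLE teachers    (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE classes     (id INTEGER PRIMARY KEY,
                          teacher_id INTEGER REFERENCES teachers(id), title TEXT);
CREATE TABLE students    (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE enrollments (student_id INTEGER REFERENCES students(id),
                          class_id   INTEGER REFERENCES classes(id));
INSERT INTO teachers    VALUES (1, 'Mr. Chen'), (2, 'Ms. Rivera');
INSERT INTO classes     VALUES (10, 1, 'Algebra'), (11, 2, 'Biology');
INSERT INTO students    VALUES (100, 'Ava'), (101, 'Ben'), (102, 'Chloe');
INSERT INTO enrollments VALUES (100, 10), (101, 10), (101, 11), (102, 11);
"""

QUERY = """
SELECT s.name
FROM teachers t
JOIN classes c     ON c.teacher_id = t.id    -- Step 2: classes taught by that teacher
JOIN enrollments e ON e.class_id = c.id      -- Step 3: enrollments in those classes
JOIN students s    ON s.id = e.student_id    -- Step 4: student names
WHERE t.name = ?                             -- Step 1: identify the teacher
ORDER BY s.name
"""


def students_of(conn: sqlite3.Connection, teacher_name: str) -> list[str]:
    """Names of students enrolled in any class taught by `teacher_name`."""
    return [name for (name,) in conn.execute(QUERY, (teacher_name,))]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
print(students_of(conn, "Mr. Chen"))  # ['Ava', 'Ben']
```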


As we can see, answering a relatively simple question requires multiple joins across multiple tables. As the number of relationships and the query complexity grow, so do computational cost and memory usage. While relational databases are optimized to handle many of these operations efficiently, they can become bottlenecks in systems that require deeply connected, multi-hop queries.
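For questions with an open-ended number of hops, SQL typically falls back on a recursive common table expression (CTE), where every hop is one more join against the same edge table. The sketch below assumes a hypothetical `edges(src, dst)` table in SQLite and returns each node reachable from a start node within `max_hops`, together with its shortest hop count.

```python
import sqlite3

# Hypothetical directed edge table with a few sample rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE edges (src TEXT, dst TEXT);
INSERT INTO edges VALUES ('a', 'b'), ('b', 'c'), ('c', 'd'), ('d', 'e'), ('a', 'x');
""")

# Each iteration of the recursive step joins the current frontier with
# `edges` again, so an N-hop question needs up to N rounds of joins.
MULTI_HOP = """
WITH RECURSIVE reach(node, depth) AS (
    SELECT ?, 0
    UNION
    SELECT e.dst, r.depth + 1
    FROM reach r JOIN edges e ON e.src = r.node
    WHERE r.depth < ?
)
SELECT node, MIN(depth) FROM reach WHERE depth > 0 GROUP BY node ORDER BY node
"""


def reachable(conn: sqlite3.Connection, start: str, max_hops: int) -> dict[str, int]:
    """Map each node reachable from `start` to its minimum hop count."""
    return dict(conn.execute(MULTI_HOP, (start, max_hops)))


print(reachable(conn, "a", 2))  # {'b': 1, 'c': 2, 'x': 1}
```

The depth bound keeps the recursion finite even if the edge table contains cycles.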

2. Graph Databases: Relationship and Context Engines for AI

Graph databases (e.g., Neo4j, Amazon Neptune, TigerGraph) store data as nodes and edges, enabling direct representation of real-world networks and powering relationship-aware insights such as knowledge graphs that feed AI models with rich context.

Figure: AWS Graph Database

Why graphs matter for AI:

1) Better AI results (grounded and explainable).

  • Knowledge grounding for LLMs/RAG: Graphs provide explicit entities, relationships, and constraints that LLMs can reference to reduce hallucinations and support multi-hop reasoning.
  • Hybrid retrieval (vector + graph): Vectors find semantically similar content; graph traversals apply structure and rules (who-connected-to-what, within which context), improving precision and explainability.
  • Features for ML/GNNs: Centrality, communities, and paths become high-signal features for fraud, recommendations, and root-cause analysis.
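A minimal sketch of hybrid retrieval, assuming toy three-dimensional vectors in place of a real embedding model and a plain Python dict in place of a graph database. All names (`DOCS`, `GRAPH`, `hybrid_search`, the customer and account IDs) are hypothetical. The vector step recalls semantically similar documents; the graph step then keeps only those tied to entities actually connected to the requesting customer.

```python
import math

# Hypothetical toy data: 3-d vectors stand in for real embeddings,
# and each document is linked to one entity in the knowledge graph.
DOCS = {
    "d1": {"vec": [0.9, 0.1, 0.0], "entity": "acct:42"},  # dispute note, account 42
    "d2": {"vec": [0.1, 0.9, 0.0], "entity": "acct:42"},  # address change, account 42
    "d3": {"vec": [0.8, 0.2, 0.1], "entity": "acct:77"},  # dispute note, account 77
}
# Hypothetical knowledge graph as directed adjacency lists (customer -> accounts).
GRAPH = {"cust:ana": ["acct:42"], "cust:raj": ["acct:77"]}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def neighborhood(start, hops):
    """Entities reachable from `start` within `hops` edges (breadth-first)."""
    seen, frontier = {start}, [start]
    for _ in range(hops):
        nxt = []
        for node in frontier:
            for neighbor in GRAPH.get(node, []):
                if neighbor not in seen:
                    seen.add(neighbor)
                    nxt.append(neighbor)
        frontier = nxt
    return seen


def hybrid_search(query_vec, user, k=2, hops=1):
    # 1) Vector recall: top-k documents by cosine similarity to the query.
    ranked = sorted(DOCS, key=lambda d: cosine(query_vec, DOCS[d]["vec"]), reverse=True)[:k]
    # 2) Graph constraint: keep only documents about entities connected to this user.
    allowed = neighborhood(user, hops)
    return [d for d in ranked if DOCS[d]["entity"] in allowed]


# A "disputed charge" query from cust:ana: d3 is semantically close
# but belongs to another customer's account, so the graph step drops it.
print(hybrid_search([1.0, 0.0, 0.0], "cust:ana"))  # ['d1']
```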

2) Real-time relationship analytics alongside OLTP & OLAP.

As OLTP and OLAP converge (HTAP), graphs add a third axis: relationships in motion.

  • OLTP captures transactions.
  • OLAP aggregates trends.
  • Graph exposes how entities are connected right now (e.g., fraud rings spanning accounts/devices; supply-chain ripple effects), supporting low-latency operational decisions and agent workflows.
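The fraud-ring example can be sketched with union-find (disjoint sets) over hypothetical account–device login events: accounts that share a device, directly or through a chain of shared devices, fall into the same group. This is a toy heuristic under those assumptions, not a fraud model; real detection would weigh many more signals.

```python
from collections import defaultdict

# Hypothetical login events: (account, device) pairs.
LOGINS = [
    ("acct:1", "dev:A"), ("acct:2", "dev:A"),
    ("acct:2", "dev:B"), ("acct:3", "dev:B"),
    ("acct:4", "dev:C"),
]


def find(parent, x):
    """Return the set representative of x, halving the path as we go."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def account_rings(logins):
    """Group accounts linked, directly or transitively, through shared devices."""
    parent = {}
    for acct, dev in logins:
        parent.setdefault(acct, acct)
        parent.setdefault(dev, dev)
        ra, rd = find(parent, acct), find(parent, dev)
        if ra != rd:
            parent[ra] = rd  # union the account's set with the device's set
    groups = defaultdict(set)
    for node in parent:
        if node.startswith("acct:"):
            groups[find(parent, node)].add(node)
    # Largest groups first: these are the candidate rings to review.
    return sorted((sorted(g) for g in groups.values()), key=len, reverse=True)


print(account_rings(LOGINS))  # [['acct:1', 'acct:2', 'acct:3'], ['acct:4']]
```

Because each new login only merges two sets, the groups can be maintained incrementally as events stream in, which fits the "right now" framing above.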

Example:

Imagine a movie recommendation system. If you want to answer the question: “What movies have been liked by users who also liked the same movies as Emma?”, a graph database handles this through a traversal of user–movie–user–movie connections.

Step 1: Start at the node for Emma.

Step 2: Traverse from Emma to the movies she likes.

Step 3: From those movies, find other users who also liked them.

Step 4: From those users, find additional movies they liked, excluding ones Emma has already seen.

Unlike in a relational database, no JOINs are required: each hop follows edges stored with the nodes themselves, so performance typically remains high even when traversing several layers of connections.
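The four traversal steps can be sketched in plain Python, assuming a hypothetical in-memory `LIKES` map (user to set of liked movies). A graph database stores these edges natively and can walk them in either direction; here a reverse index, `LIKED_BY`, plays the role of each movie’s incoming LIKES edges. Candidates are ranked by how many of Emma’s peers liked them, which is one simple scoring choice among many.

```python
from collections import Counter

# Hypothetical LIKES edges: user -> set of liked movies.
LIKES = {
    "Emma": {"Arrival", "Her"},
    "Noah": {"Arrival", "Dune", "Her"},
    "Mia": {"Her", "Interstellar", "Dune"},
    "Leo": {"Cars"},
}

# Reverse index (movie -> users who liked it) so edges can be walked both ways.
LIKED_BY = {}
for u, movies in LIKES.items():
    for m in movies:
        LIKED_BY.setdefault(m, set()).add(u)


def recommend(user):
    seen = LIKES[user]                                       # Steps 1-2: user -> liked movies
    peers = {p for m in seen for p in LIKED_BY[m]} - {user}  # Step 3: movies -> co-likers
    # Step 4: peers -> their other movies, skipping ones the user already liked.
    scores = Counter(m for p in peers for m in LIKES[p] if m not in seen)
    # Rank by number of peers who liked each candidate, then alphabetically.
    return sorted(scores, key=lambda m: (-scores[m], m))


print(recommend("Emma"))  # ['Dune', 'Interstellar']
```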

Agentic AI in practice

Leading tech vendors are already applying graphs in agentic AI systems. For example, Microsoft’s GraphRAG combines LLMs with knowledge graphs to enhance reasoning and traceability. In security incident triage, GraphRAG agents query a knowledge graph of entities (users, devices, alerts, incidents) to generate context-aware summaries, identify root causes, and suggest remediation steps, all with explainable paths. Similarly, AWS integrates Amazon Neptune with Amazon Bedrock for hybrid vector+graph retrieval, enabling agents to answer complex business questions with both semantic understanding and strict relationship logic.
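The “explainable path” idea can be illustrated with a breadth-first search over a hypothetical security knowledge graph stored as `(source, relation, target)` triples. This is not Microsoft’s GraphRAG or the Neptune/Bedrock API; it only sketches how a path found by traversal could be rendered as plain-text context for an LLM prompt, so that every hop in the answer can be cited.

```python
from collections import deque

# Hypothetical security knowledge graph: (source, relation, target) triples.
TRIPLES = [
    ("incident:7", "includes", "alert:phish-12"),
    ("alert:phish-12", "raised_on", "device:laptop-3"),
    ("device:laptop-3", "used_by", "user:jdoe"),
    ("user:jdoe", "member_of", "group:finance"),
]


def explain_path(start, goal):
    """Breadth-first search over directed edges; returns the shortest path's
    edges as (source, relation, target) tuples, or None if unreachable."""
    adj = {}
    for s, rel, t in TRIPLES:
        adj.setdefault(s, []).append((rel, t))
    prev = {start: None}  # node -> (parent, relation) used to reach it
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while prev[node] is not None:
                parent, rel = prev[node]
                path.append((parent, rel, node))
                node = parent
            return path[::-1]
        for rel, nxt in adj.get(node, []):
            if nxt not in prev:
                prev[nxt] = (node, rel)
                queue.append(nxt)
    return None


def as_context(path):
    """Render a path as plain-text lines that could be placed in an LLM prompt."""
    return "\n".join(f"{s} -[{rel}]-> {t}" for s, rel, t in path)


print(as_context(explain_path("incident:7", "user:jdoe")))
```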

When to use graphs

  • You need multi-hop or relationship-heavy queries (fraud, KYC/AML, bill-of-materials, lineage).
  • You need explanations/provenance (“show the path”).
  • You’re building RAG/agents that must combine semantic recall with strict business logic.

When not to

  • Mostly flat, aggregate-heavy analytics → columnar OLAP tables are simpler/faster.
  • Simple key-value or single-table lookups → a relational/NoSQL store suffices.

Takeaway: Graphs don’t replace your RDBMS or OLAP platform; they complement them, giving AI and operational analytics the relationship context those systems don’t natively provide.

3. Cloud-Native HTAP Platforms: Databricks and Snowflake Embrace OLTP

While Snowflake originated as a managed data warehouse and Databricks as a data lake platform, both are now extending into OLTP territory to better serve HTAP workloads.

Figure: HTAP (source: Dani Palma)

The convergence of HTAP and Lakehouse architectures

HTAP is not a new idea. Oracle, SQL Server, and IBM Db2 have supported hybrid workloads for nearly a decade. The term gained renewed industry traction with Google’s AlloyDB and Snowflake’s Unistore, which fueled modern data stack conversations. What’s different now is the convergence of HTAP and Lakehouse architectures, driven by AI. Both share the same goal: to eliminate the need for constant ETL between OLTP and OLAP systems, enabling real-time analytics on fresh, operational data.
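The freshness gap that HTAP aims to close can be shown with a toy example: two in-memory SQLite databases stand in for an operational store and a separately loaded warehouse, and a hypothetical `nightly_etl` function copies a snapshot between them. This illustrates the concept only; it does not model how any HTAP engine works internally.

```python
import sqlite3

# Operational (OLTP) store with two existing orders.
oltp = sqlite3.connect(":memory:")
oltp.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
oltp.executemany("INSERT INTO orders (amount) VALUES (?)", [(20.0,), (35.0,)])
oltp.commit()


def nightly_etl(src: sqlite3.Connection) -> sqlite3.Connection:
    """Hypothetical batch job: copy a snapshot of `orders` into a separate warehouse."""
    wh = sqlite3.connect(":memory:")
    wh.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
    wh.executemany("INSERT INTO orders VALUES (?, ?)", src.execute("SELECT id, amount FROM orders"))
    wh.commit()
    return wh


warehouse = nightly_etl(oltp)

# A new transaction arrives after the batch copy has run.
oltp.execute("INSERT INTO orders (amount) VALUES (?)", (500.0,))
oltp.commit()

REVENUE = "SELECT COUNT(*), SUM(amount) FROM orders"
print("warehouse copy:", warehouse.execute(REVENUE).fetchone())  # (2, 55.0): stale until the next run
print("single store:  ", oltp.execute(REVENUE).fetchone())       # (3, 555.0): includes the new order
```

An AI agent reading the warehouse copy would miss the large new order until the next batch run; querying fresh operational data directly is what HTAP platforms promise.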

This renewed HTAP push is less about reducing architectural complexity and more about feeding LLMs and AI agents the freshest possible data, without waiting on pipeline delays or data movement between systems.

Recent Moves by Databricks and Snowflake

  • In May 2025, Databricks announced its acquisition of Neon (a Postgres-based OLTP system) to natively run transactional workloads in its Lakehouse, unifying the data model for HTAP scenarios.
  • In June 2025, Snowflake announced enterprise-grade Postgres integration, built on technology from Crunchy Data, designed to run mission-critical transactional and AI workloads inside the Snowflake Data Cloud (Snowflake Blog). This enables Postgres OLTP tables to coexist and be queried alongside Snowflake’s analytical columnar tables under a unified governance and security model, effectively turning Snowflake into a multi-modal HTAP platform.

These moves signal a major shift: platforms are no longer just “warehouses” or “lakes”. They’re becoming multi-modal engines. OLTP and OLAP will co-exist in one system, drastically reducing data movement and latency. As a result, AI pipelines can ingest, process, and act on live data, minimizing the time between an event occurring and an AI system responding.

Looking Ahead: Beyond AI-Ready, Toward Data-Intelligent Architectures

As AI matures, so must the architectures that support it. The shift toward HTAP and graph-augmented data systems is not just about faster pipelines or lower latency — it reflects a broader move toward intelligent data architectures where AI and analytics no longer sit in silos.

The most forward-thinking data strategies will center on:

  • Unified intelligence: where OLTP, OLAP, and graph-native insights power not just reports, but real-time AI agents and decision systems.
  • Composable data fabrics: where purpose-built engines like vector stores, event streams, and graph DBs operate as part of an interoperable mesh.
  • Governed velocity: where pipelines are not only fast but also observable, explainable, and secure, building trust in AI decisions.
  • Data as a reusable asset: enabling teams to create replicable, AI-ready data products that serve many use cases across the enterprise.

HTAP and graph databases aren’t the end state. They’re enabling technologies that open the door to a more context-aware, explainable, and agentic AI future. The next wave of innovation will belong to teams who can bring these architectural building blocks together in service of resilient, intelligent data ecosystems.

References

  1. What’s the Difference Between a Graph Database and a Relational Database? https://aws.amazon.com/compare/the-difference-between-graph-and-relational-database/
  2. HTAP: Still the Dream, a Decade Later https://medium.com/@danthelion/htap-still-the-dream-a-decade-later-9d168f07c759
  3. What is a data lake? https://aws.amazon.com/big-data/datalakes-and-analytics/what-is-a-data-lake/
