
Take a Dive Into Delta Lake

Author(s): Disha Verma

Originally published on Towards AI.

That’s Jerry — the frustrated Data Steward!

Remember the time we spoke about Data Warehouses, Data Lakes, and Data Lakehouses? Today, we will learn about Delta Lake, which belongs to the same data architecture family. A team at Databricks came up with the idea of a fast storage layer built on top of data lakes. Organizations already using data lakes loved the concept of "Delta Lake," which could handle massive data loads efficiently, often processing them in just a few minutes.

Disha

Hieeee!!!! I am here to learn about you today! Tell me some interesting facts that I can share with my friends here.

Additionally, I am really confused: why did organizations opt for you when they already had Data Lakes?

From Canva

Delta Lake

Hey Disha! Happy to meet you and talk about, well, me! Let's begin with my inception. And please don't mind Mr. Data Steward; he's been really overwhelmed with so many data terms and frameworks.

Why did Delta Lake come into being?

Michael Armbrust (a Databricks engineer) came up with the idea of creating me so that there could be efficient transactions over large volumes of data. Too technical? Let me simplify it for you.

Imagine, just as you did with the data lake, that you have a huge dump of files, CDs, images, and documents to store somewhere.

Now, the data lake was already handling these records, but Delta Lake added ACID compliance (explained below) on top of them. Second, it made processing the records up to 10 times faster than a plain data lake. And third, it let companies like Apple process 300 billion records every day.

Isn’t that great?!

Delta Lake — The Definitive Guide by Databricks

Let’s discuss some of the benefits of using me over Data Lake:

Benefit #1: ACID Compliance

Before we dive deeper, let me explain ACID (Atomicity, Consistency, Isolation, Durability) compliance in layperson's terms. This concept plays a vital role in understanding why I was invented:

A — Atomicity

Let’s say you’re running a 10K marathon. You’re halfway through when you injure your knee and have to stop.

To earn a medal, you must cross the finish line. But in this case — you don’t.

From Canva

That’s Atomicity. You either finish the race and get the reward, or you don’t.
No partial credit. No “almost there” badge.

In databases, it’s the same idea:
A transaction is either fully complete, or not done at all.
There’s no such thing as half a transaction.
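The all-or-nothing idea is easy to see with SQLite from Python's standard library. This is a toy sketch of atomicity in general, not Delta Lake itself: the accounts table and the "crash" are made up for illustration.

```python
import sqlite3

# Toy sketch: a two-step money transfer either fully commits
# or fully rolls back. (Hypothetical accounts, not Delta Lake.)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("alice", 100), ("bob", 50)])
conn.commit()

try:
    with conn:  # one atomic transaction: commits on success, rolls back on error
        conn.execute("UPDATE accounts SET balance = balance - 80 "
                     "WHERE name = 'alice'")
        raise RuntimeError("system crash mid-transaction")
        conn.execute("UPDATE accounts SET balance = balance + 80 "
                     "WHERE name = 'bob'")  # never reached
except RuntimeError:
    pass  # the 'with conn' block rolled everything back

balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(balances)  # alice still has 100: no half-finished transfer
```

No money left Alice's account, because the transaction never crossed the finish line.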

C — Consistency

Let's consider another example, this time from the healthcare domain. A patient has undergone surgery, and as per hospital rules, the following checklist should be completed before they can be discharged:

  • Final test reports
  • List of prescribed medicines
  • Billing

However, when the patient is discharged, their billing fails to complete due to a system glitch, leaving things in an incomplete state.

In the database world, Consistency means a transaction must never leave the data in an incomplete or invalid state: it takes the database from one valid state to another, with all the rules still satisfied.

I — Isolation

Whenever you visit a grocery store, customers line up one after another to have their items checked out. Now imagine that while the cashier is scanning your items, they suddenly start scanning items from the next customer's cart too. This would cause so much chaos!!

From Canva

Isolation ensures each customer's items are scanned individually and ONLY one customer is handled at a time. Similarly, in a database, when multiple transactions occur, each behaves as if it runs on its own: one transaction's half-finished work is never visible to another.
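The checkout analogy can be sketched with a lock that serializes each "customer". This is a toy illustration of isolation using Python threads, not how Delta Lake implements it internally; the carts and prices are made up.

```python
import threading

# Toy sketch: a shared register total, with a lock ensuring each
# "checkout" (a read-modify-write transaction) runs in isolation.
total = 0
lock = threading.Lock()

def checkout(items):
    global total
    with lock:  # only one customer's scan at a time
        current = total
        for price in items:
            current += price
        total = current

carts = [[5, 3, 2]] * 100  # 100 customers, $10 each
threads = [threading.Thread(target=checkout, args=(c,)) for c in carts]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(total)  # 1000: no lost updates from interleaved checkouts
```

Without the lock, two threads could read the same `total` and overwrite each other's update, exactly the chaos of scanning two carts at once.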

D — Durability

It’s late at night and you’re binge-watching a show on Netflix. You’re tired, so you hit pause and head to bed. The next day, you open Netflix again — and the episode resumes exactly where you left off.

Even if you turned off the TV, lost power, or closed the app, your spot was saved.

This is Durability!!

It's the same principle in the database world: once a transaction is completed, it must be saved permanently, no matter what happens next, be it a power failure, system crash, or database outage.

ACID compliance guarantees that your data won’t just disappear. If it’s saved, it stays saved — just like your Netflix progress.
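Durability is also easy to demonstrate with SQLite (again a toy stand-in, not Delta Lake): commit, "power off" by closing the connection, reopen, and the data is still there. The show name and minute are made up.

```python
import os
import sqlite3
import tempfile

# Toy sketch: once committed, data survives closing the connection,
# just like your saved Netflix progress.
path = os.path.join(tempfile.mkdtemp(), "progress.db")

conn = sqlite3.connect(path)
conn.execute("CREATE TABLE progress (show TEXT, minute INTEGER)")
conn.execute("INSERT INTO progress VALUES ('my_show', 42)")
conn.commit()  # transaction completed: now permanent
conn.close()   # "power off"

conn = sqlite3.connect(path)  # "the next day"
saved = conn.execute(
    "SELECT minute FROM progress WHERE show = 'my_show'"
).fetchone()[0]
print(saved)  # 42: the spot was saved
```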

Benefit #2: Schema Enforcement

I hope you remember we talked briefly about schema-on-read (SoR) and schema-on-write (SoW) operations in the Data Lake blog. While Data Lake follows SoR, I allow SoW (similar to a Data Warehouse).

For reference, see the explanation of SoR and SoW in the Data Lake post (Blog #1).

That’s why schema enforcement and ACID compliance work so well together. If there’s no check on the structure of incoming data, incorrect or mismatched data can slip in — and that breaks the rules ACID is supposed to protect.

From Canva
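Schema-on-write can be sketched in a few lines: validate every record against a declared schema before it is allowed into the table. Delta Lake performs this check automatically on every write; the schema, records, and helper functions below are hypothetical, purely for illustration.

```python
# Toy sketch of schema enforcement (schema-on-write): a table that
# rejects records whose columns or types don't match its schema.
SCHEMA = {"id": int, "name": str, "amount": float}

def validate(record):
    if set(record) != set(SCHEMA):
        raise ValueError(f"column mismatch: {sorted(record)}")
    for col, expected in SCHEMA.items():
        if not isinstance(record[col], expected):
            raise TypeError(f"{col!r} must be {expected.__name__}")

table = []

def append(record):
    validate(record)   # enforce the schema BEFORE the write
    table.append(record)

append({"id": 1, "name": "Disha", "amount": 9.99})  # fits the schema

try:
    append({"id": "two", "name": "Jerry", "amount": 5.0})  # id is a str
except TypeError as e:
    print("rejected:", e)

print(len(table))  # 1: the bad record never slipped in
```

Because the mismatched record is rejected up front, the table never enters the inconsistent state that would break the ACID guarantees above.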

Benefit #3: Time travel

Have you ever visited the DMV to get your driver's license?

You may have encountered this situation: your license expires, you visit the DMV for a new one, and they discard your old license since it's not valid anymore. However, did you know that the DMV keeps track of all your licenses even though they've been discarded?

This is the other benefit I offer: Time Travel. Even if a table has been updated and its older versions replaced, I always keep a copy of those older versions (historical data).

This way you can travel back in time and look at the previous versions of your table.
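The idea can be sketched as a table that keeps every snapshot instead of overwriting it. This is a toy model with made-up `write`/`read` helpers; in Delta Lake itself you would read an older version through Spark with something like `spark.read.format("delta").option("versionAsOf", 0).load(path)`.

```python
# Toy sketch of time travel: every write creates a new immutable
# version, and older versions stay readable.
versions = []  # list of snapshots; index = version number

def write(rows):
    versions.append(list(rows))  # new version; old ones are kept

def read(version=None):
    if version is None:
        version = len(versions) - 1  # latest by default
    return versions[version]

write([{"license": "A-111", "status": "valid"}])          # version 0
write([{"license": "A-111", "status": "expired"},         # version 1
       {"license": "B-222", "status": "valid"}])

print(read())           # the latest version (v1)
print(read(version=0))  # travel back to v0: the "discarded" license
```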

To conclude, ACID compliance, schema enforcement, and time travel are a few of the important reasons why Delta Lake was created. There are many other benefits, but covering all of them in one blog isn't possible.

Disha:

So where do these benefits actually live?

Delta Lake:

All the powerful features we just discussed — ACID compliance, schema enforcement, and time travel — are made possible through a Delta Table, the building block of Delta Lake.

Delta Lake Table (or Delta table)

Delta Lake stores data in tables similar to database tables, called Delta Lake tables (or Delta tables), which are saved in the Delta Lake format.

Delta Lake format => ACID + Schema Enforcement + Time Travel (+ two more benefits that we will explore in an upcoming blog).

Every table on the Databricks platform is a Delta table by default. In addition to the various benefits described above, it uses the Parquet format (a highly efficient columnar format) under the hood to handle large amounts of data.

A Parquet file follows a column-based storage approach. To better understand this concept, check out the References section. Here's a sample Parquet file:

From https://data-mozart.com/parquet-file-format-everything-you-need-to-know/
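The row vs. column layout can be sketched in plain Python. This is only a toy model of what Parquet does on disk; the records (with amounts in cents) are made up, and real Parquet adds compression and encoding on top.

```python
# Toy sketch of columnar layout: the same records stored row-wise
# and column-wise. Amounts are in cents to keep the math exact.
rows = [
    {"id": 1, "name": "Disha", "amount": 999},
    {"id": 2, "name": "Jerry", "amount": 500},
    {"id": 3, "name": "Louie", "amount": 750},
]

# Columnar layout: all values of one column are stored together.
columns = {col: [r[col] for r in rows] for col in rows[0]}

# An analytics query like "sum of amount" touches ONE column rather
# than every full record: this is why columnar formats shine.
print(sum(columns["amount"]))  # 2249
```

Scanning a single contiguous column instead of every row is what makes aggregations over billions of records fast.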

Let us walk through the Delta Live Table analogy below; then the concept of the Delta Table will make better sense, too.

Delta Live Table

Disha:

This is all too confusing — Delta Table, Delta Live Table :(…

Delta Lake:

That is understandable! Even Mr. Jerry was agitated, more than he usually is, when he learnt about this term (Delta Live). Let me simplify this one for you!

In a restaurant…

Imagine you walk into a restaurant as the very first customer. As soon as you enter, the entire staff springs into action: someone brings you a glass of water, another takes your order, and the chef in the kitchen starts prepping your meal.

In this setup, the restaurant head is the Delta Live Table: she gets everyone to work and makes sure everything runs smoothly and on time. You are the data being handled. And the Delta Table mentioned above is the meal the chef prepares for you.

Delta Live Tables form an ETL pipeline that triggers as soon as there's a file ready for ingestion.

For people who have been working in the data field for a while: think of Delta Live Tables as ADF (Azure Data Factory). The only difference is that while ADF is Azure-native, Delta Live Tables are Spark-native and work only with Databricks.

Additionally, Delta Live Tables handle only data pipelines on Delta Lake, whereas ADF can be used for general-purpose pipelines.
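The declarative flavor of such a pipeline can be sketched in a few lines: each "table" is declared as a function of its upstream table, and a tiny registry wires them together. Real Delta Live Tables are declared with `@dlt.table` decorators on Databricks; the `table` decorator, table names, and records below are all made up for illustration.

```python
# Toy sketch of a declarative pipeline in the spirit of Delta Live
# Tables: downstream tables are defined in terms of upstream ones.
tables = {}

def table(fn):  # hypothetical stand-in for a @dlt.table decorator
    tables[fn.__name__] = fn
    return fn

@table
def raw_orders():
    # "bronze" layer: raw ingested records, including a bad one
    return [{"order_id": 1, "amount": 999},
            {"order_id": 2, "amount": -50}]

@table
def clean_orders():
    # "silver" layer: declared as a transformation of raw_orders
    return [r for r in raw_orders() if r["amount"] > 0]

result = tables["clean_orders"]()
print(result)  # only the valid order survives the pipeline
```

The appeal of the declarative style is that you state *what* each table contains, and the framework figures out the dependency graph and when to refresh it.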

There are a few more interesting topics, like batch vs. stream processing, Auto Loader, data lineage, data swamps, and more. I'll be covering those in the next blog soon. Stay tuned!

References

  1. Interesting read: "Delta Lake vs Data Lake – What's the Difference?" (delta.io)
  2. Understanding Parquet in detail: "Parquet file format – everything you need to know!" (data-mozart.com)
  3. Difference between Delta Table and Delta Live Table: "Delta Live Table 101 – Streamline Your Data Pipeline (2025)" (chaosgenius.io)


Published via Towards AI



Note: Content contains the views of the contributing authors and not Towards AI.