
DBSCAN Clustering Demystified: A Visual Walkthrough
Last Updated on September 25, 2025 by Editorial Team
Author(s): Niraj
Originally published on Towards AI.

If you’ve ever tried to cluster data with varying densities or irregular shapes, you’ve likely discovered that traditional algorithms like K-Means fall short. In my previous article, Beyond Accuracy: A Guide to Classification Metrics, we explored how to evaluate models beyond simple accuracy. Today, we’re diving into a powerful clustering technique that doesn’t require specifying the number of clusters beforehand: DBSCAN.
What Makes DBSCAN Special?
DBSCAN (Density-Based Spatial Clustering of Applications with Noise) stands out from other clustering algorithms in several key ways:
- No preset cluster count: Unlike K-Means, you don’t need to specify the number of clusters
- Handles irregular shapes: Can find clusters of arbitrary shapes (see the quick demo after this list)
- Identifies noise: Naturally separates outliers from meaningful clusters
- Density-based: Finds areas of high density separated by areas of low density
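To see the first two properties in action, here’s a quick demo using scikit-learn’s off-the-shelf DBSCAN on the classic two-moons dataset (the parameter values are only illustrative):

```python
# Quick demo: DBSCAN follows the two interlocking crescents that K-Means
# would split incorrectly, and we never tell it how many clusters to find.
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=300, noise=0.05, random_state=42)
labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)

print(np.unique(labels))  # typically [0 1]: two clusters, found automatically
```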
But how does it actually work? Let’s break it down with a hands-on implementation.
The Core Concepts: Eps and MinPts
DBSCAN operates on two simple parameters:
- Eps (ε): The radius that defines the neighborhood around each point
- MinPts: The minimum number of points required to form a dense region
Using these parameters, DBSCAN classifies points into three categories (a small code sketch of this check follows the list):
- Core points: Points with at least MinPts neighbors within their ε-radius
- Border points: Points that are reachable from core points but don’t have enough neighbors themselves
- Noise points: Points that are neither core nor border points
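In code, the core-point check is nothing more than a neighbor count. Here’s a tiny sketch (the helper names region_query and is_core are my own):

```python
import numpy as np

def region_query(X, i, eps):
    """Indices of all points within eps of point i (i counts as its own neighbor)."""
    dists = np.linalg.norm(X - X[i], axis=1)  # Euclidean distance to every point
    return np.where(dists <= eps)[0]

def is_core(X, i, eps, min_pts):
    """Core point: its eps-neighborhood contains at least min_pts points."""
    return len(region_query(X, i, eps)) >= min_pts
```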
Walking Through DBSCAN Step by Step
Let’s implement a simplified version of DBSCAN with detailed explanations at each step. This will help us understand exactly what’s happening behind the scenes:
Let’s first work through it by hand with some basic math, tracing every distance check.
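Below is a minimal sketch of a simplified implementation that mirrors this hand trace (assuming NumPy; the function and variable names are my own). Note that, unlike full DBSCAN, it doesn’t grow a cluster outward from newly discovered core points; it simply assigns each core point’s immediate neighborhood:

```python
import numpy as np

# The seven sample points and parameters used in the walkthrough below.
X = np.array([[1, 2], [2, 2], [2, 3], [8, 7], [8, 8], [9, 8], [5, 1]])
eps, min_pts = 1.5, 3

labels = {}       # point index -> cluster id; unassigned points end up as noise
cluster_id = 0

for i in range(len(X)):
    if i in labels:
        print(f"P{i} already in cluster {labels[i]} - SKIP")
        continue
    # Euclidean distance from P_i to every point (P_i counts as its own neighbor)
    dists = np.linalg.norm(X - X[i], axis=1)
    neighbors = np.where(dists <= eps)[0].tolist()
    print(f"P{i} {X[i]}: found {len(neighbors)} neighbor(s): {neighbors}")
    if len(neighbors) >= min_pts:
        # Core point: claim the whole eps-neighborhood for a new cluster
        for j in neighbors:
            labels[j] = cluster_id
        print(f"  -> P{i} is a CORE POINT, starting cluster {cluster_id}")
        cluster_id += 1
    else:
        print(f"  -> not enough neighbors ({len(neighbors)} < {min_pts}), NOISE for now")

# Final assignments
for i in range(len(X)):
    status = f"CLUSTER {labels[i]}" if i in labels else "NOISE"
    print(f"P{i} {X[i]} -> {status}")
```

Running it on our seven sample points gives the trace below.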
🎯 DBSCAN ALGORITHM - SIMPLE WALKTHROUGH
==================================================
📍 Our data points:
P0: [1 2]
P1: [2 2]
P2: [2 3]
P3: [8 7]
P4: [8 8]
P5: [9 8]
P6: [5 1]
⚙️ Settings: eps=1.5, min_pts=3
🔍 Starting point-by-point analysis...
👀 Looking at P0 [1 2]:
Checking distances (need ≤ 1.5):
P0: distance = 0.00 🎯
P1: distance = 1.00 ✅
P2: distance = 1.41 ✅
P3: distance = 8.60 ❌
P4: distance = 9.22 ❌
P5: distance = 10.00 ❌
P6: distance = 4.12 ❌
→ Found 3 neighbors: [0, 1, 2]
→ P0 is a CORE POINT! Starting Cluster 0
Added P0 to Cluster 0
Added P1 to Cluster 0
Added P2 to Cluster 0
✅ Cluster 0 created!
👀 Looking at P1 [2 2]:
Already in cluster 0 - SKIP
👀 Looking at P2 [2 3]:
Already in cluster 0 - SKIP
👀 Looking at P3 [8 7]:
Checking distances (need ≤ 1.5):
P0: distance = 8.60 ❌
P1: distance = 7.81 ❌
P2: distance = 7.21 ❌
P3: distance = 0.00 🎯
P4: distance = 1.00 ✅
P5: distance = 1.41 ✅
P6: distance = 6.71 ❌
→ Found 3 neighbors: [3, 4, 5]
→ P3 is a CORE POINT! Starting Cluster 1
Added P3 to Cluster 1
Added P4 to Cluster 1
Added P5 to Cluster 1
✅ Cluster 1 created!
👀 Looking at P4 [8 8]:
Already in cluster 1 - SKIP
👀 Looking at P5 [9 8]:
Already in cluster 1 - SKIP
👀 Looking at P6 [5 1]:
Checking distances (need ≤ 1.5):
P0: distance = 4.12 ❌
P1: distance = 3.16 ❌
P2: distance = 3.61 ❌
P3: distance = 6.71 ❌
P4: distance = 7.62 ❌
P5: distance = 8.06 ❌
P6: distance = 0.00 🎯
→ Found 1 neighbor: [6]
→ Not enough neighbors (1 < 3)
→ P6 is NOISE (for now)
🎉 FINAL RESULTS:
==============================
P0 [1 2] → CLUSTER 0
P1 [2 2] → CLUSTER 0
P2 [2 3] → CLUSTER 0
P3 [8 7] → CLUSTER 1
P4 [8 8] → CLUSTER 1
P5 [9 8] → CLUSTER 1
P6 [5 1] → NOISE
What Just Happened?
Let’s break down the algorithm’s decision process:
- Point P0 had 3 neighbors (including itself), meeting the min_pts threshold of 3, so it became a core point and formed Cluster 0.
- Points P1 and P2 were within P0’s ε-radius, so they joined Cluster 0. (In fact, each of them also has 3 neighbors within ε, so full DBSCAN would mark them as core points too; true border points fall inside a core point’s neighborhood without having enough neighbors of their own.)
- Point P3 had 3 neighbors, forming Cluster 1.
- Points P4 and P5 were within P3’s ε-radius, joining Cluster 1.
- Point P6 had only itself in its neighborhood, so it was classified as noise.
The algorithm successfully identified two dense clusters and separated the outlier point, all without being told how many clusters to look for!
Choosing the Right Parameters
As with any algorithm, parameter selection is crucial for DBSCAN:
- Too small ε: Everything becomes noise
- Too large ε: Everything merges into one cluster
- Too high min_pts: Many points marked as noise
- Too low min_pts: False clusters in sparse regions
A good rule of thumb is to set min_pts to twice the dimensionality of your dataset (but not less than 3). For ε, the k-distance graph method often works well: plot each point’s distance to its k-th nearest neighbor in sorted order and pick ε near the elbow of the curve, as sketched below.
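Here’s a sketch of that method (assuming scikit-learn and matplotlib; the random dataset is a stand-in for your own):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import NearestNeighbors

k = 4  # a common choice is k = min_pts
X = np.random.RandomState(0).rand(200, 2)  # stand-in for your dataset

# +1 because each point is returned as its own zero-distance neighbor
nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
dists, _ = nn.kneighbors(X)
k_dists = np.sort(dists[:, -1])  # sorted distance to each point's k-th neighbor

plt.plot(k_dists)
plt.xlabel("Points sorted by k-distance")
plt.ylabel(f"Distance to {k}-th nearest neighbor")
plt.show()  # read a candidate eps off the elbow of this curve
```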
Real-World Applications
DBSCAN shines in scenarios where:
- Anomaly detection: Identifying fraudulent transactions or network intrusions
- Spatial data analysis: Finding geographical clusters of events
- Customer segmentation: Grouping similar purchasing behaviors
- Image processing: Identifying objects or regions in images
Limitations to Consider
While powerful, DBSCAN has some limitations:
- Struggles with varying densities: If clusters have different densities, a single ε may not work for all (see the small demo after this list)
- Sensitive to parameters: Poor parameter choices can drastically affect results
- Not completely deterministic: Border points might be assigned to different clusters depending on processing order
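To illustrate the first limitation, here’s a small demo (the parameter values are only for illustration): an ε tuned for a tight cluster leaves most of a loose one unclustered.

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.RandomState(42)
dense = rng.normal(loc=0.0, scale=0.2, size=(100, 2))   # tight cluster
sparse = rng.normal(loc=5.0, scale=1.5, size=(100, 2))  # loose cluster
X = np.vstack([dense, sparse])

# eps chosen for the dense cluster: most sparse points can't reach
# min_samples neighbors, so they get labeled -1 (noise)
labels = DBSCAN(eps=0.3, min_samples=5).fit_predict(X)
print("noise points:", int((labels == -1).sum()), "of", len(X))
```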
Beyond the Basics
For more advanced applications, consider these DBSCAN variants (a quick usage sketch follows the list):
- HDBSCAN: Hierarchical version that handles varying densities better
- OPTICS: Creates a reachability plot that doesn’t require precise ε setting
- DENCLUE: Uses density functions for more mathematical rigor
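As a quick sketch of the first two (HDBSCAN ships with scikit-learn 1.3+; older environments can use the standalone hdbscan package):

```python
import numpy as np
from sklearn.cluster import HDBSCAN, OPTICS  # HDBSCAN needs scikit-learn >= 1.3
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=300, noise=0.05, random_state=42)

# HDBSCAN: no eps at all - only a minimum cluster size
hdb = HDBSCAN(min_cluster_size=10).fit_predict(X)

# OPTICS: builds a reachability ordering instead of committing to one eps
opt = OPTICS(min_samples=10).fit_predict(X)

print(np.unique(hdb), np.unique(opt))
```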
Key Takeaways
- DBSCAN is a powerful density-based clustering algorithm that doesn’t require specifying the number of clusters beforehand.
- It naturally handles noise and outliers, making it robust for real-world data.
- The algorithm identifies core points, border points, and noise based on local density.
- Parameter selection (ε and min_pts) is crucial and often requires domain knowledge.
- While it has limitations with varying densities, it’s excellent for many practical applications.
Just as we discussed in my previous article on classification metrics, understanding the mechanics behind our algorithms helps us make better decisions about when and how to use them. DBSCAN’s intuitive approach to finding natural clusters in data makes it a valuable addition to any data scientist’s toolkit.
Have you used DBSCAN in your projects? Share your experiences and tips in the comments below!