
DBSCAN Clustering Demystified: A Visual Walkthrough
Last Updated on September 25, 2025 by Editorial Team
Author(s): Niraj
Originally published on Towards AI.

If you’ve ever tried to cluster data with varying densities or irregular shapes, you’ve likely discovered that traditional algorithms like K-Means fall short. In my previous article, Beyond Accuracy: A Guide to Classification Metrics, we explored how to evaluate models beyond simple accuracy. Today, we’re diving into a powerful clustering technique that doesn’t require specifying the number of clusters beforehand: DBSCAN.
What Makes DBSCAN Special?
DBSCAN (Density-Based Spatial Clustering of Applications with Noise) stands out from other clustering algorithms in several key ways:
- No preset cluster count: Unlike K-Means, you don’t need to specify the number of clusters
- Handles irregular shapes: Can find clusters of arbitrary shapes (see the quick demo after this list)
- Identifies noise: Naturally separates outliers from meaningful clusters
- Density-based: Finds areas of high density separated by areas of low density
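To see the first two properties in action, here’s a quick demo using scikit-learn’s off-the-shelf DBSCAN on the classic two-moons dataset (the parameter values are only illustrative):

```python
# Quick demo: DBSCAN follows the two interlocking crescents that K-Means
# would split incorrectly, and we never tell it how many clusters to find.
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=300, noise=0.05, random_state=42)
labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)

print(np.unique(labels))  # typically [0 1]: two clusters, found automatically
```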
But how does it actually work? Let’s break it down with a hands-on implementation.
The Core Concepts: Eps and MinPts
DBSCAN operates on two simple parameters:
- Eps (ε): The radius that defines the neighborhood around each point
- MinPts: The minimum number of points required to form a dense region
Using these parameters, DBSCAN classifies points into three categories (a small code sketch of this check follows the list):
- Core points: Points with at least MinPts neighbors within their ε-radius
- Border points: Points that are reachable from core points but don’t have enough neighbors themselves
- Noise points: Points that are neither core nor border points
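In code, the core-point check is nothing more than a neighbor count. Here’s a tiny sketch (the helper names region_query and is_core are my own):

```python
import numpy as np

def region_query(X, i, eps):
    """Indices of all points within eps of point i (i counts as its own neighbor)."""
    dists = np.linalg.norm(X - X[i], axis=1)  # Euclidean distance to every point
    return np.where(dists <= eps)[0]

def is_core(X, i, eps, min_pts):
    """Core point: its eps-neighborhood contains at least min_pts points."""
    return len(region_query(X, i, eps)) >= min_pts
```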
Walking Through DBSCAN Step by Step
Let’s implement a simplified version of DBSCAN with detailed explanations at each step. This will help us understand exactly what’s happening behind the scenes:
Let’s first work through it by hand with some basic math, tracing every distance check.
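Below is a minimal sketch of a simplified implementation that mirrors this hand trace (assuming NumPy; the function and variable names are my own). Note that, unlike full DBSCAN, it doesn’t grow a cluster outward from newly discovered core points; it simply assigns each core point’s immediate neighborhood:

```python
import numpy as np

# The seven sample points and parameters used in the walkthrough below.
X = np.array([[1, 2], [2, 2], [2, 3], [8, 7], [8, 8], [9, 8], [5, 1]])
eps, min_pts = 1.5, 3

labels = {}       # point index -> cluster id; unassigned points end up as noise
cluster_id = 0

for i in range(len(X)):
    if i in labels:
        print(f"P{i} already in cluster {labels[i]} - SKIP")
        continue
    # Euclidean distance from P_i to every point (P_i counts as its own neighbor)
    dists = np.linalg.norm(X - X[i], axis=1)
    neighbors = np.where(dists <= eps)[0].tolist()
    print(f"P{i} {X[i]}: found {len(neighbors)} neighbor(s): {neighbors}")
    if len(neighbors) >= min_pts:
        # Core point: claim the whole eps-neighborhood for a new cluster
        for j in neighbors:
            labels[j] = cluster_id
        print(f"  -> P{i} is a CORE POINT, starting cluster {cluster_id}")
        cluster_id += 1
    else:
        print(f"  -> not enough neighbors ({len(neighbors)} < {min_pts}), NOISE for now")

# Final assignments
for i in range(len(X)):
    status = f"CLUSTER {labels[i]}" if i in labels else "NOISE"
    print(f"P{i} {X[i]} -> {status}")
```

Running it on our seven sample points gives the trace below.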
🎯 DBSCAN ALGORITHM - SIMPLE WALKTHROUGH
==================================================
📍 Our data points:
P0: [1 2]
P1: [2 2]
P2: [2 3]
P3: [8 7]
P4: [8 8]
P5: [9 8]
P6: [5 1]
⚙️ Settings: eps=1.5, min_pts=3
🔍 Starting point-by-point analysis...
👀 Looking at P0 [1 2]:
Checking distances (need ≤ 1.5):
P0: distance = 0.00 🎯
P1: distance = 1.00 ✅
P2: distance = 1.41 ✅
P3: distance = 8.60 ❌
P4: distance = 9.22 ❌
P5: distance = 10.00 ❌
P6: distance = 4.12 ❌
→ Found 3 neighbors: [0, 1, 2]
→ P0 is a CORE POINT! Starting Cluster 0
Added P0 to Cluster 0
Added P1 to Cluster 0
Added P2 to Cluster 0
✅ Cluster 0 created!
👀 Looking at P1 [2 2]:
Already in cluster 0 - SKIP
👀 Looking at P2 [2 3]:
Already in cluster 0 - SKIP
👀 Looking at P3 [8 7]:
Checking distances (need ≤ 1.5):
P0: distance = 8.60 ❌
P1: distance = 7.81 ❌
P2: distance = 7.21 ❌
P3: distance = 0.00 🎯
P4: distance = 1.00 ✅
P5: distance = 1.41 ✅
P6: distance = 6.71 ❌
→ Found 3 neighbors: [3, 4, 5]
→ P3 is a CORE POINT! Starting Cluster 1
Added P3 to Cluster 1
Added P4 to Cluster 1
Added P5 to Cluster 1
✅ Cluster 1 created!
👀 Looking at P4 [8 8]:
Already in cluster 1 - SKIP
👀 Looking at P5 [9 8]:
Already in cluster 1 - SKIP
👀 Looking at P6 [5 1]:
Checking distances (need ≤ 1.5):
P0: distance = 4.12 ❌
P1: distance = 3.16 ❌
P2: distance = 3.61 ❌
P3: distance = 6.71 ❌
P4: distance = 7.62 ❌
P5: distance = 8.06 ❌
P6: distance = 0.00 🎯
→ Found 1 neighbor: [6]
→ Not enough neighbors (1 < 3)
→ P6 is NOISE (for now)
🎉 FINAL RESULTS:
==============================
P0 [1 2] → CLUSTER 0
P1 [2 2] → CLUSTER 0
P2 [2 3] → CLUSTER 0
P3 [8 7] → CLUSTER 1
P4 [8 8] → CLUSTER 1
P5 [9 8] → CLUSTER 1
P6 [5 1] → NOISE
What Just Happened?
Let’s break down the algorithm’s decision process:
- Point P0 had 3 neighbors (including itself), meeting the min_pts threshold of 3, so it became a core point and formed Cluster 0.
- Points P1 and P2 were within P0’s ε-radius, so they joined Cluster 0. (In fact, each of them also has 3 neighbors within ε, so full DBSCAN would mark them as core points too; true border points fall inside a core point’s neighborhood without having enough neighbors of their own.)
- Point P3 had 3 neighbors, forming Cluster 1.
- Points P4 and P5 were within P3’s ε-radius, joining Cluster 1.
- Point P6 had only itself in its neighborhood, so it was classified as noise.
The algorithm successfully identified two dense clusters and separated the outlier point, all without being told how many clusters to look for!
Choosing the Right Parameters
As with any algorithm, parameter selection is crucial for DBSCAN:
- Too small ε: Everything becomes noise
- Too large ε: Everything merges into one cluster
- Too high min_pts: Many points marked as noise
- Too low min_pts: False clusters in sparse regions
A good rule of thumb is to set min_pts to twice the dimensionality of your dataset (but not less than 3). For ε, the k-distance graph method often works well: plot each point’s distance to its k-th nearest neighbor in sorted order and pick ε near the elbow of the curve, as sketched below.
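Here’s a sketch of that method (assuming scikit-learn and matplotlib; the random dataset is a stand-in for your own):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import NearestNeighbors

k = 4  # a common choice is k = min_pts
X = np.random.RandomState(0).rand(200, 2)  # stand-in for your dataset

# +1 because each point is returned as its own zero-distance neighbor
nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
dists, _ = nn.kneighbors(X)
k_dists = np.sort(dists[:, -1])  # sorted distance to each point's k-th neighbor

plt.plot(k_dists)
plt.xlabel("Points sorted by k-distance")
plt.ylabel(f"Distance to {k}-th nearest neighbor")
plt.show()  # read a candidate eps off the elbow of this curve
```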
Real-World Applications
DBSCAN shines in scenarios where:
- Anomaly detection: Identifying fraudulent transactions or network intrusions
- Spatial data analysis: Finding geographical clusters of events
- Customer segmentation: Grouping similar purchasing behaviors
- Image processing: Identifying objects or regions in images
Limitations to Consider
While powerful, DBSCAN has some limitations:
- Struggles with varying densities: If clusters have different densities, a single ε may not work for all (see the small demo after this list)
- Sensitive to parameters: Poor parameter choices can drastically affect results
- Not completely deterministic: Border points might be assigned to different clusters depending on processing order
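To illustrate the first limitation, here’s a small demo (the parameter values are only for illustration): an ε tuned for a tight cluster leaves most of a loose one unclustered.

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.RandomState(42)
dense = rng.normal(loc=0.0, scale=0.2, size=(100, 2))   # tight cluster
sparse = rng.normal(loc=5.0, scale=1.5, size=(100, 2))  # loose cluster
X = np.vstack([dense, sparse])

# eps chosen for the dense cluster: most sparse points can't reach
# min_samples neighbors, so they get labeled -1 (noise)
labels = DBSCAN(eps=0.3, min_samples=5).fit_predict(X)
print("noise points:", int((labels == -1).sum()), "of", len(X))
```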
Beyond the Basics
For more advanced applications, consider these DBSCAN variants (a quick usage sketch follows the list):
- HDBSCAN: Hierarchical version that handles varying densities better
- OPTICS: Creates a reachability plot that doesn’t require precise ε setting
- DENCLUE: Uses density functions for more mathematical rigor
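As a quick sketch of the first two (HDBSCAN ships with scikit-learn 1.3+; older environments can use the standalone hdbscan package):

```python
import numpy as np
from sklearn.cluster import HDBSCAN, OPTICS  # HDBSCAN needs scikit-learn >= 1.3
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=300, noise=0.05, random_state=42)

# HDBSCAN: no eps at all - only a minimum cluster size
hdb = HDBSCAN(min_cluster_size=10).fit_predict(X)

# OPTICS: builds a reachability ordering instead of committing to one eps
opt = OPTICS(min_samples=10).fit_predict(X)

print(np.unique(hdb), np.unique(opt))
```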
Key Takeaways
- DBSCAN is a powerful density-based clustering algorithm that doesn’t require specifying the number of clusters beforehand.
- It naturally handles noise and outliers, making it robust for real-world data.
- The algorithm identifies core points, border points, and noise based on local density.
- Parameter selection (ε and min_pts) is crucial and often requires domain knowledge.
- While it has limitations with varying densities, it’s excellent for many practical applications.
Just as we discussed in my previous article on classification metrics, understanding the mechanics behind our algorithms helps us make better decisions about when and how to use them. DBSCAN’s intuitive approach to finding natural clusters in data makes it a valuable addition to any data scientist’s toolkit.
Have you used DBSCAN in your projects? Share your experiences and tips in the comments below!