K-Means From Scratch: How The Cluster Magic Works
Last Updated on May 9, 2024 by Editorial Team
Author(s): Francis Adrian Viernes
Originally published on Towards AI.
Reverse Engineering The SciKit Implementation
Photo by Mel Poole on Unsplash
Understanding how an algorithm works is worthwhile because it offers insight into why an implementation may not behave as one would expect. It also opens the door to customization for the unique setup of a dataset, including the addition of conditionals.
In my pursuit of understanding the concepts behind popular algorithms, I have found that the best way to confirm success is to validate my results against the output of popular packages.
This is not always easy to do, as some algorithms have stochastic components. Others, however, are deterministic and will produce answers identical to those of the packages, provided they are coded correctly.
For example, my from-scratch Python implementation of the simplex algorithm gives the same answer as the one from the package.
K-means, by contrast, is an algorithm with a stochastic component, particularly in its initialization. Let's dive deeper into the algorithm.
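To make the stochastic initialization concrete, here is a minimal sketch that picks k data points at random as starting centroids. The function name `init_centroids`, the `seed` parameter, and the toy `points` are illustrative assumptions, and this plain random pick is only one possible scheme (scikit-learn, for instance, defaults to k-means++ instead).

```python
import random

def init_centroids(points, k, seed=None):
    """Pick k distinct data points at random to serve as starting centroids.

    Hypothetical helper for illustration; not taken from any package.
    """
    rng = random.Random(seed)  # private generator, global random state is untouched
    return rng.sample(points, k)

# Toy 2-D data, made up for this example
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]

# The same seed reproduces the same starting centroids
first = init_centroids(points, k=2, seed=0)
again = init_centroids(points, k=2, seed=0)
print(first == again)  # True
```

Fixing the seed is what makes a from-scratch result comparable across runs; without it, two runs may start from different centroids and can end in different clusters.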
K-means is probably one of the most popular clustering algorithms out there. In a nutshell, K-means produces its clusters by finding the centers of the data, called centroids, and assigning each data point to the center closest to it.
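The assignment idea can be sketched in a few lines of plain Python. This is a minimal illustration that assumes Euclidean distance and fixed, hand-picked centroids; the name `assign_clusters` and the toy data are hypothetical and not from any package.

```python
import math

def assign_clusters(points, centroids):
    """Return, for each point, the index of its nearest centroid.

    Assumes Euclidean distance; ties go to the lower centroid index.
    """
    labels = []
    for p in points:
        distances = [math.dist(p, c) for c in centroids]
        labels.append(distances.index(min(distances)))
    return labels

# Toy 2-D data and centroids, made up for this example
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids = [(1.0, 1.0), (9.0, 9.0)]

labels = assign_clusters(points, centroids)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Here the first three points sit near the first centroid and the last three near the second, so the labels split cleanly into two groups.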
This involves several major steps: Initialization of Centroids, Calculation of Distances and Cluster Assignment, Updating the…