Uncovering K-means Clustering for Spatial Analysis
Last Updated on August 6, 2024 by Editorial Team
Author(s): Stephen Chege-Tierra Insights
Originally published on Towards AI.
"Underrated (adjective): rated or valued too low." - Merriam-Webster.
Underrated, unappreciated, and underhyped are terms used to suggest that something does not get the recognition it deserves. They are sometimes used to describe someone who does not get the public attention they deserve despite being very effective in their profession, although this can simply reflect one person's biased opinion.
For example, I think that NBA player Kawhi Leonard is the most underrated and criminally underhyped player of all time. I feel the same about rapper Nathan John Feuerstein, better known as NF. Neither fits the modern-day image of an athlete or a rapper.
The same could be said about some machine learning algorithms, which are not talked about with the excitement they deserve. As we reach the golden age of artificial intelligence and machine learning, some algorithms will be propped up while others may fall by the wayside into irrelevance.
One such algorithm is K-means, an unsupervised algorithm that is widely used but has not reached the popularity of random forest or K-nearest neighbors. As I continue writing and researching machine learning algorithms and their impact on the spatial sector, let us have a look at K-means and what it offers GIS professionals.
What is K Means Clustering
K-means is an unsupervised machine learning approach that divides an unlabeled dataset into clusters. The purpose of this article is to examine the principles and operation of K-means clustering, as well as its applications and implications for geospatial analysis.
Unsupervised machine learning is the process of having an algorithm work on unlabeled, unclassified data without human intervention. In this scenario, the machine's task is to arrange unsorted data based on similarities, patterns, and differences without any prior training on labeled data.
The K in K-means is the number of clusters. The algorithm divides data points into K clusters based on how far each point is from the cluster centres (centroids). The cluster centroids are first assigned at random.
To process the data, the K-means algorithm starts with a first group of randomly selected centroids, which serve as the starting points for the clusters, and then performs iterative (repeated) calculations to optimize the positions of the centroids.
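To make "optimizing the positions of the centroids" concrete, the quantity K-means is usually described as minimizing can be written as follows. This is a sketch that assumes squared Euclidean distance; the symbols are notation introduced here, not taken from the article: x_i is a data point, C_k is the set of points assigned to cluster k, and mu_k is that cluster's centroid.

```latex
J = \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2,
\qquad
\mu_k = \frac{1}{|C_k|} \sum_{x_i \in C_k} x_i
```

Each assignment step and each centroid update can only lower J or leave it unchanged, which is why the iterations eventually settle. The result is a local minimum, so different random starting centroids can give different clusters.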
How it Works
A cluster's centroid is a set of feature values that together define the group that is formed. The type of group that each cluster represents can be interpreted qualitatively by looking at the centroid's feature values.
Data assignment: Each cluster is defined by its centroid, or centre of the feature values. Each data point is assigned to its closest centroid, determined using a distance function of choice.
Update of the centroids: Once all data points have been assigned, each centroid is recalculated by averaging all the data points assigned to its cluster.
Repetition: The assignment and update steps are repeated until a stopping condition is satisfied, such as no change in cluster assignments, no further reduction in the total distance, or a maximum number of iterations being reached.
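The three steps above can be sketched in plain JavaScript. This is a minimal illustration, not the implementation used by any particular library: it assumes squared Euclidean distance, takes the starting centroids as an argument instead of drawing them at random (so the run is reproducible), and uses made-up toy coordinates. The function names are hypothetical.

```javascript
// Minimal K-means sketch in plain JavaScript (no libraries).
// Assumptions: points are arrays of numbers, distance is squared Euclidean,
// and the initial centroids are passed in rather than chosen at random.

function squaredDistance(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return sum;
}

// Data assignment: index of the closest centroid for one point.
function nearestCentroid(point, centroids) {
  let best = 0;
  let bestDist = Infinity;
  centroids.forEach((c, j) => {
    const d = squaredDistance(point, c);
    if (d < bestDist) {
      bestDist = d;
      best = j;
    }
  });
  return best;
}

// Centroid update: mean of the points assigned to each cluster.
function updateCentroids(points, labels, centroids) {
  return centroids.map((old, j) => {
    const members = points.filter((_, i) => labels[i] === j);
    if (members.length === 0) return old; // leave an empty cluster's centroid in place
    return old.map((_, dim) =>
      members.reduce((s, p) => s + p[dim], 0) / members.length);
  });
}

// Repetition: alternate the two steps until no label changes
// or the iteration limit is reached.
function kMeans(points, initialCentroids, maxIterations = 100) {
  let centroids = initialCentroids.map((c) => c.slice());
  let labels = new Array(points.length).fill(-1);
  let iterations = 0;
  for (; iterations < maxIterations; iterations++) {
    const newLabels = points.map((p) => nearestCentroid(p, centroids));
    const unchanged = newLabels.every((l, i) => l === labels[i]);
    labels = newLabels;
    if (unchanged) break; // stopping condition: no assignment changed
    centroids = updateCentroids(points, labels, centroids);
  }
  return { centroids, labels, iterations };
}

// Hypothetical toy coordinates: two loose groups of (x, y) locations.
const points = [[1, 1], [1, 2], [2, 1], [8, 8], [9, 8], [8, 9]];
const result = kMeans(points, [[0, 0], [10, 10]]);
console.log(result.labels, result.centroids);
```

On this toy data the first three points end up in one cluster and the last three in the other. In practice the starting centroids are usually drawn at random, and the algorithm is often run several times to reduce the effect of an unlucky start.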
K-means for Spatial Analysis
Geographical data can be divided into K distinct clusters using the iterative K-means clustering algorithm. This is done by assigning each data point to the closest centroid, recalculating the centroids as the mean of the assigned points, and repeating these steps until the centroids stabilize. This allows spatial patterns to be identified and interpreted, such as market segments, urban land use types, environmental zones, and public health hotspots. For the results to be insightful and useful, the analysis needs to take into account the choice of distance metric, data scaling, and geographic constraints.
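Data scaling matters because K-means compares distances across all features at once, so a feature measured in large units can dominate one measured in small units. The sketch below shows one common choice, z-score standardization, in plain JavaScript. The `standardize` function and the sample features (elevation and a vegetation index) are hypothetical and only illustrate the idea.

```javascript
// Z-score standardization sketch: each column ends up with mean 0
// and (population) standard deviation 1.
function standardize(rows) {
  const n = rows.length;
  const dims = rows[0].length;
  const means = [];
  const stds = [];
  for (let d = 0; d < dims; d++) {
    const mean = rows.reduce((s, r) => s + r[d], 0) / n;
    const variance = rows.reduce((s, r) => s + (r[d] - mean) ** 2, 0) / n;
    means.push(mean);
    stds.push(Math.sqrt(variance) || 1); // constant column: avoid dividing by zero
  }
  const scaled = rows.map((r) => r.map((v, d) => (v - means[d]) / stds[d]));
  return { scaled, means, stds };
}

// Hypothetical features per location: [elevation in metres, vegetation index].
const features = [[100, 0.2], [200, 0.4], [300, 0.6]];
const { scaled } = standardize(features);
console.log(scaled);
```

Without scaling, the elevation column (hundreds of metres) would decide almost every cluster assignment. After scaling, both columns contribute on a comparable footing.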
Because of its scalability, it can manage enormous volumes of spatial data and is therefore appropriate for a variety of applications at both local and global scales. GIS experts can find hidden insights in spatial data by making use of K-means' advantages, which can ultimately lead to better decision-making and outcomes for a variety of spatial analysis tasks.
It can be used for:
1. Development and Urban Planning
Land Use Analysis: K-means assists city planners with resource allocation and zoning restrictions by classifying metropolitan areas according to land use types (residential, commercial, and industrial).
Smart City Initiatives: K-means facilitates the development of smart city projects by grouping sensor data (from sensors measuring pollution or traffic, for example), which helps improve infrastructure and services.
2. Disaster Management
Risk assessment: By identifying high-risk locations through K-means clustering of historical disaster data, disaster preparedness and mitigation planning are aided.
Resource Allocation: When responding to a disaster, grouping the impacted areas helps to prioritize the distribution of resources and rescue efforts.
3. Public health
Illness Outbreak Detection: Public health professionals can identify regions with high illness incidence by clustering health data. This allows for focused interventions and effective resource distribution.
Healthcare Accessibility: By identifying underserved areas and examining the spatial distribution of healthcare services, K-means helps guide policy for improved healthcare access.
4. Real Estate
Property Valuation: Accurate property valuation and market analysis are aided by clustering property data according to features such as location, size, and amenities.
Development Planning: By using spatial clustering, real estate developers can pinpoint new trends and possible hotspots for development.
5. Transportation and Logistics
Route Optimization: By helping to cluster delivery points, K-means facilitates more effective routing and lowers transportation expenses.
Traffic Management: Cities can enhance traffic flow and better control congestion by clustering traffic data.
Snippet
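Open the Google Earth Engine Code Editor and run the script below. It assumes that a geometry named Dubai has already been drawn or imported in the editor.
// Import the Sentinel-2 satellite data from the European Space Agency.
var S2 = ee.ImageCollection("COPERNICUS/S2");
// Filter for Dubai (Dubai is assumed to be a geometry defined in the Code Editor imports).
S2 = S2.filterBounds(Dubai);
print(S2);
// Filter by date.
S2 = S2.filterDate("2020-01-01", "2020-05-11");
print(S2);
// Take the first image in the filtered collection.
var image = ee.Image(S2.first());
print(image);
// True-colour alternative:
// Map.addLayer(image, {min: 0, max: 3000, bands: ["B4", "B3", "B2"]}, "Dubai");
// False-colour composite (near-infrared, red, green).
Map.addLayer(image, {min: 0, max: 3000, bands: ["B8", "B4", "B3"]}, "Dubai");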
// Create the training dataset by sampling pixels from the image.
var training = image.sample({
  region: Dubai,
  scale: 20,
  numPixels: 5000
});
// Instantiate the unsupervised clustering algorithm (K-means with 5 clusters) and train it.
var kmeans = ee.Clusterer.wekaKMeans(5).train(training);
// Cluster the input using the trained clusterer.
var result = image.cluster(kmeans);
// Display the clusters with random colors.
Map.addLayer(result.randomVisualizer(), {}, 'Unsupervised K-means Classification');
// Export the clustered image to Google Drive.
Export.image.toDrive({
  image: result,
  description: 'kmeans_Dubai',
  scale: 20,
  region: Dubai
});
Conclusion
K-means clustering has a significant impact on spatial analysis by providing a flexible and effective tool for finding patterns, allocating resources, and making defensible decisions in a variety of contexts, including business strategy, public health, and environmental monitoring in addition to urban planning. Its efficiency in handling large spatial datasets makes it a valuable tool in today's data-driven decision-making processes.