How Neighborly is K-Nearest Neighbors to GIS Pros?
Last Updated on April 11, 2024 by Editorial Team
Author(s): Stephen Chege-Tierra Insights
Originally published on Towards AI.
At some point in your life, you have probably interacted with a nice neighbor: the one who greets you on your way to work or school, asks how your day was, helps you carry groceries to your door, or, if you are lucky, brings over a baked pie. We also know the opposite of a lovely neighbor: the nosy, inconsiderate one you dread living next to. In other words, neighbors play a major part in our lives.
Now, in the realm of geographic information systems (GIS), professionals often experience a complex interplay of emotions akin to that love-hate relationship with neighbors. Enter K-Nearest Neighbors (k-NN), a technique that personifies the very essence of propinquity and neighborly dynamics.
As GIS experts navigate the spatial landscape, they grapple with the intricacies of k-NN, sometimes embracing its insights with open arms, at other times feeling the frustration of its limitations. Let us look at how the K-Nearest Neighbors algorithm can be applied to geospatial analysis.
What is K Nearest Neighbor?
The K-Nearest Neighbors (k-NN) algorithm is a non-parametric, supervised learning classifier that uses proximity to classify or predict how an individual data point should be grouped. It is among the most widely used and straightforward classification and regression methods in machine learning today.
Esri defines the K-Nearest Neighbor classifier as a nonparametric classification method that classifies a pixel or segment by a plurality vote of its neighbors, where k is the defined number of neighbors used in the vote.
K-Nearest Neighbors (k-NN) is like asking your neighbors for information: you look at what your closest neighbors are doing to decide what to do next. To categorize a place on a map, for instance, to figure out whether it is a city or a forest, you look at the spots closest to it and identify what they are. If the majority of them are woodlands, you can assume the new site is likewise a forest.
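To make that neighborly intuition concrete, here is a minimal sketch in plain Python (the labeled sites and query coordinates are made up for illustration): it finds the k closest labeled points to a query location and takes a majority vote.
from collections import Counter
import math
# Hypothetical labeled sites: (x, y) map coordinates with a land-cover label
labeled_sites = [
    ((1.0, 1.2), "forest"), ((1.5, 0.8), "forest"), ((2.0, 1.1), "forest"),
    ((5.0, 5.5), "city"), ((5.4, 5.1), "city"), ((4.8, 4.9), "city"),
]
def knn_vote(query, sites, k=3):
    # Sort labeled sites by Euclidean distance to the query point
    by_distance = sorted(sites, key=lambda s: math.dist(query, s[0]))
    # Majority vote among the labels of the k nearest sites
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]
print(knn_vote((1.3, 1.0), labeled_sites))  # forest
print(knn_vote((5.1, 5.2), labeled_sites))  # city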
Evelyn Fix and Joseph Hodges first developed k-NN in 1951 while conducting research for the US military, releasing a paper outlining a non-parametric technique for discriminant analysis. Thomas Cover and Peter Hart further developed the approach in their 1967 paper "Nearest Neighbor Pattern Classification." Today, k-NN is one of the most widely used algorithms thanks to its adaptability across fields, from genetics and finance to environmental analysis and customer service.
How Can It Be Applied to Geospatial Analysis?
In geospatial analysis, k-NN can categorize satellite images into groups according to attributes like color, texture, and shape. One way to train a k-NN classifier is to use a dataset of satellite photos that have already been labeled as city or forest; the algorithm can then distinguish between the two based on how closely new images resemble the training set.
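As a sketch of that idea, the snippet below trains scikit-learn's KNeighborsClassifier on hypothetical per-pixel spectral features; the band values and labels are synthetic stand-ins, not a real satellite dataset.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
rng = np.random.default_rng(0)
# Synthetic training pixels: [red, green, near-infrared] reflectance values.
# Forest pixels tend to be high in NIR; built-up pixels are brighter in red.
forest = rng.normal(loc=[0.05, 0.08, 0.45], scale=0.03, size=(100, 3))
city = rng.normal(loc=[0.25, 0.22, 0.20], scale=0.03, size=(100, 3))
X_train = np.vstack([forest, city])
y_train = np.array(["forest"] * 100 + ["city"] * 100)
knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
# Classify two new (made-up) pixels by their similarity to the training set
new_pixels = np.array([[0.06, 0.09, 0.43], [0.27, 0.20, 0.21]])
print(knn.predict(new_pixels))  # expected: ['forest' 'city']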
k-NN's proximity machinery can also support geographic clustering: grouping comparable features according to their attribute similarity and physical proximity. This is useful for market segmentation, which divides consumers and companies into clusters according to factors like demographics and nearness to one another.
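One hedged way to sketch the proximity side of this uses scikit-learn's NearestNeighbors (the customer records below are invented): each record is matched to its nearest peers by standardized location and spend, which is the kind of neighbor query that underlies proximity-based segmentation.
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler
# Hypothetical customers: (x, y) in projected map units, plus annual spend
customers = np.array([
    [2.0, 3.0, 120.0], [2.2, 2.9, 110.0], [8.0, 8.5, 40.0],
    [8.3, 8.1, 55.0], [2.1, 3.2, 130.0], [7.9, 8.7, 45.0],
])
# Standardize so location and spend contribute on comparable scales
features = StandardScaler().fit_transform(customers)
# For each customer, find the 3 nearest records (the first is the point itself)
nn = NearestNeighbors(n_neighbors=3).fit(features)
distances, indices = nn.kneighbors(features)
print(indices)  # nearby, similar customers share neighbor sets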
k-NN works especially well with Google Earth Engine, RStudio, and Python for geospatial analysis. Google Earth Engine, for instance, can assist in detecting deforestation in Kenya, charcoal burning in Somalia, soil erosion, forestation, and ocean pollution. By integrating k-NN into these platforms, researchers and environmentalists can use geographical proximity to detect and track environmental changes, supporting better decisions and more sustainable management techniques.
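As a hedged sketch of how that might look in the Earth Engine Python API (the asset paths are hypothetical placeholders, and ee.Classifier.smileKNN plus the exact arguments shown are assumptions to verify against the current Earth Engine docs):
import ee
ee.Initialize()  # assumes you have authenticated with an Earth Engine account
# Hypothetical inputs: a composite image and labeled points with a 'landcover'
# property (e.g., 0 = forest, 1 = cleared land)
image = ee.Image("users/your_account/sentinel2_composite")  # placeholder asset
points = ee.FeatureCollection("users/your_account/training_points")  # placeholder
bands = ["B4", "B8"]  # red and near-infrared, an assumed band choice
# Sample band values at the labeled training points
samples = image.select(bands).sampleRegions(
    collection=points, properties=["landcover"], scale=10
)
# Train Earth Engine's k-NN classifier and classify the image
classifier = ee.Classifier.smileKNN(5).train(
    features=samples, classProperty="landcover", inputProperties=bands
)
classified = image.select(bands).classify(classifier)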
Benefits of k-NN for GIS
1. Easy to understand – k-NN is user-friendly for GIS pros because it works on the concept of proximity, making it easy to understand and implement, especially for crucial assignments such as spatial analysis.
2. Scalability – The spatial datasets frequently encountered in GIS applications can be analyzed efficiently with k-NN. With advances in parallel processing techniques and computing resources, k-NN can handle increasingly large and complicated spatial datasets.
3. Accessible on GIS platforms – K-Nearest Neighbors is available through open-source libraries in RStudio, Google Earth Engine, and Python, so it can be integrated into a GIS workflow with little friction for effective analysis.
4. No training period – k-NN has no real training phase: the data itself is the model and serves as the reference for future predictions, which makes it very time-efficient when you need to model ad hoc on whatever data is available (a quick timing sketch after this list makes the point).
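Point 4 is easy to see directly. A minimal timing sketch on arbitrary synthetic data shows that fit() amounts to storing (and indexing) the data, while the real work happens at prediction time.
import time
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
rng = np.random.default_rng(42)
X = rng.random((50_000, 10))  # arbitrary synthetic features
y = rng.integers(0, 2, size=50_000)  # arbitrary binary labels
knn = KNeighborsClassifier(n_neighbors=5)
start = time.perf_counter()
knn.fit(X, y)  # essentially just stores/indexes the training data
print(f"fit: {time.perf_counter() - start:.3f}s")
start = time.perf_counter()
knn.predict(rng.random((1_000, 10)))  # the neighbor search happens here
print(f"predict: {time.perf_counter() - start:.3f}s")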
k-NN Python code sample
# Importing necessary libraries
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score
# Load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Standardize the features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
# Instantiate the K-NN classifier
knn = KNeighborsClassifier(n_neighbors=5)
# Train the K-NN classifier
knn.fit(X_train, y_train)
# Predict the classes for test set
y_pred = knn.predict(X_test)
# Calculate accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
This is a simple example of using the scikit-learn library to build a K-Nearest Neighbors classifier in Python. Before executing this script, make sure scikit-learn is installed (pip install scikit-learn).
How to get started
1. Decide which platform is best for you – As mentioned earlier, K-Nearest Neighbors is available in Python, RStudio, and Google Earth Engine through GIS-friendly libraries. Decide which platform suits you best and work in it.
2. Make use of documentation and tutorials – The platforms k-NN is available on also offer a vast amount of learning material that can help you implement the algorithm with ease, which is especially helpful when it comes to debugging.
3. Understand the concepts – Find out what each line of code does as you broaden your understanding. Recognize how the algorithm uses the majority vote (for classification) or average (for regression) of the K nearest neighbors to identify the class or value of a new data point.
4. Start the process – Select your distance metric, split the data, set the value of k, train your k-NN model, make predictions, evaluate performance, fine-tune the parameters, and validate the algorithm (see the sketch after this list).
5. Practice – Mastering K-Nearest Neighbors takes practice, like any other algorithm. Keep practicing, especially if you are not yet conversant with machine learning concepts; it will help you stay ahead of the learning curve.
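For step 4, here is a minimal sketch of that workflow using scikit-learn's GridSearchCV on the same Iris data as the earlier example (the search grid itself is an arbitrary choice): it splits the data, tries several values of k and two distance metrics under cross-validation, and validates on a held-out test set.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Put scaling in a pipeline so it is refit inside each cross-validation fold
pipe = make_pipeline(StandardScaler(), KNeighborsClassifier())
# Arbitrary example grid: several k values and two distance metrics
grid = {
    "kneighborsclassifier__n_neighbors": [3, 5, 7, 9, 11],
    "kneighborsclassifier__metric": ["euclidean", "manhattan"],
}
search = GridSearchCV(pipe, grid, cv=5)
search.fit(X_train, y_train)
print("Best parameters:", search.best_params_)
print("Cross-validation accuracy:", round(search.best_score_, 3))
print("Held-out test accuracy:", round(search.score(X_test, y_test), 3))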
Conclusion
The future of K-Nearest Neighbors looks bright: with the AI boom in full force, the geospatial sector stands to benefit greatly from k-NN. Alternatively, it could fade away, replaced by a more effective algorithm better suited for the future. Only time will tell (pun intended). As technology develops, k-NN's ability to handle new problems and satisfy shifting industry demands will govern its future applicability in geospatial analysis.
Published via Towards AI