Grouping Classes Using K-Nearest Neighbors Algorithm — Python

Last Updated on July 20, 2023 by Editorial Team

A guide to knowing and implementing the KNN algorithm.

What is the KNN algorithm?

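It is a supervised algorithm used for classification tasks, and it works on a very simple principle: a data point is assigned the class that is most common among the points closest to it.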

How does it work?

The KNN algorithm is very basic. Training simply stores all the data. To predict the label of a new data point, the algorithm calculates the distance from that point to every point in the training data, sorts those points in increasing order of distance, and predicts the majority label among the ‘k’ closest points.
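The steps above can be sketched in plain Python. This is a minimal illustration, not scikit-learn's implementation: it assumes Euclidean distance and a small made-up dataset, and `euclidean` and `knn_predict` are hypothetical helper names.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Straight-line distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train_X, train_y, query, k):
    """Predict the majority label among the k training points closest to query."""
    # "Training" is just keeping train_X/train_y; all the work happens here.
    distances = [(euclidean(x, query), label) for x, label in zip(train_X, train_y)]
    # Sort by increasing distance and keep the k nearest points.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Majority vote over the labels of the k nearest points.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: class 0 near (1.5, 1.5), class 1 near (6.5, 6.5).
train_X = [[1.0, 1.0], [1.5, 2.0], [2.0, 1.5], [6.0, 6.0], [6.5, 7.0], [7.0, 6.5]]
train_y = [0, 0, 0, 1, 1, 1]

print(knn_predict(train_X, train_y, [2.0, 2.0], k=3))  # → 0
print(knn_predict(train_X, train_y, [6.0, 6.5], k=3))  # → 1
```

Choosing an odd k for two classes avoids tied votes; with a tie, this sketch simply returns whichever label `Counter` saw first.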

What are the advantages of this algorithm?

It is very simple and easy to understand and implement.
It uses only two parameters: k and the distance metric.
It can classify any number of classes.
The training step is trivial to implement, since it only stores the data, so more data can be added at any stage.
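Both parameters can change the answer. The sketch below uses hypothetical toy points to show that switching the distance metric from Euclidean to Manhattan, or raising k from 1 to 3, can flip the predicted label for the same query point.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Straight-line (L2) distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Sum of absolute coordinate differences (L1 distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))

def knn_predict(train_X, train_y, query, k, distance):
    """Majority label among the k training points nearest to query under `distance`."""
    nearest = sorted(zip(train_X, train_y), key=lambda pair: distance(pair[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

query = [0.0, 0.0]

# Metric matters: [1, 1] is closer under L2 (1.41 vs 1.5), [1.5, 0] under L1 (1.5 vs 2).
metric_X = [[1.0, 1.0], [1.5, 0.0]]
metric_y = [0, 1]
print(knn_predict(metric_X, metric_y, query, k=1, distance=euclidean))  # → 0
print(knn_predict(metric_X, metric_y, query, k=1, distance=manhattan))  # → 1

# k matters: one class-1 point is closest, but two class-0 points are close behind.
k_X = [[0.5, 0.0], [1.0, 1.0], [1.2, 0.0]]
k_y = [1, 0, 0]
print(knn_predict(k_X, k_y, query, k=1, distance=euclidean))  # → 1
print(knn_predict(k_X, k_y, query, k=3, distance=euclidean))  # → 0
```

In scikit-learn, these two parameters correspond to the `n_neighbors` and `metric` arguments of `KNeighborsClassifier`.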

What are the disadvantages of this algorithm?

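It works well only with numerical data. Categorical data has to be encoded into numbers first, and distances between encoded categories are often not meaningful, so it might not perform well.
The cost of prediction is very high, because every prediction requires computing the distance to every point in the training data.
It does not do well with high-dimensional data: as the number of features grows, the distances between points become more alike, so the "nearest" neighbors carry less information.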
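The high-dimensional problem can be illustrated with a quick experiment. This sketch uses uniformly random points rather than the article's dataset, and `distance_contrast` is a hypothetical helper: it measures the ratio between the nearest and the farthest distance from a random query point, which creeps towards 1 as the number of dimensions grows.

```python
import math
import random

def distance_contrast(dim, n_points=500, seed=0):
    """Ratio of nearest to farthest distance from a random query to random points.

    Values close to 1 mean the nearest and farthest points are almost
    equally far away, so "nearest" carries little information.
    """
    rng = random.Random(seed)  # fixed seed so the experiment is repeatable
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))  # Euclidean distance (Python 3.8+)
    return min(dists) / max(dists)

for dim in (2, 10, 100, 1000):
    print(dim, round(distance_contrast(dim), 3))
```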

How to implement KNN using Python?

An artificial dataset is used to perform the classification. It has two classes, 0 and 1, and the goal is to classify each observation into one of them.
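The original `datasets/dataset.csv` is not reproduced here. To follow along, a similar artificial dataset can be generated with scikit-learn's `make_classification`. This is a hypothetical stand-in: the feature count, the random seed and the `FEATURE_i` column names are assumptions, and the row count is inferred from the evaluation section, which reports 300 test rows from a 30% split (roughly 1,000 rows in total).

```python
import pandas as pd
from sklearn.datasets import make_classification

# Hypothetical stand-in for datasets/dataset.csv: 1,000 rows,
# 10 numeric features and a binary target (0 or 1).
X, y = make_classification(n_samples=1000, n_features=10, n_informative=5,
                           n_classes=2, random_state=42)

# Made-up feature names; the target goes last, as in the article's data.
df = pd.DataFrame(X, columns=[f"FEATURE_{i}" for i in range(X.shape[1])])
df["TARGET CLASS"] = y

print(df.shape)  # → (1000, 11)
print(df["TARGET CLASS"].value_counts())
```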

→ Import packages

The data-handling libraries pandas and numpy are imported, along with the data visualization packages matplotlib and seaborn.

>>> import pandas as pd
>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> import seaborn as sns
>>> %matplotlib inline

→ Read data

The data has a target class that takes the value 0 or 1. The other columns are numeric, but they carry no real-world meaning since the data is artificial.

>>> df = pd.read_csv('datasets/dataset.csv')
>>> df.head()

→ Standardize data

In KNN, it is important to standardize the variables. KNN classifies a test observation by finding the observations nearest to it, so any variable measured on a large scale has a much larger effect on the distance between observations than the others. Standardization is done with the scikit-learn package: the StandardScaler class is imported and an instance of it is created.

>>> from sklearn.preprocessing import StandardScaler
>>> scaler = StandardScaler()

Next, the scaler object is fit to the features (the data without the target class), and the transform() method converts the features to a scaled version.

>>> scaler.fit(df.drop('TARGET CLASS',axis=1))
>>> scaled_features = scaler.transform(df.drop('TARGET CLASS',axis=1))
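Calling `fit()` and then `transform()` on the same data is equivalent to a single `fit_transform()` call. The sketch below, on hypothetical synthetic data with one feature inflated by a factor of 1,000, checks that every scaled column ends up with a mean of about 0 and a standard deviation of about 1, whatever its original scale.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.preprocessing import StandardScaler

# Hypothetical data: 4 features, the first one blown up to a much larger scale.
X, _ = make_classification(n_samples=200, n_features=4, random_state=0)
X[:, 0] *= 1000

# fit_transform = fit (learn each column's mean/std) + transform (apply them).
scaled = StandardScaler().fit_transform(X)

print(scaled.mean(axis=0).round(6))  # each column's mean is ~0
print(scaled.std(axis=0).round(6))   # each column's std is ~1
```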

The scaled features can be viewed in the form of a dataframe which will be used for model building.

>>> df_final = pd.DataFrame(scaled_features,columns=df.columns[:-1])
>>> df_final.head()
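To see why scaling matters, consider three hypothetical observations with an age feature (years) and an income feature (dollars). Before scaling, the income column dominates the Euclidean distance. After standardizing each column with its population mean and standard deviation (the same formula `StandardScaler` applies), both features contribute comparably. The `standardize` helper is a hypothetical plain-Python stand-in.

```python
import math
import statistics

def euclidean(p, q):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))

def standardize(rows):
    """Scale each column to mean 0 and (population) standard deviation 1."""
    cols = list(zip(*rows))
    means = [statistics.mean(c) for c in cols]
    stds = [statistics.pstdev(c) for c in cols]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]

# Hypothetical observations: [age in years, income in dollars].
a = [25, 50_000]
b = [26, 80_000]  # similar age, very different income
c = [60, 51_000]  # very different age, similar income

# Unscaled: a-b is ~30000 and a-c is ~1000.6, so income swamps the 35-year age gap.
print(euclidean(a, b), euclidean(a, c))

# Scaled: both distances come out at roughly 2.15.
sa, sb, sc = standardize([a, b, c])
print(euclidean(sa, sb), euclidean(sa, sc))
```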

→ Splitting the data into training and testing data

The train_test_split function from the scikit-learn package is used, with 30% of the data held out for testing.

>>> from sklearn.model_selection import train_test_split
>>> X_train, X_test, y_train, y_test = train_test_split(scaled_features,df['TARGET CLASS'],test_size=0.30)
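One caveat with these steps: the scaler was fit on the full dataset before splitting, so statistics from the test rows leak into the training features. A common alternative, sketched below on hypothetical synthetic data (the seeds and the `stratify` choice are assumptions), is to split first and wrap the scaler and the classifier in a scikit-learn pipeline, so the scaler only ever sees training data.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=1000, n_features=10, random_state=42)

# Split the raw (unscaled) features first so test rows never influence the scaler.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=101, stratify=y)

# The pipeline fits StandardScaler on X_train only, then reuses those
# means/stds to transform X_test inside predict() and score().
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=30))
model.fit(X_train, y_train)
print(f"test accuracy: {model.score(X_test, y_test):.2f}")
```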

Choosing the K value

The value of K is chosen using the elbow method. A model is trained for each K from 1 to 39, and for each value the error rate (the fraction of misclassified test points) is calculated.

>>> from sklearn.neighbors import KNeighborsClassifier
>>> error_rate = []
>>> # Train one model per K and record the fraction of misclassified test points
>>> for i in range(1,40):
...     knn = KNeighborsClassifier(n_neighbors=i)
...     knn.fit(X_train,y_train)
...     pred_i = knn.predict(X_test)
...     error_rate.append(np.mean(pred_i != y_test))
...
>>> plt.figure(figsize=(10,6))
>>> plt.plot(range(1,40),error_rate,marker='o',
...          markerfacecolor='red')
>>> plt.title('Error Rate v/s K Value')
>>> plt.xlabel('K')
>>> plt.ylabel('Error Rate')

Based on the resulting plot of error rate against K, a K value of 30 is chosen.
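The elbow method above picks K by looking at the test set, which then no longer gives an unbiased estimate of performance. A swapped-in alternative (not the author's method) is to choose K by cross-validation on the training data only, for example with `GridSearchCV`. The sketch below uses hypothetical synthetic data, so the best K it finds will not necessarily be 30.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=101)

pipe = make_pipeline(StandardScaler(), KNeighborsClassifier())

# make_pipeline names each step after its lowercased class name, and grid
# parameters use the "<step>__<param>" convention.
param_grid = {"kneighborsclassifier__n_neighbors": list(range(1, 40))}
search = GridSearchCV(pipe, param_grid, cv=5)  # 5-fold CV on training data only
search.fit(X_train, y_train)

best_k = search.best_params_["kneighborsclassifier__n_neighbors"]
print("best K:", best_k)
print(f"held-out accuracy: {search.score(X_test, y_test):.2f}")
```

The test set is touched only once, at the very end, to report the accuracy of the chosen model.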

→ Model building

The model is built with the KNeighborsClassifier class imported from scikit-learn. An instance is created with the number of neighbors set to the K value chosen above, and it is then fit on the training data.

>>> knn = KNeighborsClassifier(n_neighbors=30)
>>> knn.fit(X_train,y_train)
KNeighborsClassifier(n_neighbors=30)

→ Prediction

>>> pred = knn.predict(X_test)
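To look inside a prediction, `KNeighborsClassifier` also provides `predict_proba` and `kneighbors`. The sketch below (hypothetical synthetic data, k = 5) checks that, with the default uniform weights, each predicted probability is simply the share of the k nearest training points that carry that label, and that the predicted class is the one with the highest share.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=300, n_features=4, random_state=0)
X_fit, y_fit = X[:250], y[:250]
knn = KNeighborsClassifier(n_neighbors=5).fit(X_fit, y_fit)

query = X[250:253]  # three points the model has not seen
proba = knn.predict_proba(query)
distances, indices = knn.kneighbors(query)  # 5 nearest training points per query

# Recompute the probabilities by hand: fraction of neighbors with each label.
manual = np.array([[np.mean(y_fit[idx] == c) for c in knn.classes_]
                   for idx in indices])

print(proba)
print(np.allclose(proba, manual))
print(knn.predict(query), knn.classes_[proba.argmax(axis=1)])
```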

→ Evaluations

>>> from sklearn.metrics import classification_report,confusion_matrix
>>> print(confusion_matrix(y_test,pred))
[[133  34]
 [ 16 117]]
>>> print(classification_report(y_test,pred))
              precision    recall  f1-score   support

           0       0.89      0.80      0.84       167
           1       0.77      0.88      0.82       133

    accuracy                           0.83       300
   macro avg       0.83      0.84      0.83       300
weighted avg       0.84      0.83      0.83       300
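The report can be reproduced by hand from the confusion matrix, where rows are true classes and columns are predicted classes: precision divides the diagonal entry by its column total, recall divides it by its row total, and F1 is their harmonic mean. The `class_metrics` helper is a hypothetical name; running the sketch prints 0.89, 0.8 and 0.84 for class 0, 0.77, 0.88 and 0.82 for class 1, and an accuracy of 0.83, matching the report.

```python
# Confusion matrix from the evaluation above: rows = true class, cols = predicted.
cm = [[133, 34],
      [16, 117]]

def class_metrics(cm, c):
    """Precision, recall and F1 for class c from a square confusion matrix."""
    tp = cm[c][c]
    predicted_c = sum(row[c] for row in cm)  # column total: everything predicted as c
    actual_c = sum(cm[c])                    # row total: everything truly c
    precision = tp / predicted_c
    recall = tp / actual_c
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

for c in (0, 1):
    p, r, f = class_metrics(cm, c)
    print(c, round(p, 2), round(r, 2), round(f, 2))

# Accuracy = correctly classified (diagonal) / all test points.
accuracy = (cm[0][0] + cm[1][1]) / sum(map(sum, cm))
print("accuracy", round(accuracy, 2))
```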

Refer to the dataset and notebook here.

Beginner-level machine learning books to refer to:

Python Machine Learning: A Beginner’s Guide to Python Programming for Machine Learning and Deep…

The Hundred-Page Machine Learning Book

Advanced-level machine learning books to refer to:

Hands-On Machine Learning with Scikit-Learn, Keras and TensorFlow: Concepts, Tools and Techniques…

Pattern Recognition and Machine Learning (Information Science and Statistics)

Reach out to me: LinkedIn

Check out my other work: GitHub


Published via Towards AI

Author(s): Jayashree domala
