Deep Insights Into K-nearest Neighbors

Last Updated on July 26, 2020 by Editorial Team

Author(s): Palak

Machine Learning

Photo by Zdeněk Macháček on Unsplash

Introduction

K-nearest neighbors (KNN for short) is a straightforward yet robust supervised machine learning algorithm. Although a regression variant also exists, we will only discuss its classification form.

KNN is a non-parametric, lazy learning algorithm. Let's have a look at what each term means:

  • Non-parametric: Unlike parametric models, non-parametric models are more flexible. They have no preconceived notions about the number of parameters or the functional form of the hypothesis, which saves us from the trouble of making wrong assumptions about the data distribution. For example, in linear regression, which is a parametric algorithm, the form of the hypothesis is fixed in advance, and during the training phase we only look for the correct coefficient values.
  • Lazy learning: Also known as instance-based learning. Here the algorithm doesn't explicitly learn a model; instead, it memorizes the training set so it can be used during the prediction phase. This results in a fast training phase but an expensive testing phase, as it requires storing and traversing the whole dataset.

Intuitive Approach

The basic idea behind this algorithm is that birds of a feather flock together, i.e., similar things occur nearby. Why does something like this even hold?
Well, try imagining a large number of data points in a D-dimensional space. It is reasonable to conclude that the space will be densely filled, which implies that two fairly close points are more likely to have the same label. Indeed, there exists a tight error bound for this algorithm:
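
(The bound itself appears as an image in the original post. Presumably it is Cover and Hart's classic asymptotic bound for the 1-nearest-neighbor classifier, which can be written as)

P^* \le P \le P^*\left(2 - \frac{c}{c-1}\,P^*\right) \le 2P^*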

where P* is the Bayes error rate (the lowest possible error rate for any classifier), c is the number of classes, and P is the error rate of our algorithm. According to the above bound, when the number of training points is large enough, the error rate of the algorithm is at most twice the Bayes error rate. Pretty cool!
Don't get me wrong: by a large number of data points, I don't mean a large number of features. On the contrary, the algorithm is particularly susceptible to the curse of dimensionality: an increase in the dimension (the number of features) increases the volume of space required to accumulate a fixed number of neighbors. And as that volume increases, the neighbors become less and less similar, because the distances between them grow, so the probability that nearby points share the same label decreases. In such cases, feature selection might come in handy.
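
As a quick aside (this sketch is not from the original article), the distance-concentration effect behind the curse of dimensionality is easy to see numerically: for points drawn uniformly at random, the relative gap between the nearest and the farthest point from a query shrinks as the dimension grows.

```python
import numpy as np

# Illustrative sketch: distance concentration in high dimensions.
# For uniformly random points, the nearest neighbor is barely closer
# than the farthest one once the dimension is large.
rng = np.random.default_rng(0)

for d in (2, 10, 100, 1000):
    points = rng.random((500, d))   # 500 random points in the unit hypercube
    query = rng.random(d)           # a random query point
    dists = np.linalg.norm(points - query, axis=1)
    contrast = (dists.max() - dists.min()) / dists.min()
    print(f"d = {d:4d}   relative distance contrast = {contrast:.3f}")
```

The printed contrast drops steadily with the dimension, which is exactly why "nearest" neighbors stop being meaningfully near in very high-dimensional feature spaces.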

What does KNN do?

The KNN algorithm takes a test input, compares it with every single training example, and predicts the target label as the label most often represented among the k most similar training examples. The similarity between two data points is usually measured using the Euclidean distance.


When k = 1, the target label is predicted as the label of the single nearest training example.
Since we store and traverse the whole training set for every test example, the testing phase is costly in terms of both memory and time.
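
To make the procedure concrete, here is a minimal brute-force sketch in plain NumPy. This is not the code from the original article; the function name, the toy dataset, and the choice of k are purely illustrative. It follows the steps described above: compute the Euclidean distance to every training example, pick the k closest, and return the majority label.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_test, k=3):
    """Predict the label of a single test point with brute-force KNN."""
    # Euclidean distance from the test point to every training example.
    distances = np.linalg.norm(X_train - x_test, axis=1)
    # Indices of the k closest training examples.
    nearest = np.argsort(distances)[:k]
    # Majority vote among the k nearest labels.
    return Counter(y_train[nearest]).most_common(1)[0][0]

# Tiny illustrative dataset: two clusters with labels 0 and 1.
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                    [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]])
y_train = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(X_train, y_train, np.array([1.1, 0.9]), k=3))  # -> 0
print(knn_predict(X_train, y_train, np.array([5.1, 5.0]), k=3))  # -> 1
```

Libraries such as scikit-learn implement the same idea in KNeighborsClassifier, adding smarter data structures (KD-trees, ball trees) to speed up the neighbor search.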

What is k in KNN?

k is a hyperparameter, i.e., a value that controls the learning process. It is the number of neighbors considered when predicting a test example's label.

Does the value of k affect our classifier?

Before addressing the above question, let's understand what we mean by overfitting and underfitting.
In supervised learning, we want to build a model that learns from our training set and makes reasonably accurate predictions on unseen data. This ability of a model to transfer what it has learned to new data is called generalization.
Intuitively, we expect simple models to generalize better. A complex model fits the peculiarities of the data rather than summarizing its underlying structure. It might perform accurately on the training data but usually fails to generalize to the test data.
Overfitting occurs when a model fits the peculiarities of the training set so closely that it is unable to generalize to new data.
Underfitting, on the other hand, occurs when a model is too simple and fails even on the training set.


Let's have a look at what it's like to consider one nearest neighbor and 20 nearest neighbors:

[Image: KNN decision boundaries for k = 1 and k = 20 (source)]

It can be observed from the above image that k = 1 leads to jagged decision boundaries, while k = 20 gives us a smooth boundary.
Smooth boundaries correspond to a simple model, whereas jagged boundaries correspond to a complex model. Therefore, if we keep the value of k too low, we risk overfitting, while if we keep it too high, we risk underfitting.
So an obvious solution is to pick a k that is neither too small nor too large, perhaps by evaluating different values of k and choosing whichever works best.
Well, this is exactly what we'll do, but a tad more smartly.

Validation for Hyperparameter Tuning

As decided earlier, we will try out different values of k, but not on the test set. The test set should be used only once, at the very end. Using it earlier would lead us to overfit the test set, and the resulting model would fail to generalize to unseen data.
A way around this is to hold out a subset of the training set for tuning the hyperparameter. This subset is called a validation set.
When the training set is small, a more sophisticated technique is cross-validation. Here we split the training set into several groups (folds), say five. We use four of them for training and one for validation, and with each iteration we use a different fold as the validation set. Finally, the average performance across folds is used to figure out the best value of k.
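
As a rough sketch of this tuning loop (assuming scikit-learn is available; the iris dataset and the range of candidate k values here are illustrative, not from the original article), we can run 5-fold cross-validation on the training set for each candidate k, pick the best one, and only then touch the test set a single time:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Hold the test set out entirely; it is used only once, at the very end.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# 5-fold cross-validation on the training set for each candidate k.
cv_scores = {}
for k in range(1, 21):
    model = KNeighborsClassifier(n_neighbors=k)
    cv_scores[k] = cross_val_score(model, X_train, y_train, cv=5).mean()

best_k = max(cv_scores, key=cv_scores.get)
print("best k according to cross-validation:", best_k)

# Final, single evaluation on the held-out test set.
final_model = KNeighborsClassifier(n_neighbors=best_k).fit(X_train, y_train)
print("test accuracy:", final_model.score(X_test, y_test))
```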

Source: A training and test set is given. The training set is split into folds (e.g., five folds here). Folds 1–4 become the training set. One fold (e.g., fold 5 here, in yellow) is denoted as the validation fold and is used to tune the hyperparameters. Cross-validation goes a step further and iterates over the choice of which fold is the validation fold, from 1 to 5; this is referred to as 5-fold cross-validation. At the very end, once the model is trained and the best hyperparameters are determined, the model is evaluated a single time on the test data (red).
[Image: validation performance for different values of k (source)]

In the above image, k = 7 results in the lowest validation error.

Advantages of KNNs

  • KNNs are simple to implement and understand.
  • Little or no training time is required.
  • They work fine with multiclass datasets.
  • KNNs are non-parametric, so they work fine with highly irregular data, as no assumption about the functional form is required.

Disadvantages of KNNs

  • It has a computationally expensive testing phase.
  • KNN suffers from skewed class distributions: a more frequent class tends to dominate the voting process.
  • KNN also suffers from the curse of dimensionality.

I have tried to give a deep insight into the K-nearest neighbor algorithm.
I hope everything makes sense.

Thanks for reading!


Deep Insights Into K-nearest Neighbors was originally published in Towards AI – Multidisciplinary Science Journal on Medium, where people are continuing the conversation by highlighting and responding to this story.

Published via Towards AI
