SUPPORT VECTOR MACHINES
Author(s): Data Science meets Cyber Security
SUPERVISED LEARNING METHODS: PART-3
SUPPORT VECTOR MACHINES:
Let's first work through an example to get off to a better start before moving on to the term itself.
Consider the following scenario: an online beauty firm wishes to develop a classification engine to identify the best-selling products across five categories: spa, skincare, hair care, makeup, and body care.
NOTE: In machine learning, whenever we work with supervised learning algorithms, we deal with a feature matrix (the rows and columns) and a target (here, the categories). In unsupervised learning, no target is available, so we work with the feature matrix alone.
WHAT OUGHT TO BE DONE IN SUCH CIRCUMSTANCES?
Our initial thought is to use a classification method: the process of identifying which of a set of predetermined groups each item belongs to, and assigning it accordingly.
The first thing we'll do is set the target vector, which makes this a multi-class classification problem: we decide which products belong in which categories and label them accordingly. The dataset has a lot of rows and columns, which increases the risk of overfitting. In situations like this, we need an algorithm that can accomplish our goal while also lowering that risk. The SUPPORT VECTOR MACHINE algorithm enters the picture at this point.
SUPPORT VECTOR MACHINES is an algorithm that can handle many columns, and thus many predictors, at only a very modest trade-off.
SUPPORT VECTOR MACHINE is divided into two categories:
- Linear support vector machines
- Non-linear support vector machines
1. LINEAR SUPPORT VECTOR MACHINES:
When a dataset can be divided into two classes by means of a single straight line, it is said to be linearly separable, and the classifier utilized is known as a Linear SVM classifier.
Let's start with the algorithm's fundamental building block, the maths, in order to grasp it from the ground up:
THE DOT PRODUCT
Most of us learned how to compute the dot product in our early years, so everyone should be naturally at home with this.
The dot product is also known as the projection product. Why? Let's see.
A CLASSIC ILLUSTRATION I FOUND ON THE INTERNET ANSWERS THIS QUESTION BEST:
The projection of vector a (in blue) onto vector b (in green) is the dot product of a and b divided by the magnitude of b. The red line segment from the tail of b to the projection of the head of a onto b illustrates this projection.
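To make the projection idea concrete, here is a minimal NumPy sketch; the vectors a and b are made-up values chosen purely for illustration:

```python
import numpy as np

a = np.array([3.0, 4.0])  # vector "a" (blue in the description above)
b = np.array([5.0, 0.0])  # vector "b" (green)

dot = np.dot(a, b)                           # a · b = 15.0
proj_len = dot / np.linalg.norm(b)           # scalar projection of a onto b = 3.0
proj_vec = proj_len * b / np.linalg.norm(b)  # the projected vector = [3.0, 0.0]

print(dot, proj_len, proj_vec)
```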
THE DOT PRODUCT AND THE HYPERPLANE:
WHAT EXACTLY IS A HYPERPLANE IN SVM?
In support vector machines, the hyperplane is the decision boundary that separates the two classes: data points falling on either side of it are assigned to different classes. The number of features in the dataset determines the hyperplane's dimension; with n features, the hyperplane is an (n − 1)-dimensional subspace.
The most basic equation of the plane is wᵀx = b, where w is a vector normal to the plane and b is an offset. Here is where that comes from:
ALTERNATIVE SPECIFICATION
- Specify a point on the hyperplane and a vector perpendicular to it.
- Let P and P0 be two points on the hyperplane.
- Let x and x0 be the position vectors of P and P0.
- Consider the vector w, which is orthogonal to the hyperplane at x0.
- (x − x0) must lie on the hyperplane ⇒ w must be orthogonal to (x − x0)
- wᵀ(x − x0) = 0
- wᵀx = wᵀx0
- Setting b = wᵀx0 gives the hyperplane equation wᵀx = b
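A quick numeric sanity check of this derivation; w, x0, and the test points below are arbitrary values chosen for illustration:

```python
import numpy as np

w = np.array([1.0, 2.0])   # normal vector, orthogonal to the hyperplane
x0 = np.array([2.0, 1.0])  # a known point on the hyperplane
b = w @ x0                 # b = wᵀx0 = 4.0

x = np.array([4.0, 0.0])   # wᵀx = 4 = b, so x also lies on the hyperplane
print(w @ (x - x0))        # 0.0: w is orthogonal to (x - x0), as derived above

# points off the hyperplane fall on one side or the other
print(np.sign(w @ np.array([5.0, 5.0]) - b))  # 1.0  (positive side)
print(np.sign(w @ np.array([0.0, 0.0]) - b))  # -1.0 (negative side)
```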
LINEAR CLASSIFIERS:
The job of distinguishing classes in feature space can be seen as binary classification: find a separating line (or hyperplane) and assign each point a class according to the side it falls on, i.e. f(x) = sign(wᵀx − b).
WHAT IS THE MARGIN OF A CLASSIFIER?
The margin is the separation between the line and the nearest data points. The line with the biggest margin is the best, or ideal, line that can divide the two classes.
MAXIMUM MARGIN:
The linear classifier with the greatest margin is known as the maximum margin linear classifier.
This is the simplest kind of SUPPORT VECTOR MACHINE, also known as the LINEAR SUPPORT VECTOR MACHINE.
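As a sketch of what this looks like in code, here is scikit-learn's SVC with a linear kernel on a synthetic toy dataset; the very large C value is an assumption used to approximate a hard margin:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# two well-separated blobs, so a single straight line can divide them
X, y = make_blobs(n_samples=60, centers=2, cluster_std=0.6, random_state=0)

clf = SVC(kernel="linear", C=1000.0)  # very large C approximates a hard margin
clf.fit(X, y)

w = clf.coef_[0]                  # the learned normal vector w
margin = 2.0 / np.linalg.norm(w)  # width of the maximum margin
print("number of support vectors:", len(clf.support_vectors_))
print("margin width:", margin)
```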
Now that we're talking about real-world scenarios, we can see that the data points in these situations are frequently not linearly separable, and are also prone to noise and outliers, so we cannot actually classify this data using the previous formulation.
WHY ARE HYPERPLANE AND MARGIN IMPORTANT CONCEPTS IN SVM?
The margin, again, describes the separation between the line and the nearest data point, and the line with the widest margin divides the two classes best. We refer to this line as the MAXIMAL MARGIN HYPERPLANE.
HOW TO FIX THE FORMULATION TO HANDLE THE NOISE AND OUTLIERS:
THE SOFT-MARGIN FORMULATION USED TO FIX THE NOISE AND OUTLIERS:
minimize (1/2)‖w‖² + C · Σᵢ ξᵢ
subject to yᵢ(wᵀxᵢ − b) ≥ 1 − ξᵢ and ξᵢ ≥ 0 for every training point i, where each slack variable ξᵢ measures how far point i is allowed to violate the margin.
Now, moving forward, the only thing that might possibly trouble a human intellect when analyzing this formulation is: what does that 'C' mean, and why did we choose it?
Okay, then let's get started.
The 'C' here stands for the SVM penalty parameter –
The SVM algorithm seeks the optimal line, or decision boundary, for dividing n-dimensional space into classes, so that fresh data points can be conveniently classified in the future. That optimal decision boundary is a hyperplane.
SVM chooses the hyperplane using the extreme data points that lie nearest to it. These extreme points are known as support vectors, which is what gives the algorithm its name.
SVM PARAMETER C:
- Controls the training error: a large C penalizes misclassifications heavily (narrow margin), while a small C tolerates them (wide margin).
- It is used to prevent overfitting.
- Let's play with C (see the sketch below, because why not):
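A minimal sketch of playing with C on a noisy synthetic dataset (the blob parameters and C values below are illustrative assumptions); the pattern to notice is how the number of support vectors shrinks as C grows:

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# noisier blobs, so the two classes overlap a little
X, y = make_blobs(n_samples=100, centers=2, cluster_std=1.2, random_state=0)

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    # small C: errors tolerated, wide margin, many support vectors;
    # large C: errors punished, narrow margin, fewer support vectors
    print(f"C={C}: {len(clf.support_vectors_)} support vectors, "
          f"training accuracy {clf.score(X, y):.2f}")
```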
2. NON-LINEAR SUPPORT VECTOR MACHINES:
Non-linear classification is the task of classifying cases that cannot be separated by a straight line. In non-linear SVM, we divide the data points using a classifier that operates in a higher-dimensional space.
Let's take a look at the most famous trick used in non-linear SVM:
THE KERNEL TRICK:
A kernel is a function that transforms a low-dimensional input space into a higher-dimensional one, so that a boundary that would have to curve in the original space becomes a flat plane in the new one.
WHY THE KERNEL TRICK?
SUPPORT VECTOR MACHINES have trouble categorizing non-linear data, and the kernel trick is the simple fix in this situation. It is a straightforward technique that projects non-linear data into a higher-dimensional space, where the data can be linearly split by a plane and therefore classified more easily. Mathematically, this is achieved through the Lagrangian formulation and Lagrange multipliers. (More information is provided in the maths part that follows.)
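A small sketch of why the kernel trick matters, assuming the classic concentric-circles toy dataset, which no straight line can separate; the gamma value is an illustrative choice:

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# one class forms a ring around the other: not separable by any straight line
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

linear = SVC(kernel="linear").fit(X, y)
rbf = SVC(kernel="rbf", gamma=1.0).fit(X, y)  # kernel trick: implicit high-dim mapping

print("linear kernel accuracy:", linear.score(X, y))  # close to chance level
print("rbf kernel accuracy:", rbf.score(X, y))        # close to 1.0
```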
SUMMING UP THE STEPS:
- Pre-process the training data (scaling, numerical mapping, etc.).
- Pick a kernel to fix the trouble caused by non-linearly separable data.
- Use cross-validation to find the best C and σ parameter values (σ is the kernel width; a full sketch follows this list).
- Use the best C and σ to train on the entire training set.
- Test.
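Putting all five steps together in one hedged sketch on synthetic data; note that scikit-learn exposes the RBF kernel width σ through its gamma parameter, and the parameter grid below is an illustrative assumption:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# step 1: pre-process (scaling); steps 2-3: pick a kernel and cross-validate C and gamma
pipe = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
param_grid = {"svc__C": [0.1, 1, 10, 100], "svc__gamma": [0.001, 0.01, 0.1, 1]}
grid = GridSearchCV(pipe, param_grid, cv=5)

# step 4: GridSearchCV refits the best C and gamma on the entire training set
grid.fit(X_train, y_train)

print("best parameters:", grid.best_params_)
print("test accuracy:", grid.score(X_test, y_test))  # step 5: test
```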
A FEW POINTS TO REMEMBER (IMPORTANT ONES):
- Support Vector Machines (SVMs) work very well in practice for a large class of classification problems.
- SVMs work on the principle of learning a maximum margin hyperplane which results in good generalization.
- The basic linear SVM formulation could be extended to handle noisy and non-separable data.
- The Kernel Trick could be used to learn complex non-linear patterns.
- For better performance, one has to tune the SVM parameters, such as C and the kernel parameters, using a validation set.
I hope this blog makes the maths and the primary idea behind the support vector machine clear.
STAY TUNED FOR THE NEXT BLOG, WHERE WE'LL DISCUSS HOW TO IMPLEMENT THE SVM IN PYTHON THROUGH AN AMAZING CASE STUDY.
FOLLOW US FOR MORE FUN-TO-LEARN DATA SCIENCE BLOGS AND ARTICLES:
LINKEDIN: https://www.linkedin.com/company/dsmcs/
INSTAGRAM: https://www.instagram.com/datasciencemeetscybersecurity/?hl=en
GITHUB: https://github.com/Vidhi1290
TWITTER: https://twitter.com/VidhiWaghela
MEDIUM: https://medium.com/@datasciencemeetscybersecurity-
WEBSITE: https://www.datasciencemeetscybersecurity.com/
– Team Data Science meets Cyber Security