SUPPORT VECTOR MACHINES

Last Updated on December 10, 2022 by Editorial Team

Author(s): Data Science meets Cyber Security


SUPERVISED LEARNING METHODS: PART-3

SUPPORT VECTOR MACHINES:

SOURCE: IMAGE BY AUTHOR

Let's first work through an example in order to get off to a better start before moving on to the term itself.

Consider the following scenario: An online beauty firm wishes to develop a classification engine to analyze the products with the best sales across the five categories of spa, skincare, hair care, makeup, and body care.

NOTE: In machine learning, whenever we work with supervised learning algorithms, we always deal with a matrix (the total number of rows and columns) and a target (the categories). In unsupervised learning, the target is not available, so we simply work with the matrix.
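
To make that note concrete, here is a tiny, made-up sketch of what the matrix and the target look like in code (the feature names and values are purely hypothetical):

```python
import numpy as np

# The matrix: each row is a product, each column a feature
# (hypothetical features: price, average rating, monthly units sold).
X = np.array([[12.5, 4.2, 310],
              [45.0, 4.8, 120],
              [ 8.9, 3.9, 540]])

# Supervised learning: a target vector (one category per row) accompanies the matrix.
y = np.array(["skincare", "makeup", "body care"])

# Unsupervised learning would work with X alone; there is no y.
```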

WHAT OUGHT TO BE DONE IN SUCH CIRCUMSTANCES?

SOURCE: https://giphy.com/

Our initial thought is to use classification: the process of understanding, identifying, and sorting items into predetermined groups.

The first thing we'll do is set the target vector, which makes this a multi-class classification problem: we decide which products belong in which categories and label them accordingly. This leaves us with a lot of rows and columns, which increases the risk of overfitting. In situations like this, we need an algorithm that can help us accomplish our goal while also lowering the risk of overfitting. The SUPPORT VECTOR MACHINE algorithm enters the picture at this point.

SUPPORT VECTOR MACHINES is an algorithm that can handle many columns and many predictors while keeping the risk of overfitting modest.
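
As a rough illustration of that point (the data below is random and the feature count is invented, so treat it purely as a sketch), a scikit-learn SVC can be fitted to a wide matrix with the five-category target from the beauty-product example:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Hypothetical setup: many columns (product features), a modest number of rows.
X = rng.normal(size=(200, 50))   # 200 products, 50 features each
y = rng.choice(["spa", "skincare", "hair care", "makeup", "body care"], size=200)

# Scaling + SVC; SVC handles multi-class problems internally (one-vs-one).
clf = make_pipeline(StandardScaler(), SVC(kernel="linear", C=1.0))
clf.fit(X, y)
print(clf.predict(X[:3]))
```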

SUPPORT VECTOR MACHINE is divided into two categories:

  1. Linear support vector machines
  2. Non-linear support vector machines

1. LINEAR SUPPORT VECTOR MACHINES:

SOURCE: IMAGE BY AUTHOR

When a dataset can be divided into two classes by means of a single straight line, it is said to be linearly separable, and the classifier utilized is known as a Linear SVM classifier.
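
A minimal sketch of that idea, assuming scikit-learn and a synthetic two-blob dataset that happens to be linearly separable:

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two well-separated blobs -> linearly separable in 2-D.
X, y = make_blobs(n_samples=100, centers=2, cluster_std=0.8, random_state=42)

clf = SVC(kernel="linear")
clf.fit(X, y)

# A single straight line (w·x + b = 0) separates the two classes.
w, b = clf.coef_[0], clf.intercept_[0]
print("w:", w, "b:", b)
print("training accuracy:", clf.score(X, y))
```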

Let's start with the algorithm's fundamental building block, maths, in order to grasp it from the ground up:

SOURCE: https://giphy.com/

THE DOT PRODUCT

Everyone would naturally be an expert at this, as we all learned how to compute the dot product in our early years.

The dot product is also known as the projection product. Why? Let's see.

THIS AMAZING ARTICLE, WHICH I FOUND ON THE INTERNET, IS THE BEST ILLUSTRATION OF THIS QUESTION:

SOURCE: https://mathinsight.org/applet/dot_product_projection

The projection of 'a' onto 'b' is the dot product of the vectors 'a' (in blue) and 'b' (in green), divided by the magnitude of 'b'. The red line segment from the tail of 'b' to the projection of the head of 'a' on 'b' serves as an example of this projection.
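
In NumPy terms, the scalar projection described above is just the dot product divided by the magnitude of 'b' (the two vectors here are arbitrary example values):

```python
import numpy as np

a = np.array([3.0, 4.0])
b = np.array([5.0, 0.0])

dot = np.dot(a, b)                      # a · b = 15
proj_length = dot / np.linalg.norm(b)   # scalar projection of a onto b = 3
proj_vector = proj_length * b / np.linalg.norm(b)   # the red segment: [3, 0]

print(dot, proj_length, proj_vector)
```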

THE DOT PRODUCT AND THE HYPERPLANE:

SOURCE: https://ccc.inaoep.mx/~emorales/Cursos/NvoAprend/Acetatos/svm2019.pdf

WHAT EXACTLY DOES A HYPERPLANE IN SVM MEAN?

In support vector machines, the hyperplane is a decision boundary that separates the two classes. Data points lying on either side of the hyperplane belong to different classes, as seen in the linear support vector machine graph above. The number of features in the dataset determines the hyperplane's dimension.

The most basic equation of the plane, i.e.:

SOURCE: BY AUTHOR

ALTERNATIVE SPECIFICATION

  • Specify a point on the hyperplane and a vector perpendicular to it.
  • Let P and P0 be two points on the hyperplane.
  • Let x and x0 be their position vectors.

Consider the vector w, which is orthogonal to the hyperplane at x0:

  • (x - x0) must lie on the hyperplane → w must be orthogonal to (x - x0)
  • wᵀ(x - x0) = 0
  • wᵀx = wᵀx0
  • wᵀx = b, where b = wᵀx0
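
A quick numeric check of this identity, with an arbitrarily chosen normal vector w and point x0 (purely illustrative values):

```python
import numpy as np

w  = np.array([2.0, -1.0])   # normal vector to the hyperplane
x0 = np.array([1.0,  3.0])   # a point known to lie on the hyperplane
b  = w @ x0                  # b = wᵀx0 = 2*1 - 1*3 = -1

# Any x on the hyperplane satisfies wᵀx = b, i.e. wᵀ(x - x0) = 0.
x_on  = np.array([2.0, 5.0])   # 2*2 - 5 = -1 -> on the plane
x_off = np.array([4.0, 1.0])   # 2*4 - 1 =  7 -> off the plane

print(w @ x_on - b, w @ x_off - b)   # 0.0 and 8.0
```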

LINEAR CLASSIFIERS:

The job of distinguishing classes in feature space can be seen as binary classification:

SOURCE: IMAGE BY AUTHOR

WHAT IS THE MARGIN OF A CLASSIFIER?

SOURCE: IMAGE BY AUTHOR

The margin is the separation between the line and the nearest data points. The line with the biggest margin is the best or ideal line that can divide the two classes.

MAXIMUM MARGIN:

SOURCE: IMAGE BY AUTHOR

The linear classifier with the greatest margin is known as the largest margin linear classifier.
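
For a trained linear SVM, the width of that largest margin can be read off the learned weights as 2/||w||. A small sketch, assuming scikit-learn, a hand-made linearly separable dataset, and a large C to approximate a hard margin:

```python
import numpy as np
from sklearn.svm import SVC

# Tiny hand-made, linearly separable set (values chosen only for illustration).
X = np.array([[1, 1], [2, 1], [1, 2],    # class 0
              [4, 4], [5, 4], [4, 5]])   # class 1
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear", C=1e6)   # very large C ~ hard margin
clf.fit(X, y)

w = clf.coef_[0]
margin_width = 2.0 / np.linalg.norm(w)   # distance between the two margin lines
print("margin width:", margin_width)
```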

This is the simplest kind of SUPPORT VECTOR MACHINE, also known as the LINEAR SUPPORT VECTOR MACHINE.

In real-world scenarios, however, the data points are frequently not linearly separable and are also prone to noise and outliers, so we cannot actually classify this data using the previous formulation.

WHY ARE HYPERPLANE AND MARGIN IMPORTANT CONCEPTS IN SVM?

The margin is a term that may be used to describe the separation between the line and the nearest data point. The line with the widest margin is the best or ideal line that may be used to divide the two classes. We referred to this as the MAXIMAL MARGIN HYPERPLANE.

HOW TO FIX THE FORMULATION TO HANDLE THE NOISE AND OUTLIERS:

SOURCE: IMAGE BY AUTHOR

THE FORMULA USED TO HANDLE NOISE AND OUTLIERS:

SOURCE: IMAGE BY AUTHOR
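
The figure above is an image by the author and is not reproduced here; for reference, the standard soft-margin formulation that introduces slack variables ξᵢ and the parameter C (presumably what the figure depicts) is:

```latex
\min_{w,\,b,\,\xi}\;\; \frac{1}{2}\lVert w \rVert^{2} \;+\; C \sum_{i=1}^{n} \xi_i
\quad\text{subject to}\quad
y_i \left( w^{\top} x_i + b \right) \ge 1 - \xi_i, \qquad \xi_i \ge 0 .
```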

Moving forward, the only thing that might trouble a human intellect when analyzing the given figure is: what does that "C" mean, and why did we choose it?

Okay, then let's get started.

The 'C' here stands for the SVM penalty (regularization) parameter:

To conveniently classify fresh data points in the future, the SVM algorithm seeks the optimal line or decision boundary for dividing n-dimensional space. This optimal decision boundary is a hyperplane.

SVM constructs this boundary by selecting the extreme points that lie closest to it. These extreme cases are known as support vectors, which is why the approach is called the Support Vector Machine.

SVM PARAMETER C:

  1. Controls training error.
  2. It is used to prevent overfitting.
  3. Let's play with C (because why not 😁); a small experiment follows below.
SOURCE: https://www.youtube.com/watch?v=5oVQBF_p6kY + PHOTOSHOPPED
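
Here is a small experiment with C, assuming scikit-learn and a synthetic dataset. In scikit-learn, a smaller C means stronger regularization (a softer, wider margin and typically more support vectors), while a larger C tries harder to classify every training point correctly and can overfit:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=10, n_informative=5,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for C in (0.01, 1, 100):
    clf = SVC(kernel="linear", C=C).fit(X_tr, y_tr)
    print(f"C={C:<6} support vectors={clf.n_support_.sum():<4} "
          f"train={clf.score(X_tr, y_tr):.2f} test={clf.score(X_te, y_te):.2f}")
```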

2. NON-LINEAR SUPPORT VECTOR MACHINES:

SOURCE: IMAGE BY AUTHOR

Non-linear classification is the task of classifying cases that cannot be separated linearly. In non-linear SVM, we divide the data points using a classifier built in a higher-dimensional space.

Let's take a look at the most famous trick used in non-linear SVM:

THE KERNEL TRICK:

A kernel is a function that transforms a low-dimensional space into a higher-dimensional one; intuitively, it takes the two-dimensional plane and curves it into a higher-dimensional environment.

SOURCE: https://www.mdpi.com/2076-3417/11/10/4657/htm#B49-applsci-11-04657 (Original and kernelized feature space)

WHY THE KERNEL TRICK?

SUPPORT VECTOR MACHINES have trouble categorizing non-linear data. The Kernel Trick is the simple fix in this situation: it projects the non-linear data into a higher-dimensional space where it can be linearly split by a plane, making it easier to categorize. Mathematically, this is achieved with the Lagrangian formulation and Lagrange multipliers. (More information is provided in the maths part that follows.)
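
A minimal sketch of the kernel trick in action, assuming scikit-learn: on concentric circles, a linear kernel struggles while an RBF kernel separates the classes easily.

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Concentric circles: impossible to separate with a straight line in 2-D.
X, y = make_circles(n_samples=300, factor=0.4, noise=0.1, random_state=0)

linear_clf = SVC(kernel="linear").fit(X, y)
rbf_clf = SVC(kernel="rbf", gamma="scale").fit(X, y)

print("linear kernel accuracy:", linear_clf.score(X, y))   # roughly chance level
print("RBF kernel accuracy:", rbf_clf.score(X, y))          # close to 1.0
```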

SUMMING UP THE STEPS:

SOURCE: https://giphy.com/
  1. Pre-process the training data (scaling, numerical mapping, etc.).
  2. Pick the Kernel Trick to fix the trouble caused while categorizing non-linear data.
  3. Use cross-validation to find the best C and σ parameter values (see the grid-search sketch after this list).
  4. Use the best C and σ to train on the entire training set.
  5. Test.
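
A hedged sketch of steps 3 and 4 using scikit-learn's GridSearchCV on a synthetic dataset; note that scikit-learn parameterizes the RBF kernel by gamma, which plays the role of 1/(2σ²), rather than by σ directly:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=10, random_state=0)

pipe = Pipeline([("scale", StandardScaler()), ("svc", SVC(kernel="rbf"))])

# Grid over C and gamma (gamma ~ 1 / (2 * sigma^2) for the RBF kernel).
param_grid = {"svc__C": [0.1, 1, 10, 100],
              "svc__gamma": [0.001, 0.01, 0.1, 1]}

search = GridSearchCV(pipe, param_grid, cv=5)   # 5-fold cross-validation
search.fit(X, y)

print(search.best_params_)
print(search.best_score_)
```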

A FEW POINTS TO REMEMBER (IMPORTANT ONES):

SOURCE: https://giphy.com/
  1. Support Vector Machines (SVMs) work very well in practice for a large class of classification problems.
  2. SVMs work on the principle of learning a maximum margin hyperplane which results in good generalization.
  3. The basic linear SVM formulation could be extended to handle noisy and non-separable data.
  4. The Kernel Trick could be used to learn complex non-linear patterns.
  5. For better performance, one has to tune the SVM parameters, such as C and the kernel parameters, using a validation set.

I hope this blog makes the math and primary idea behind the support vector machine obvious. 😎

STAY TUNED FOR THE NEXT BLOG, WHERE WE'LL DISCUSS HOW TO IMPLEMENT THE SVM IN PYTHON THROUGH AN AMAZING CASE STUDY. ❤️

FOLLOW US FOR MORE FUN-TO-LEARN DATA SCIENCE BLOGS AND ARTICLES: 💙

LINKEDIN: https://www.linkedin.com/company/dsmcs/

INSTAGRAM: https://www.instagram.com/datasciencemeetscybersecurity/?hl=en

GITHUB: https://github.com/Vidhi1290

TWITTER: https://twitter.com/VidhiWaghela

MEDIUM: https://medium.com/@datasciencemeetscybersecurity-

WEBSITE: https://www.datasciencemeetscybersecurity.com/

- Team Data Science meets Cyber Security ❤️💙

