An intuitive guide to understanding Support Vector Machines (SVM)
Support Vector Machine (SVM) is undoubtedly one of the most popular algorithms among machine learning practitioners. It is a supervised machine learning algorithm that is robust to outliers and generalizes well in many cases. However, the intuition behind SVM can be tricky for a beginner to grasp. The name itself is quite intimidating: Support, Vector, and Machine. In this article, I will try to frame a clear picture of the idea behind SVM with the help of a simple example.
“What is Support, Vector, and a Machine?”
SVM finds its applications in both Regression and Classification tasks. However, we will only discuss classification tasks in this article.
Real-life applications of SVM include:
- Email Spam Classification: SVMs are used for the binary classification of an email as spam or not
- Loan Defaulter Detection: SVMs can be used to identify customers who are likely to default on a loan
- Bioinformatics: SVMs are used for classifying genes, classifying patients based on their genes, and other biological problems. Another use case is protein structure prediction.
- Image Classification: SVMs can be used to classify images. Before the advent of deep learning, SVMs were the standard approach in the image classification domain.
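To make the first application above concrete, here is a minimal, hypothetical sketch of email spam classification with a linear SVM, assuming scikit-learn is available; the toy emails and labels below are made up purely for illustration.

```python
# Hypothetical spam classifier sketch: TF-IDF features fed into a linear SVM.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Made-up training emails (1 = spam, 0 = not spam)
emails = [
    "win a free prize now", "claim your free money",       # spam
    "meeting agenda for monday", "project status update",  # not spam
]
labels = [1, 1, 0, 0]

# Pipeline: turn raw text into TF-IDF vectors, then fit a linear SVM.
model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(emails, labels)

print(model.predict(["free prize money"]))       # likely classified as spam
print(model.predict(["status of the project"]))  # likely classified as not spam
```

With real data you would of course train on thousands of labeled emails and evaluate on a held-out set, but the pipeline shape stays the same.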
Imagine countries A & B sharing a border under a truce. This border separates the citizens of country A from those of country B. The border is kept under vigil by the armed forces of the two countries, who make sure that there are no violations or unauthorized movement of citizens across it. The countries also share a disputed piece of land that falls under neither country's territory and is hence generally called No Man's Land. Pictorially, this scenario looks like this:
In the above diagram, notice how only the soldiers are needed to hold the border (solid line). In other words, only these soldiers are required to support the shape of the border. Notice that all the citizens of a given country can roam freely within the country's territory. But if even a single soldier backs away, the enemy may gain some ground and the border may move. This situation is shown in the figure below.
A few things to note here:
- The border separates citizens of one country from those of another.
- Only the soldiers are required to keep the border.
- There is an empty land around the border that no one claims.
- Soldiers are closest to the border and other citizens are far away.
Now imagine the same situation as above but consider
- Country A as a set of loan defaulters & Country B as a set of usual customers
- Citizens as Data samples (customers)
- Soldiers as supporting/necessary data samples
- The border as Decision Boundary (Separator)
If we imagine the above scenario, we have found a way to separate our data samples using only a minimal subset (the soldiers) of the whole data. This idea is the fundamental essence of classification with a Support Vector Machine.
In SVM terminology, the border is a Separating Hyperplane; the citizens are data points; the soldiers are Support Vectors; the distance between the soldiers and the border is the Margin; and the whole setup is a Support Vector Machine Classifier. This idea is illustrated in the diagram below.
The goal of training an SVM model
Please note that the goal of SVM learning is to maximize the margin while making as few errors as possible, ideally none. In other words, we try to ensure that no citizens, or as few as possible, are trapped in No Man's Land, while maximizing the distance between the soldiers and the border. This is a complex optimization process that requires an understanding of higher mathematics and linear algebra, so we will skip it in this article.
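Even without the optimization details, we can see the trained result in code. Below is a minimal sketch, assuming scikit-learn, that fits a linear SVM on a made-up two-cluster dataset and then inspects the "soldiers" (support vectors) and the margin width.

```python
# Sketch: fit a linear SVM and inspect its support vectors and margin.
import numpy as np
from sklearn.svm import SVC

# Toy 2-D data: two linearly separable clusters (e.g., defaulters vs. usual customers)
X = np.array([[1.0, 1.0], [1.5, 0.5], [2.0, 1.5],   # class 0
              [5.0, 5.0], [5.5, 4.5], [6.0, 5.5]])  # class 1
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear", C=1e3)  # large C approximates a hard margin
clf.fit(X, y)

# The "soldiers": only a few samples end up defining the separating hyperplane.
print(clf.support_vectors_)

# For a linear SVM, the margin width equals 2 / ||w||,
# where w is the coefficient vector of the hyperplane.
w = clf.coef_[0]
margin = 2.0 / np.linalg.norm(w)
print(f"margin width: {margin:.3f}")
```

Notice that `support_vectors_` contains fewer points than the full dataset; the remaining points could move around freely (within their side of the margin) without changing the boundary at all.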
Wow! Now you understand what Support Vector Machines are and what their different aspects mean. Let us deep-dive into some of their useful and necessary nuances.
Linear vs Non-linear Decision Boundary
In real-world datasets, this border may or may not be linear (straight-line like). This fictional border is commonly called a Decision Boundary. SVM provides a way to identify a variety of linear and nonlinear decision boundaries.
A typical case of linear and nonlinear decision boundary in the case of Email Spam classification may resemble the below figure.
SVMs were originally introduced to handle linearly separable cases. But non-linearly separable cases can also be addressed by introducing the Kernel Trick. It takes the existing features, applies some transformations, and creates new features. These new features help the SVM find a nonlinear decision boundary in the transformed feature space. This idea is represented in the animated gif.
Notice how the orange dots can be separated from the blue dots by inserting a horizontal plane in the transformed feature space. These kernel tricks are handy in identifying any complex shaped decision boundary and hence make SVMs more robust and useful.
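The effect of the kernel trick is easy to demonstrate in code. The sketch below (assuming scikit-learn) generates two concentric circles, which no straight line can separate, and compares a linear SVM against one using the RBF kernel; the `gamma` value is an illustrative choice, not a tuned one.

```python
# Sketch: linear vs. RBF kernel on data no straight line can separate.
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric rings of points: inner ring is one class, outer ring the other.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

linear_clf = SVC(kernel="linear").fit(X, y)
rbf_clf = SVC(kernel="rbf", gamma=2.0).fit(X, y)

# The linear SVM is stuck near chance level; the RBF kernel implicitly maps
# the points into a space where a hyperplane can split the rings.
print("linear accuracy:", linear_clf.score(X, y))
print("rbf accuracy:", rbf_clf.score(X, y))
```

The RBF kernel never computes the transformed features explicitly; it only evaluates similarities between pairs of points, which is what makes the trick cheap in practice.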
Pros and Cons:
Pros:
- Robust: SVMs generate accurate results even when the decision boundary is nonlinear
- Memory efficient: only a minimal subset of the data (the support vectors) is used for prediction
- Versatile: with a suitable kernel function, SVMs can solve many complex problems
- Generalizes well: in practice, SVM models carry less risk of overfitting
Cons:
- It takes extensive training time to find a suitable separating hyperplane
- The final model may be difficult to interpret
A good explanation of the rationale behind this algorithm can be found in the YouTube video on SVM by Udacity, linked here.