Support Vector Machine (SVM) Algorithm in Machine Learning & Python

What is a Support Vector Machine?

Support Vector Machine (SVM) is a simple yet powerful supervised Machine Learning algorithm that can be used to build both regression and classification models. It can perform well on both linearly separable and non-linearly separable datasets, and it often gives good results even when only a limited amount of training data is available.

SVM Figure 1: Linearly Separable and Non-linearly Separable Datasets

Types of Support Vector Machines

There are two types of Support Vector Machines:

Linear SVM or Simple SVM: Linear SVM is used for linearly separable data. If a dataset can be classified into two classes with a single straight line, then that data is considered to be linearly separable data, and the classifier is referred to as the linear SVM classifier. It is typically used for linear regression and classification problems.
Nonlinear SVM or Kernel SVM: Nonlinear SVM is used for non-linearly separable data, i.e., a dataset that cannot be classified by using a straight line. The classifier used in this case is referred to as a nonlinear SVM classifier. It is more flexible because a kernel maps the data into a higher-dimensional feature space, where a separating hyperplane can be fitted even though no straight line separates the classes in the original space.

Support Vector Machine Algorithm Example

Support vector machine or SVM algorithm is based on the concept of ‘decision planes’, where hyperplanes are used to classify a set of given objects.
Let us start off with a few pictorial examples of support vector machine algorithms. As we can see in Figure 2, we have two sets of data. These datasets can be separated easily with the help of a line, called a decision boundary.

SVM Figure 2: Decision Boundary

But there can be several decision boundaries that can divide the data points without any errors. For example, in Figure 3, all decision boundaries classify the datasets correctly. But how do we pick the best decision boundary?

SVM Figure 3: Other Possible Decision Boundaries

Well, here’s the tip: the best decision boundary is the one that has a maximum distance from the nearest points of these two classes, as shown in Figure 4.

SVM Figure 4: Maximum Distance from the Nearest Points

Also, remember that the nearest points from the optimal decision boundary that maximize the distance are called support vectors.

SVM Figure 5: Margin and Maximum Margin Classifier

The region that the closest points define around the decision boundary is known as the margin.
That is why the decision boundary of a support vector machine model is known as the maximum margin classifier or the maximum margin hyperplane.

In other words, here’s how a support vector machine algorithm model works:

First, it finds lines or boundaries that correctly classify the training dataset.
Then, from those lines or boundaries, it picks the one that has the maximum distance from the closest data points.
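
The two steps above can be sketched in plain Python. This is only a toy illustration: a real SVM solver optimizes over all possible boundaries instead of scanning a short list. The data points, the four candidate lines, and the helper names are made up for this example.

```python
import math

# Toy 2-D training set: (x, y, label) with labels +1 and -1 (made-up values).
points = [(1.0, 1.0, -1), (2.0, 1.5, -1), (1.5, 2.0, -1),
          (4.0, 4.0, +1), (5.0, 4.5, +1), (4.5, 5.0, +1)]

# Hypothetical candidate boundaries of the form w1*x + w2*y + b = 0.
candidates = {
    "steep":    (1.0, 0.2, -3.5),
    "diagonal": (1.0, 1.0, -5.75),
    "flat":     (0.2, 1.0, -3.5),
    "wrong":    (1.0, 0.0, -1.8),   # cuts through the -1 class
}

def signed_distance(line, x, y):
    """Signed distance from (x, y) to the line w1*x + w2*y + b = 0."""
    w1, w2, b = line
    return (w1 * x + w2 * y + b) / math.hypot(w1, w2)

def margin(line):
    """Distance to the closest point, or None if any point is misclassified."""
    smallest = min(label * signed_distance(line, x, y) for x, y, label in points)
    return smallest if smallest > 0 else None

# Step 1: keep only the boundaries that classify every training point correctly.
margins = {name: margin(line) for name, line in candidates.items()}
valid = {name: m for name, m in margins.items() if m is not None}

# Step 2: among those, pick the boundary with the largest margin.
best = max(valid, key=valid.get)
print(valid)
print("best boundary:", best)
```

The points that set the margin of the winning line are the ones that would act as support vectors.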

Alright, in the above support vector machine example, the dataset was linearly separable. Now the question is: how do we classify non-linearly separable datasets such as the one shown in Figure 6?

SVM Figure 6: Non-linearly Separable Dataset

Clearly, straight lines can’t be used to classify the above dataset. That is where Kernel SVM comes into the picture.

SVM Figure 7: After Using Kernel Support Vector Classifier

What does Kernel SVM do? How does it find the classifier? Well, the Kernel SVM projects the non-linearly separable datasets of lower dimensions to linearly separable data of higher dimensions. It does this in such a way that data points belonging to different classes end up on opposite sides of a hyperplane in the higher-dimensional space. Interesting, isn’t it?
Well, before exploring how to implement SVM in the Python programming language, let us take a look at the pros and cons of the support vector machine algorithm.

Advantages of Support Vector Machine Algorithm

It has a high level of accuracy
It works very well with limited datasets
Kernel SVM contains a non-linear transformation function to convert the complicated non-linearly separable data into linearly separable data
It is effective on datasets that have multiple features
It is effective when the number of features is greater than the number of data points
It uses only a subset of the training points (the support vectors) in the decision function, which makes SVM memory efficient
Apart from common kernels, it is also possible to specify custom kernels for the decision function

Disadvantages of Support Vector Machine Algorithm

Does not work well with larger datasets
Sometimes, training time with SVMs can be high
If the number of features is significantly greater than the number of data points, it is crucial to avoid overfitting when choosing kernel functions and regularization terms
Probability estimates are not directly provided by SVMs; rather, they are calculated by using an expensive fivefold cross-validation
It works best on small sample sets due to its high training time

How Does the Support Vector Machine Algorithm Work?

Let us consider two tags, yellow and blue, and our data has two features, x, and y. Given a pair of (x,y) coordinates, we want a classifier that outputs either yellow or blue. We plot the labeled training data on a plane:

An SVM takes these data points and outputs the hyperplane, which is simply a line in two dimensions, that best separates the tags. The line is the decision boundary. Anything falling on one side of it will be classified as yellow, and anything on the other side will be classified as blue.

For SVM, the best hyperplane is the one that maximizes the margins from both tags. It is the hyperplane whose distance to the nearest element of each tag is the largest.

The above was easy since the data was linearly separable—a straight line can be drawn to separate yellow and blue. However, in real scenarios, cases are usually not this simple. Consider the following case:

There is no linear decision boundary. The vectors are, however, very clearly segregated, and it seems as if it should be easy to separate them.

In this case, we will add a third dimension. Up until now, we have worked with two dimensions, x, and y. A new z dimension is introduced in this case. It is set to be calculated a certain way that is convenient, z = x² + y² (equation of a circle.) Taking a slice of this three-dimensional space looks like this:

Let us see what SVM can do with this:

Note that since we are in three dimensions now, the hyperplane is a plane parallel to the x-y plane at a particular value of z, let us say z = 1. Now, it should be mapped back to two dimensions:

There we go! The decision boundary is a circumference with radius 1, and it separates both tags by using SVM.
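
The mapping described above can be sketched in a few lines of plain Python. The points are made up, and the choice of which tag sits inside the circle is an assumption for illustration only.

```python
# Made-up 2-D points: three close to the origin, three far from it.
points = [(0.1, 0.2), (-0.3, 0.4), (0.5, -0.2),
          (1.5, 0.5), (-1.2, 1.1), (0.3, -1.8)]

def lift(x, y):
    """Add the third dimension z = x^2 + y^2."""
    return (x, y, x * x + y * y)

def classify(x, y, threshold=1.0):
    """In 3-D the separator is the plane z = threshold.

    Mapped back to 2-D, that plane is the circle x^2 + y^2 = threshold.
    Assumption: yellow is the inner class and blue is the outer class.
    """
    _, _, z = lift(x, y)
    return "yellow" if z < threshold else "blue"

labels = [classify(x, y) for x, y in points]
print(labels)
```

No straight line in the (x, y) plane separates these two groups, but a single threshold on z does.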

Calculating the transformation can get pretty expensive computationally. One may deal with a lot of new dimensions, each possibly involving a complicated calculation. Hence, doing this for every vector in the dataset will be a lot of work.

Here’s the solution: SVM does not need actual vectors to work its magic. It can get by with dot products between them alone. So, one can sidestep the expensive calculations of new dimensions.

This is what can be done instead:

Imagine the new space to be:
```
z = x² + y²
```

Figure out the dot product in that space:

a · b = xa · xb + ya · yb + za · zb
a · b = xa · xb + ya · yb + (xa² + ya²) · (xb² + yb²)

Tell SVM to do its thing by using the new dot product called a kernel function.

That’s it!
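
As a quick sanity check, the sketch below compares the two routes for one pair of made-up vectors: lifting both vectors into the new space and taking an ordinary dot product, versus applying the kernel function directly to the original two-dimensional vectors. The function names are chosen for this example only.

```python
def lift(p):
    """Explicit mapping (x, y) -> (x, y, x^2 + y^2)."""
    x, y = p
    return (x, y, x * x + y * y)

def dot(u, v):
    """Ordinary dot product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))

def kernel(a, b):
    """Dot product in the lifted space, computed from the 2-D inputs only."""
    xa, ya = a
    xb, yb = b
    return xa * xb + ya * yb + (xa ** 2 + ya ** 2) * (xb ** 2 + yb ** 2)

a, b = (1.0, 2.0), (3.0, -1.0)
explicit = dot(lift(a), lift(b))   # build the 3-D vectors first
shortcut = kernel(a, b)            # never builds the 3-D vectors
print(explicit, shortcut)
```

With only one extra dimension the saving is small. It matters far more for kernels whose implied feature space is very large, such as RBF, where the explicit mapping cannot be written out at all.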

SVM libraries come with popular kernels such as Polynomial, Radial Basis Function (RBF), and Sigmoid. In scikit-learn, the SVM classifier is the SVC class. Its constructor looks like this:

sklearn.svm.SVC(C=1.0, kernel='rbf', degree=3)

Important parameters

C: The regularization parameter. A large value of C penalizes misclassified training points heavily, so the SVM model chooses a smaller-margin hyperplane. A small value of C tolerates more misclassification, so the model chooses a larger-margin hyperplane.
kernel: It is the kernel type to be used in SVM model building. It can be ‘linear’, ‘rbf’, ‘poly’, or ‘sigmoid’. The default value of the kernel is ‘rbf’.
degree: It is only considered in the case of the polynomial kernel. It is the degree of the polynomial kernel function. The default value of degree is 3.
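
A short sketch of how these three parameters are passed to scikit-learn's `SVC`. The specific non-default values (C=10.0, degree=2) are arbitrary and chosen only for illustration.

```python
from sklearn.svm import SVC

# Defaults: C=1.0, kernel='rbf', degree=3 (degree is ignored unless kernel='poly').
default_model = SVC()

# A polynomial kernel of degree 2 with a stronger misclassification penalty.
poly_model = SVC(C=10.0, kernel="poly", degree=2)

for name, model in [("default", default_model), ("poly", poly_model)]:
    params = model.get_params()
    print(name, params["C"], params["kernel"], params["degree"])
```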

Alright, let us dive right into the hands-on SVM in the Python programming language.

SVM Parameters

SVM parameters are the settings and constraints that control how the model is fitted. There are three main SVM parameters:

Kernel

A kernel transforms the input data into the form required to separate the classes. The kernels used in SVM include linear, polynomial, and radial basis function (RBF); the polynomial and RBF kernels produce non-linear decision boundaries. Choosing a suitable kernel lets you obtain accurate classifiers for classes that are not linearly separable.

Regularization

The C parameter in Scikit-learn sets the penalty for misclassified training points. You control regularization by tuning C, which trades off misclassification against the width of the margin and so changes the decision boundary.
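
To make the role of C concrete, here is a minimal sketch of the standard soft-margin objective, 0.5·||w||² + C·Σ max(0, 1 − yᵢ(w·xᵢ + b)), evaluated for one fixed boundary. The data, the weights, and the function name are made up for illustration; a real solver would search for the w and b that minimize this value.

```python
def soft_margin_objective(w, b, data, C):
    """0.5*||w||^2 + C * (sum of hinge losses) for a fixed boundary (w, b)."""
    regularizer = 0.5 * sum(wi * wi for wi in w)
    hinge = 0.0
    for x, label in data:
        score = sum(wi * xi for wi, xi in zip(w, x)) + b
        hinge += max(0.0, 1.0 - label * score)
    return regularizer + C * hinge

# Made-up data: the last point lies on the wrong side of the boundary.
data = [((2.0, 2.0), +1), ((3.0, 1.0), +1),
        ((-2.0, -1.0), -1), ((0.5, 0.5), -1)]
w, b = (1.0, 1.0), 0.0

low_C = soft_margin_objective(w, b, data, C=0.1)
high_C = soft_margin_objective(w, b, data, C=10.0)
print(low_C, high_C)
```

The same single misclassified point costs far more when C is large, which is why a large C pushes the model toward a boundary that fits the training data more tightly.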

Gamma

The gamma parameter determines how far the influence of a single training example reaches. Low values mean ‘far’ and high values mean ‘close’. A low gamma defines a Gaussian function with a large variance, whereas a high gamma defines one with a small variance.
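
The effect of gamma can be seen directly in the RBF kernel, K(a, b) = exp(−gamma · ||a − b||²). The sketch below uses made-up points and two arbitrary gamma values to show how quickly the similarity to a training example falls off.

```python
import math

def rbf_kernel(a, b, gamma):
    """RBF kernel: exp(-gamma * squared Euclidean distance)."""
    squared_distance = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * squared_distance)

training_point = (0.0, 0.0)
near, far = (0.5, 0.0), (3.0, 0.0)

for gamma in (0.1, 10.0):
    print(f"gamma={gamma}: "
          f"near={rbf_kernel(training_point, near, gamma):.4f}, "
          f"far={rbf_kernel(training_point, far, gamma):.4g}")
```

With a low gamma the distant point still has a noticeable similarity to the training example. With a high gamma the similarity is essentially zero outside a small neighbourhood.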

Applications of SVM

SVM is mainly used to classify unseen data and has various applications in different fields:

Face Detection

Classifies regions of an image as face or non-face and draws a square box around each detected face.

Bioinformatics

Support vector machines are used for gene classification, which allows researchers to differentiate between various proteins and to identify biological problems and cancer cells.

Text Categorization

Used to train models that classify documents into different categories based on scores, types, and other threshold values.

Generalized Predictive Control (GPC)

Provides control over different industrial processes by using a multivariable version and an interactor matrix. GPC is used in various industries like cement mills, robotics, spraying, etc.

Handwriting Recognition

SVM is widely used to recognize handwritten characters and test them against pre-existing data.

Image Classification

Compared to traditional query-based searching techniques, SVM has better accuracy when it comes to searching and classifying images based on various features.

Building a Support Vector Machine Classification Model in Machine Learning Using Python

Problem Statement: Use Machine Learning to predict cases of breast cancer using patient treatment history and health data
Dataset: Breast Cancer Wisconsin (Diagnostic) Dataset
Let us have a quick look at the dataset:
Classification Model Building: Support Vector Machine in Python
Let us build the classification model with the help of a Support Vector Machine algorithm.
Step 1: Load the Pandas library and the dataset using Pandas

import pandas as pd
dataset = pd.read_csv('Cancer_data.csv')

Let us have a look at the shape of the dataset:

dataset.shape

Step 2: Define the features and the target

# Features: every column except the target column
X = dataset.drop('Diagnosis', axis=1)
# Target: the diagnosis label
y = dataset['Diagnosis']

Have a look at the features:

X

Have a look at the target:

y

Step 3: Split the dataset into train and test using sklearn before building the SVM algorithm model

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20)

Step 4: Import the support vector classifier (SVC) class from the Sklearn SVM module and build the Support Vector Machine model with it

from sklearn.svm import SVC
svclassifier = SVC(kernel='linear')
svclassifier.fit(X_train, y_train)

Step 5: Predict values using the SVM algorithm model

y_pred = svclassifier.predict(X_test)

Step 6: Evaluate the Support Vector Machine model

from sklearn.metrics import classification_report, confusion_matrix
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))
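
For readers who do not have the CSV file, the six steps can be reproduced end to end with the copy of the Breast Cancer Wisconsin (Diagnostic) dataset that ships with scikit-learn. This is a sketch under a few assumptions that are not part of the steps above: it loads the data with `load_breast_cancer` instead of `Cancer_data.csv`, fixes `random_state` so the split is repeatable, stratifies the split, and standardizes the features with `StandardScaler`, since SVMs are sensitive to feature scale.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Built-in copy of the dataset: 569 samples, 30 numeric features.
X, y = load_breast_cancer(return_X_y=True)

# 80/20 split; random_state and stratify are additions for repeatability.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=0, stratify=y)

# Scale the features, then fit a linear SVM on the scaled data.
model = make_pipeline(StandardScaler(), SVC(kernel="linear"))
model.fit(X_train, y_train)

accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Test accuracy: {accuracy:.3f}")
```

Putting the scaler and the classifier in one pipeline means the scaling statistics are learned from the training split only and then reused on the test split.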

Implementing Kernel SVM with Sklearn SVM module

Importing the libraries:

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

Polynomial SVM Kernel

Importing the SVC class and setting kernel as ‘poly’:

from sklearn.svm import SVC
svclassifier1 = SVC(kernel='poly', degree=8)
svclassifier1.fit(X_train, y_train)

Making predictions:

y_pred1 = svclassifier1.predict(X_test)

Evaluating the model:

from sklearn.metrics import classification_report, confusion_matrix
print(confusion_matrix(y_test, y_pred1))
print(classification_report(y_test, y_pred1))

Gaussian Kernel

Importing the SVC class and setting kernel as ‘rbf’:

from sklearn.svm import SVC
svclassifier2 = SVC(kernel='rbf')
svclassifier2.fit(X_train, y_train)

Making predictions:

y_pred2 = svclassifier2.predict(X_test)

Evaluating the model:

from sklearn.metrics import classification_report, confusion_matrix
print(confusion_matrix(y_test, y_pred2))
print(classification_report(y_test, y_pred2))

Sigmoid Kernel

Importing the SVC class and setting the SVM kernel as ‘sigmoid’:

from sklearn.svm import SVC
svclassifier3 = SVC(kernel='sigmoid')
svclassifier3.fit(X_train, y_train)

Making predictions:

y_pred3 = svclassifier3.predict(X_test)

Evaluating the model:

from sklearn.metrics import classification_report, confusion_matrix
print(confusion_matrix(y_test, y_pred3))
print(classification_report(y_test, y_pred3))
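
To compare the kernels side by side, one option is to score each of them with cross-validation instead of a single train/test split. The sketch below is one way to do that, not the method used in the steps above: it assumes scikit-learn's built-in `load_breast_cancer` copy of the dataset, adds feature scaling, keeps every kernel at its default settings, and uses 5 folds as an arbitrary choice.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# Mean 5-fold cross-validated accuracy for each kernel at default settings.
scores = {}
for kernel in ("linear", "poly", "rbf", "sigmoid"):
    model = make_pipeline(StandardScaler(), SVC(kernel=kernel))
    scores[kernel] = cross_val_score(model, X, y, cv=5).mean()

for kernel, score in scores.items():
    print(f"{kernel:8s} mean accuracy: {score:.3f}")
```

Which kernel scores best depends on the dataset and on the values of C, gamma, and degree, so these numbers are a starting point for tuning, not a ranking of the kernels.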

What have we learned so far?

In this SVM tutorial blog, we answered the question, ‘what is SVM?’ Some other important concepts such as SVM’s full form, the pros and cons of the SVM algorithm, and SVM examples, are also highlighted in this blog. We also learned how to build support vector machine models with the help of the support vector classifier function. Additionally, we talked about the implementation of Kernel SVM in Python and Sklearn, which is a very useful method while dealing with non-linearly separable datasets.

