
The kernel trick maps a non-linear problem into a linear problem.

My questions are:

1. What is the main difference between a linear and a non-linear problem, and what is the intuition behind that difference? How does the kernel trick help us use linear classifiers on a non-linear problem?

2. Why is the dot product so important in the two cases?

Thanks.


The choice depends on your dataset. If the dataset has high variance, reduce the number of features and collect more data, then use a non-linear method for classification.

If the dataset has low variance, use a linear model.

We use linear and non-linear classifiers under the following conditions:

If accuracy matters more to you than training time, use a non-linear classifier; otherwise use a linear one. A linear classifier uses a linear kernel, which is faster to train than the non-linear kernels used in a non-linear classifier.

A linear classifier (such as a linear SVM) is a good choice when the number of features is huge, for example in document classification. In such cases a linear SVM gives accuracy similar to a non-linear SVM but trains much faster.

Use a non-linear classifier when the data is not linearly separable. Under such conditions, linear classifiers give very poor accuracy and non-linear classifiers give better results. This is because non-linear kernels map (transform) the input data (the input space) to a higher-dimensional space (called the feature space) where a separating linear hyperplane can be found more easily.
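To make the mapping idea concrete, here is a small sketch using two concentric rings of points, which no straight line in the plane can separate. The feature map `lift`, the ring radii, and the plane `z = 4` are all assumptions chosen for illustration; they are not part of any particular SVM library.

```python
import math

def lift(point):
    """Hypothetical feature map: (x1, x2) -> (x1, x2, x1^2 + x2^2)."""
    x1, x2 = point
    return (x1, x2, x1 * x1 + x2 * x2)

def ring(radius, n):
    """Return n evenly spaced points on a circle of the given radius."""
    return [(radius * math.cos(2 * math.pi * k / n),
             radius * math.sin(2 * math.pi * k / n)) for k in range(n)]

inner = ring(1.0, 12)  # class -1
outer = ring(3.0, 12)  # class +1

# In the lifted 3-D space the third coordinate is the squared radius,
# so the plane z = 4 (w = (0, 0, 1), b = -4) separates the two rings.
w, b = (0.0, 0.0, 1.0), -4.0

def predict(point):
    """Linear decision rule applied in the lifted space."""
    z = lift(point)
    score = sum(wi * zi for wi, zi in zip(w, z)) + b
    return 1 if score > 0 else -1

inner_ok = all(predict(p) == -1 for p in inner)
outer_ok = all(predict(p) == 1 for p in outer)
print(inner_ok, outer_ok)
```

The classifier itself stays linear (a weight vector and a bias); only the representation of the data changed.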

A kernel is a function K(x1, x2) that has the property that

K(x1, x2) = <phi(x1), phi(x2)>

for some function phi(). In other words: We can evaluate the dot product in the low-dimensional data space (where x1 and x2 "live") without having to transform to the high-dimensional feature space (where phi(x1) and phi(x2) "live") -- but we still get the benefits of transforming to the high-dimensional feature space. This is called the kernel trick.
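As a minimal check of this identity, consider the degree-2 polynomial kernel K(x, y) = (x . y)^2 on 2-D inputs. The choice of this kernel and the sample points are assumptions for illustration; its feature map phi(x) = (x1^2, sqrt(2) x1 x2, x2^2) is a standard one.

```python
import math

def dot(u, v):
    """Plain dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def phi(x):
    """Explicit 3-D feature map for K(x, y) = (x . y)^2 on 2-D inputs."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

def kernel(x, y):
    """Same value as dot(phi(x), phi(y)), computed in the 2-D input space."""
    return dot(x, y) ** 2

x, y = (1.0, 2.0), (3.0, -1.0)
explicit = dot(phi(x), phi(y))  # dot product taken in the 3-D feature space
implicit = kernel(x, y)         # 2-D dot product, then squared
print(explicit, implicit)
```

Both values agree up to floating-point rounding, but `kernel` never builds the 3-D vectors. Any algorithm that touches the data only through dot products can therefore swap in `kernel` and work in the feature space implicitly.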

Many popular kernels, such as the Gaussian kernel, actually correspond to a transform phi() that transforms into an infinite-dimensional feature space. The kernel trick allows us to compute dot products in this space without having to represent points in this space explicitly (which, obviously, is impossible on computers with finite amounts of memory).
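The infinite-dimensional claim can be illustrated for scalar inputs. Assuming a Gaussian kernel with bandwidth 1, K(x, y) = exp(-(x - y)^2 / 2), the Taylor series of exp(xy) gives feature coordinates phi_k(x) = exp(-x^2 / 2) x^k / sqrt(k!) for k = 0, 1, 2, ... The sketch below truncates that vector; the helper names and sample inputs are hypothetical.

```python
import math

def gaussian_kernel(x, y):
    """K(x, y) = exp(-(x - y)^2 / 2) for scalar inputs (bandwidth fixed at 1)."""
    return math.exp(-((x - y) ** 2) / 2.0)

def phi_truncated(x, n_terms):
    """First n_terms coordinates of the infinite feature vector:
    phi_k(x) = exp(-x^2 / 2) * x^k / sqrt(k!)."""
    scale = math.exp(-x * x / 2.0)
    return [scale * x ** k / math.sqrt(math.factorial(k)) for k in range(n_terms)]

def truncated_dot(x, y, n_terms):
    """Dot product of the two truncated feature vectors."""
    fx, fy = phi_truncated(x, n_terms), phi_truncated(y, n_terms)
    return sum(a * b for a, b in zip(fx, fy))

x, y = 0.8, -0.5
exact = gaussian_kernel(x, y)
# Approximation error shrinks as more feature coordinates are kept.
errors = [abs(exact - truncated_dot(x, y, n)) for n in (1, 2, 4, 8, 16)]
print(exact, errors)
```

No finite truncation reproduces the kernel exactly for all inputs, yet `gaussian_kernel` returns the full infinite sum with a single exponential.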
