It depends on your dataset. If the dataset has high variance, reduce the number of features and collect more data, then use a non-linear method for classification.
If the dataset has low variance, use a linear model.
Choose between linear and non-linear classifiers under the following conditions:
If accuracy matters more to you than training time, use a non-linear classifier; otherwise use a linear one. A linear kernel is cheaper to compute than the non-linear kernels used by a non-linear classifier, so a linear classifier trains faster.
Use a linear classifier (for example, a linear SVM) when the number of features is very large, as in document classification. In such cases a linear SVM usually gives accuracy similar to a non-linear SVM while being much faster.
Use a non-linear classifier when the data is not linearly separable. Under such conditions a linear classifier gives poor accuracy and a non-linear one does better. This is because a non-linear kernel maps (transforms) the input data (the input space) to a higher-dimensional space (called the feature space) where a separating linear hyperplane is much easier to find.
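To make the last point concrete, here is a minimal sketch in plain Python. The toy data, the helper names, and the choice of phi(x) = x**2 are all assumptions made for illustration, not part of any library. In one dimension a linear classifier is just a threshold with an orientation, so the code tries every threshold in the input space and then again after mapping the points to the feature space.

```python
# Hypothetical toy data: the label is +1 when |x| > 1, else -1.
xs = [-2.0, -1.5, -0.5, 0.0, 0.5, 1.5, 2.0]
ys = [1, 1, -1, -1, -1, 1, 1]

def threshold_accuracy(values, labels, t, sign):
    """Accuracy of the linear rule: predict `sign` if value > t, else -sign."""
    correct = sum(1 for v, y in zip(values, labels)
                  if (sign if v > t else -sign) == y)
    return correct / len(labels)

def best_threshold_accuracy(values, labels):
    """Try a threshold in every gap between points, in both orientations."""
    pts = sorted(set(values))
    candidates = ([pts[0] - 1.0]
                  + [(a + b) / 2 for a, b in zip(pts, pts[1:])]
                  + [pts[-1] + 1.0])
    return max(threshold_accuracy(values, labels, t, s)
               for t in candidates for s in (1, -1))

# Input space: the +1 points sit on both sides of the -1 points,
# so no single threshold can classify everything correctly.
acc_input = best_threshold_accuracy(xs, ys)

# Feature space: after phi(x) = x**2 the classes are split by one threshold.
acc_feature = best_threshold_accuracy([x * x for x in xs], ys)

print("best linear accuracy in input space:  ", acc_input)
print("best linear accuracy in feature space:", acc_feature)
```

The same linear rule that fails on the raw inputs succeeds once the data has been transformed, which is the effect a non-linear kernel achieves implicitly.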
A kernel is a function K(x1, x2) that has the property that
K(x1, x2) = <phi(x1), phi(x2)>
for some function phi(). In other words: we can evaluate the feature-space dot product using only computations in the low-dimensional data space (where x1 and x2 "live"), without having to transform to the high-dimensional feature space (where phi(x1) and phi(x2) "live"), and we still get the benefits of working in the high-dimensional feature space. This is called the kernel trick.
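A small numerical check of the identity above, as a sketch: it assumes the degree-2 homogeneous polynomial kernel K(a, b) = (a . b)**2 on 2-D inputs, whose explicit feature map is phi(a) = (a1**2, sqrt(2)*a1*a2, a2**2). The function names and sample points are chosen here for illustration.

```python
import math

def dot(u, v):
    """Ordinary dot product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))

def phi(p):
    """Explicit 3-D feature map for the degree-2 polynomial kernel on 2-D input."""
    p1, p2 = p
    return (p1 * p1, math.sqrt(2) * p1 * p2, p2 * p2)

def poly_kernel(u, v):
    """K(u, v) = (u . v)**2, computed entirely in the 2-D input space."""
    return dot(u, v) ** 2

a = (1.0, 2.0)
b = (3.0, -1.0)

explicit = dot(phi(a), phi(b))  # dot product taken in the 3-D feature space
implicit = poly_kernel(a, b)    # kernel evaluation, phi is never formed

print(explicit, implicit)
```

Both values agree up to floating-point rounding, but the kernel evaluation only ever touches the two original coordinates.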
Many popular kernels, such as the Gaussian kernel, actually correspond to a transform phi() that transforms into an infinite-dimensional feature space. The kernel trick allows us to compute dot products in this space without having to represent points in this space explicitly (which, obviously, is impossible on computers with finite amounts of memory).
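For intuition, the following sketch uses scalar inputs and the Gaussian kernel in the form K(x, y) = exp(-gamma * (x - y)**2), with gamma = 1 as an assumed value. Expanding exp(2 * gamma * x * y) as a power series gives one feature-map coordinate per term, so phi has infinitely many coordinates. The code compares the exact kernel value with dot products of truncated feature vectors; the helper names are made up for this example.

```python
import math

GAMMA = 1.0  # kernel width parameter (assumed value for illustration)

def gaussian_kernel(x, y, gamma=GAMMA):
    """K(x, y) = exp(-gamma * (x - y)**2) for scalar inputs."""
    return math.exp(-gamma * (x - y) ** 2)

def truncated_phi(x, n_terms, gamma=GAMMA):
    """First n_terms coordinates of the infinite feature map for scalar x.

    Coordinate k is exp(-gamma * x**2) * sqrt((2 * gamma)**k / k!) * x**k.
    """
    scale = math.exp(-gamma * x * x)
    return [scale * math.sqrt((2 * gamma) ** k / math.factorial(k)) * x ** k
            for k in range(n_terms)]

def truncated_dot(x, y, n_terms):
    """Dot product of the two truncated feature vectors."""
    return sum(a * b for a, b in zip(truncated_phi(x, n_terms),
                                     truncated_phi(y, n_terms)))

x, y = 0.5, -0.3
exact = gaussian_kernel(x, y)
for n in (1, 3, 10, 30):
    approx = truncated_dot(x, y, n)
    print(n, approx, abs(approx - exact))
```

The truncated dot product approaches the exact kernel value as more coordinates are kept, while the kernel itself is a single exponential that needs no feature vector at all.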