+1 vote
in Machine Learning by (4.2k points)
The meat of my question is "how does one design a kernel function for a learning problem?"

As a quick background, I'm reading books on support vector machines and kernel machines, and everywhere I look authors give examples of kernels (polynomial kernels both homogeneous and nonhomogeneous, Gaussian kernels, and allusions to text-based kernels to name a few), but all either provide pictures of the results without specifying the kernel, or vaguely claim that "an efficient kernel can be constructed". I'm interested in the process that goes on when one designs a kernel for a new problem.

Probably the easiest example is learning XOR, the smallest (4 points) data set embedded in the real plane that is not linearly separable. How would one come up with a natural (and non-trivial) kernel to linearly separate this data?

As a more complex example (see Cristianini, Introduction to SVMs, figure 6.2), how would one design a kernel to learn a checkerboard pattern? Cristianini states the picture was derived "using Gaussian kernels" but it seems that he uses multiple, and they are combined and modified in an unspecified way.

If this question is too broad to answer here, I'd appreciate a reference to the construction of one such kernel function, though I'd prefer the example be somewhat simple.

1 Answer

+1 vote
by (6.8k points)

(For anyone not familiar with the use of kernel functions in machine learning: a kernel implicitly maps the input vectors (the data points that comprise the data set) into a higher-dimensional space, aka the "feature space", by computing inner products in that space. The SVM then finds a separating hyperplane with the maximal margin (the distance between the hyperplane and the nearest points, the support vectors) in this transformed space.)

Well, start with kernels that are known to work with SVM classifiers on the problem of interest. In this case, we know that an SVM trained with the RBF (radial basis function) kernel cleanly separates XOR. You can write an RBF function in Python this way:

import numpy as np

def RBF(x, y, gamma):
  # Gaussian (RBF) kernel: exp(-gamma * squared Euclidean distance)
  return np.exp(-gamma * np.sum((x - y)**2))

Here gamma is commonly set to 1/number of features (columns in the data set), and x, y are a pair of data points given as NumPy arrays.

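(SciPy also provides radial basis functions through the scipy.interpolate.Rbf class, though that class is meant for interpolation rather than for use as an SVM kernel.)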
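To make the XOR claim concrete, here is a minimal sketch that uses only the standard library. It is not an SVM: it swaps in a kernel perceptron, a simpler learner that also works purely through kernel evaluations, and it does not maximize the margin. The helper names (rbf, train_kernel_perceptron, predict) and the 0/1 encoding of the XOR points are assumptions made for illustration.

```python
import math

def rbf(x, y, gamma):
    """Gaussian (RBF) kernel: exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

# The four XOR points; the label is +1 when exactly one input is 1.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [-1, 1, 1, -1]

def train_kernel_perceptron(X, labels, kernel, epochs=100):
    """Return dual coefficients alpha (one mistake counter per training point)."""
    alpha = [0] * len(X)
    for _ in range(epochs):
        mistakes = 0
        for j, (xj, yj) in enumerate(zip(X, labels)):
            score = sum(a * yi * kernel(xi, xj)
                        for a, xi, yi in zip(alpha, X, labels))
            if yj * score <= 0:      # misclassified (or undecided): update
                alpha[j] += 1
                mistakes += 1
        if mistakes == 0:            # a full pass without errors: done
            break
    return alpha

def predict(x, X, labels, alpha, kernel):
    """Classify x by the sign of sum_i alpha_i * y_i * K(x_i, x)."""
    score = sum(a * yi * kernel(xi, x) for a, xi, yi in zip(alpha, X, labels))
    return 1 if score > 0 else -1

gamma = 1 / 2  # 1 / number of features
kernel = lambda u, v: rbf(u, v, gamma)
alpha = train_kernel_perceptron(X, labels, kernel)
predictions = [predict(x, X, labels, alpha, kernel) for x in X]
print(predictions)  # [-1, 1, 1, -1]
```

All four points end up on the correct side, which no linear classifier on the raw two inputs can achieve for XOR.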

Second, if what you are after is not just using available kernel functions to solve classification/regression problems, but instead you want to build your own, I would suggest first studying how the choice of kernel, and the parameters within those functions, affect classifier performance. The small group of kernel functions in common use with SVM/SVC is the best place to begin. Aside from RBF, this group consists of:

  • linear kernel
  • polynomial
  • sigmoid
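As a rough sketch of what these three kernels compute, the functions below use their usual textbook forms. The parameter names (degree, coef0, gamma) and default values are assumptions chosen for illustration. The last part checks, for 2-D inputs, that the homogeneous degree-2 polynomial kernel equals an ordinary dot product after an explicit feature map phi, which is the kind of reasoning that goes into designing a kernel by hand.

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def linear_kernel(x, y):
    return dot(x, y)

def polynomial_kernel(x, y, degree=2, coef0=1.0):
    # coef0 = 0 gives the homogeneous polynomial kernel
    return (dot(x, y) + coef0) ** degree

def sigmoid_kernel(x, y, gamma=0.5, coef0=0.0):
    return math.tanh(gamma * dot(x, y) + coef0)

def phi(x):
    """Explicit feature map for the homogeneous degree-2 kernel in 2-D."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

x, y = (1.0, 2.0), (3.0, -1.0)
k_value = polynomial_kernel(x, y, degree=2, coef0=0.0)  # (x . y)^2
explicit = dot(phi(x), phi(y))                          # phi(x) . phi(y)
print(math.isclose(k_value, explicit))
```

If XOR is encoded with inputs of -1 and +1 instead of 0 and 1, the middle coordinate of phi (proportional to x1 * x2) is positive for one class and negative for the other, so the data becomes linearly separable in that feature space.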
