I'm currently working on classifying images with different image-descriptors. Since they have their own metrics, I am using precomputed kernels. So given these NxN kernel-matrices (for a total of N images), I want to train and test an SVM. I'm not very experienced using SVMs though.

What confuses me is how to supply the input for training. Using an MxM subset of the kernel (M being the number of training images) trains the SVM with M features. However, if I understand correctly, this limits me to test data with the same number of features. Trying to use a sub-kernel of size MxN causes infinite loops during training, and using more features when testing gives poor results.

As a result, only equal-sized training and test sets give reasonable results. But if I want to classify just one image, or train with a fixed number of images per class and test with the rest, this doesn't work at all.

How can I remove the dependency between the number of training images and features, so I can test with any number of images?

I'm using libsvm for MATLAB; the kernels are distance matrices with values in [0,1].

1 Answer

You seem to have already figured out the problem. According to the README file included in the MATLAB package:

To use a precomputed kernel, you must include the sample serial number as the first column of the training and testing data.

Let me illustrate with an example:

%# read dataset
[dataClass, data] = libsvmread('./heart_scale');

%# split into train/test datasets
trainData = data(1:150,:);
testData = data(151:270,:);
trainClass = dataClass(1:150,:);
testClass = dataClass(151:270,:);
numTrain = size(trainData,1);
numTest = size(testData,1);

%# radial basis function: exp(-gamma*|u-v|^2), with sigma in the role of gamma
sigma = 2e-3;
rbfKernel = @(X,Y) exp(-sigma .* pdist2(X,Y,'euclidean').^2);

%# compute kernel matrices between every pair of (train,train) and
%# (test,train) instances and include sample serial number as first column
K =  [ (1:numTrain)' , rbfKernel(trainData,trainData) ];
KK = [ (1:numTest)'  , rbfKernel(testData,trainData)  ];

%# train and test ('-t 4' selects the precomputed kernel)
model = svmtrain(trainClass, K, '-t 4');
[predClass, acc, decVals] = svmpredict(testClass, KK, model);

%# confusion matrix
C = confusionmat(testClass,predClass)

The output:

optimization finished, #iter = 70
nu = 0.933333
obj = -117.027620, rho = 0.183062
nSV = 140, nBSV = 140
Total nSV = 140
Accuracy = 85.8333% (103/120) (classification)

C =
    65     5
    12    38
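
Note that the test kernel is numTest-by-numTrain: every row is a test sample, and every column is its kernel value against one training sample. The number of columns is therefore fixed by the training set, while the number of rows is free, so the test set can be any size, including a single image. If you already have one full N-by-N kernel, as in the question, a minimal sketch of slicing it might look like the following. The names Kfull, trainIdx and testIdx are illustrative and not part of libsvm, the 200/70 split is arbitrary, and the RBF kernel on heart_scale only stands in for a kernel computed from image descriptors. It also assumes the matrix holds similarities (a valid kernel); a matrix of raw distances would first need converting.

```matlab
%# Sketch only: slice one precomputed N-by-N kernel into train/test blocks.
%# Assumes libsvm's MATLAB interface is on the path and ./heart_scale exists.
[labels, data] = libsvmread('./heart_scale');
data = full(data);                 %# libsvmread returns a sparse matrix
N = size(data,1);

%# full N-by-N kernel (stand-in for a kernel built from image descriptors)
sigma = 2e-3;
Kfull = exp(-sigma .* pdist2(data,data,'euclidean').^2);

%# any split works; the two sets do not have to be the same size
trainIdx = 1:200;
testIdx  = 201:N;                  %# a single index, e.g. testIdx = N, also works

%# rows: samples of the set; columns: always the training samples
Ktrain = [ (1:numel(trainIdx))' , Kfull(trainIdx,trainIdx) ];
Ktest  = [ (1:numel(testIdx))'  , Kfull(testIdx,trainIdx)  ];

%# train and predict with the precomputed kernel ('-t 4')
model = svmtrain(labels(trainIdx), Ktrain, '-t 4');
[predClass, acc, decVals] = svmpredict(labels(testIdx), Ktest, model);
```

The key point is that both blocks index the columns with trainIdx. Passing an M-by-N block for training, or a test block whose columns are not the training samples, does not match what the precomputed-kernel mode expects.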
