in Machine Learning by (19k points)

I know that cross-validation is used for selecting good parameters. After finding them, I need to retrain on the whole data set without the -v option.

But the problem I face is that after I train with the -v option, I only get the cross-validation accuracy (e.g. 85%). There is no model, and I can't see the values of C and gamma. In that case, how do I retrain?

By the way, I am applying 10-fold cross-validation, e.g.:

optimization finished, #iter = 138
nu = 0.612233
obj = -90.291046, rho = -0.367013
nSV = 165, nBSV = 128
Total nSV = 165
Cross Validation Accuracy = 98.1273%

I need some help with it.

To get the best C and gamma, I use this code, which is available in the LIBSVM FAQ:

bestcv = 0;
for log2c = -6:10,
  for log2g = -6:3,
    cmd = ['-v 5 -c ', num2str(2^log2c), ' -g ', num2str(2^log2g)];
    cv = svmtrain(TrainLabel, TrainVec, cmd);
    if (cv >= bestcv),
      bestcv = cv; bestc = 2^log2c; bestg = 2^log2g;
    end
    fprintf('(best c=%g, g=%g, rate=%g)\n', bestc, bestg, bestcv);
  end
end

Another question: Is the cross-validation accuracy reported with the -v option similar to the accuracy we get when we train without -v and use that model to predict?

Another question: Cross-validation basically improves the accuracy of the model by avoiding overfitting, so it needs to have a model in place before it can improve it. Am I right? Also, if I have a different model, will the cross-validation accuracy be different?

One more question: What are the values of C and gamma behind the reported cross-validation accuracy?

The graph is something like this

image

The best values are then C = 2 and gamma = 0.0078125. But when I retrain the model with these parameters, the accuracy is not the same as 99.63%. What could be the reason? Thanks in advance.

1 Answer

by (33.1k points)

With the -v option, svmtrain returns only the cross-validation accuracy, not a model, and C and gamma are whatever you pass with -c and -g (LIBSVM's defaults if you pass nothing). What you need is model selection: implement a grid search using cross-validation to find the best values of C and gamma.

Create a grid of values using MESHGRID, iterate over all pairs (C, gamma), train an SVM with, say, 5-fold cross-validation for each pair, and choose the pair with the best CV accuracy.
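A by-hand version of k-fold cross-validation shows why no single model comes out of a -v run: each fold gets its own temporary model, which is discarded after scoring. The following is only a rough sketch, assuming the LIBSVM MATLAB interface is on the path and that labels and data were loaded with libsvmread; the names fold_id and cv_acc_manual and the values of -c and -g are arbitrary placeholders, and LIBSVM splits the folds its own way, so the number will not match -v exactly.

```matlab
% Hypothetical by-hand equivalent of '-v 5', for illustration only.
% Assumes labels (n-by-1) and data (n-by-d) as returned by libsvmread.
folds = 5;
n = numel(labels);
fold_id = mod(randperm(n), folds) + 1;   % random fold assignment, 1..folds
fold_id = fold_id(:);                    % column vector, like labels
correct = 0;
for k = 1:folds
    test_mask = (fold_id == k);
    % A temporary model is trained on the other folds only ...
    m = svmtrain(labels(~test_mask), data(~test_mask,:), '-c 1 -g 0.07 -q');
    % ... used once to predict the held-out fold, and then discarded.
    pred = svmpredict(labels(test_mask), data(test_mask,:), m, '-q');
    correct = correct + sum(pred == labels(test_mask));
end
cv_acc_manual = 100 * correct / n;       % comparable to what '-v 5' reports
```

Because every sample is scored by a model that never saw it, this accuracy estimates how well a given (C, gamma) pair generalizes; it does not itself improve any model.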

For example:

% read some training data
[labels, data] = libsvmread('./heart_scale');

% grid of parameters (exponents of 2)
folds = 5;
[C, gamma] = meshgrid(-5:2:15, -15:2:3);

% grid search with cross-validation
cv_acc = zeros(numel(C), 1);
for i = 1:numel(C)
    % with -v, svmtrain returns the CV accuracy instead of a model;
    % %g keeps small values such as 2^-15 from being rounded by %f
    cv_acc(i) = svmtrain(labels, data, ...
                    sprintf('-c %g -g %g -v %d', 2^C(i), 2^gamma(i), folds));
end

% pair (C,gamma) with best accuracy
[~, idx] = max(cv_acc);

% contour plot of parameter selection
contour(C, gamma, reshape(cv_acc, size(C))), colorbar
hold on
plot(C(idx), gamma(idx), 'rx')
text(C(idx), gamma(idx), sprintf('Acc = %.2f %%', cv_acc(idx)), ...
    'HorizontalAlignment','left', 'VerticalAlignment','top')
hold off
xlabel('log_2(C)'), ylabel('log_2(\gamma)'), title('Cross-Validation Accuracy')

% now you can train your model using best_C and best_gamma
best_C = 2^C(idx);
best_gamma = 2^gamma(idx);

Output:

image
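Once best_C and best_gamma are known, the final model can be trained on the whole training set by calling svmtrain without -v. The following is a minimal sketch, assuming the LIBSVM MATLAB interface is on the path and that labels, data, best_C, and best_gamma from the example above are in the workspace; the names model, predicted, and train_acc are only illustrative.

```matlab
% Assumes labels, data, best_C, best_gamma from the grid search above.
% Without -v, svmtrain returns a model struct instead of an accuracy.
model = svmtrain(labels, data, sprintf('-c %g -g %g', best_C, best_gamma));

% Accuracy on the same data the model was trained on. This is often
% higher than the cross-validation accuracy, because cross-validation
% scores each fold with a model that never saw that fold.
[predicted, train_acc, ~] = svmpredict(labels, data, model);
fprintf('Training accuracy = %.2f%%\n', train_acc(1));
```

This is why the accuracy after retraining need not equal the cross-validation accuracy. For a fair estimate, score the model on a test set that was kept out of both the grid search and the final training.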


Hope this answer helps you!


 
