
I know that cross-validation is used for selecting good parameters. After finding them, I need to retrain on the whole data set without the -v option.

But the problem I face is that after I train with the -v option, I get only the cross-validation accuracy (e.g. 85%). There is no model, and I can't see the values of C and gamma. In that case, how do I retrain?

By the way, I am applying 10-fold cross-validation, e.g.:

optimization finished, #iter = 138
nu = 0.612233
obj = -90.291046, rho = -0.367013
nSV = 165, nBSV = 128
Total nSV = 165
Cross Validation Accuracy = 98.1273%

I need some help with it.

To get the best C and gamma, I use this code, which is available in the LIBSVM FAQ:

bestcv = 0;
for log2c = -6:10,
  for log2g = -6:3,
    cmd = ['-v 5 -c ', num2str(2^log2c), ' -g ', num2str(2^log2g)];
    cv = svmtrain(TrainLabel,TrainVec, cmd);
    if (cv >= bestcv),
      bestcv = cv; bestc = 2^log2c; bestg = 2^log2g;
    end
    fprintf('(best c=%g, g=%g, rate=%g)\n',bestc, bestg, bestcv);
  end
end

Another question: is the cross-validation accuracy from the -v option similar to the accuracy we get when we train without -v and use that model to predict? Are the two accuracies similar?

Another question: cross-validation basically improves the accuracy of the model by avoiding overfitting, so it needs to have a model in place before it can improve it. Am I right? Besides that, if I have a different model, the cross-validation accuracy will be different. Am I right?

One more question: for the reported cross-validation accuracy, what are the values of C and gamma?

The graph is something like this

image

Then the values are C = 2 and gamma = 0.0078125. But when I retrain the model with the new parameters, the accuracy is not the same as 99.63%. Could there be any reason? Thanks in advance.

1 Answer


You should perform model selection: implement a grid search with cross-validation to find the best values of C and gamma.

Create a grid of values using MESHGRID, iterate over all pairs (C, gamma), train an SVM model for each pair with, say, 5-fold cross-validation, and choose the pair with the best CV accuracy.
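To make the cross-validation step concrete, here is a rough sketch of what the -v option computes for a single (C, gamma) pair. It assumes the LIBSVM MATLAB interface is on the path and that the heart_scale file from the LIBSVM distribution is in the current folder. The parameter values in opts are arbitrary, and the random, non-stratified folds are a simplification: LIBSVM does its own shuffling and splitting, so the numbers will not match its output exactly.

```matlab
% Sketch: manual k-fold cross-validation for one (C, gamma) pair.
[labels, data] = libsvmread('./heart_scale');

k = 5;                               % number of folds
n = numel(labels);
fold = mod(randperm(n), k) + 1;      % random fold id in 1..k for every sample
opts = '-c 1 -g 0.07 -q';            % arbitrary illustrative parameters, quiet mode

correct = 0;
for f = 1:k
    test = (fold == f);              % logical mask of the held-out fold
    % train on the other k-1 folds, then predict the held-out fold
    model = svmtrain(labels(~test), data(~test,:), opts);
    [pred, ~, ~] = svmpredict(labels(test), data(test,:), model, '-q');
    correct = correct + sum(pred == labels(test));
end

% every sample was predicted exactly once, by a model that never saw it
cv_acc = 100 * correct / n;
fprintf('manual %d-fold CV accuracy = %.2f%%\n', k, cv_acc);
```

Each of the k models is thrown away after it has been scored, which is why svmtrain with -v returns only an accuracy value and no model.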

For example, a grid search over the heart_scale data set:

% read some training data
[labels,data] = libsvmread('./heart_scale');

% grid of parameters (exponents of 2)
folds = 5;
[C,gamma] = meshgrid(-5:2:15, -15:2:3);

% grid search with cross-validation
cv_acc = zeros(numel(C),1);
for i=1:numel(C)
    cv_acc(i) = svmtrain(labels, data, ...
                    sprintf('-c %f -g %f -v %d', 2^C(i), 2^gamma(i), folds));
end

% pair (C,gamma) with best accuracy
[~,idx] = max(cv_acc);

% contour plot of parameter selection
contour(C, gamma, reshape(cv_acc,size(C))), colorbar
hold on
plot(C(idx), gamma(idx), 'rx')
text(C(idx), gamma(idx), sprintf('Acc = %.2f %%',cv_acc(idx)), ...
    'HorizontalAlignment','left', 'VerticalAlignment','top')
hold off
xlabel('log_2(C)'), ylabel('log_2(\gamma)'), title('Cross-Validation Accuracy')

% now you can train your model using best_C and best_gamma
best_C = 2^C(idx);
best_gamma = 2^gamma(idx);

Output:

image
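Once a pair has been chosen, retraining is a plain svmtrain call without -v, which returns a model. The following is a minimal sketch under some assumptions: the LIBSVM MATLAB interface is on the path, heart_scale is in the current folder, and the values C = 2 and gamma = 0.0078125 are borrowed from the question purely as placeholders for whatever the grid search selects.

```matlab
% Sketch: retrain on the full training set with the selected parameters.
[labels, data] = libsvmread('./heart_scale');

best_C = 2;              % placeholder, use the value found by the grid search
best_gamma = 0.0078125;  % placeholder, use the value found by the grid search

% no -v here, so svmtrain returns a model struct instead of a CV accuracy
model = svmtrain(labels, data, sprintf('-c %g -g %g', best_C, best_gamma));

% accuracy on the same data the model was trained on
% (svmpredict is called with three outputs; acc(1) is the accuracy in percent)
[~, acc, ~] = svmpredict(labels, data, model);
fprintf('training accuracy = %.2f%%\n', acc(1));
```

The accuracy printed here is measured on the training data itself, so it is usually higher than the cross-validation accuracy for the same parameters and the two numbers should not be expected to match. The cross-validation figure is the more honest estimate of performance on unseen data; the retrained model is the one to use for predicting a separate test set.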


Hope this answer helps you!


 
