Understanding MATLAB's built-in SVM cross-validation with fitcsvm

Carlos Mendoza on 30 Aug 2020
Commented: Xingwang Yong on 3 Oct 2020
I have a dataset of 53 trials, and I want to do leave-one-out cross-validation of a binary classifier. I tried to do the cross-validation of an SVM explicitly, with this code:
SVM_params = {'KernelFunction', 'linear', 'Standardize', true, ...
    'BoxConstraint', 0.046125, 'ClassNames', class_names};
n_trials = 53;
SVMModel = cell(n_trials, 1);
estimated_labels = cell(n_trials, 1);
for i_trial = 1:n_trials
    %% Train on all trials except the current one
    train_set_indices = [1:i_trial-1 i_trial+1:n_trials];
    SVMModel{i_trial} = fitcsvm(input_data(train_set_indices, :), ...
        true_labels(train_set_indices), SVM_params{:});
    %% Predict the left-out trial
    [estimated_labels(i_trial), score] = predict(SVMModel{i_trial}, ...
        input_data(i_trial, :));
end
error_count = sum(~strcmp(true_labels, estimated_labels));
class_error = error_count / n_trials;
which gives me a class_error equal to 0.4151.
However, if I try MATLAB's built-in SVM cross-validation:
SVM_params = {'KernelFunction', 'linear', 'Standardize', true, ...
    'Leaveout', 'on', 'BoxConstraint', 0.046125, 'ClassNames', class_names};
CSVM = fitcsvm(input_data, true_labels, SVM_params{:});
then CSVM.kfoldLoss is equal to 0.3208. Why the difference? What am I doing wrong in my explicit cross-validation?
I did the same exercise with 'Standardize', false and 'KernelScale', 987.8107 (optimized hyperparameters), and the difference is more dramatic: class_error = 0.4528, while CSVM.kfoldLoss = 0.
Finally, I would also like to know what the training and validation sets were for each of the trained models in CSVM.Trained. I would like to call predict on each trained model with the left-out sample (trial) and compare the result with CSVM.kfoldPredict.
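Here is a minimal sketch of what I have in mind; I am assuming the cross-validated model stores its cvpartition in CSVM.Partition and the number of folds in CSVM.KFold (please correct me if those are not the right properties):
% Sketch only: list which trial was left out in each fold
for ii = 1:CSVM.KFold
    test_idx  = find(test(CSVM.Partition, ii));      % the single left-out trial of fold ii
    train_idx = find(training(CSVM.Partition, ii));  % trials used to fit CSVM.Trained{ii}
    fprintf('Fold %d: left-out trial %d (%d training trials)\n', ii, test_idx, numel(train_idx));
end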
Update 1: I found that c.training and c.test return the indices of the training and test sets. However, this code
SVM_params = {'KernelFunction', 'linear', 'Standardize', true, 'CVPartition', c, ...
    'BoxConstraint', BoxConstraint, 'ClassNames', class_names};
estimated_labels = cell(1, 53);
CSVM = fitcsvm(input_data, true_labels, SVM_params{:});
for ii = 1:53
    % Predict the test sample of fold ii with the model trained on that fold
    estimated_labels(ii) = predict(CSVM.Trained{ii}, input_data(c.test(ii), :, 1));
end
error_count = sum(~strcmp(true_labels, estimated_labels));
class_error = error_count / n_trials;
gives me class_error = 0.5849, which is different from CSVM.kfoldLoss (0.3208). Why the difference? Is this the right way to double-check the cross-validation?
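What I ultimately want to double-check against is kfoldPredict; here is a minimal sketch of the comparison I have in mind, assuming kfoldPredict returns one cross-validated label per trial in the original row order of input_data:
% Sketch: compare a manually computed error rate with the built-in one
cv_labels = kfoldPredict(CSVM);                           % one label per trial (row order assumed)
manual_loss = mean(~strcmp(true_labels(:), cv_labels(:)));
builtin_loss = kfoldLoss(CSVM);                           % 0.3208 in my case
fprintf('manual: %.4f, built-in: %.4f\n', manual_loss, builtin_loss);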
Update 2: I attached the data.
Thanks!
2 comments
Image Analyst on 31 Aug 2020
No answers yet, probably because you forgot to attach your data.
Carlos Mendoza on 31 Aug 2020
I didn't forget; I thought the code would be enough. That was probably a mistake.


Answers (1)

Xingwang Yong on 29 Sep 2020
Maybe kfoldLoss uses a different definition of loss than yours. Your definition is 1-accuracy.
https://www.mathworks.com/help/stats/classreg.learning.partition.regressionpartitionedkernel.kfoldloss.html?s_tid=srchtitle
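If you want to be sure which loss is computed, you can name it explicitly; a minimal sketch (for classification models, 'classiferror' is the documented misclassification rate, so this should match the default):
% Request the misclassification rate explicitly instead of relying on the default
loss_explicit = kfoldLoss(CSVM, 'LossFun', 'classiferror')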
2 comments
Carlos Mendoza on 1 Oct 2020
The default is 'classiferror', which is what I am using.
What do you mean by "1-accuracy"?
Xingwang Yong on 3 Oct 2020
class_error = error_count / n_trials
            = (n_trials - correct_count) / n_trials
            = 1 - correct_count / n_trials
            = 1 - accuracy
That is your definition of loss.
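For example, with 53 trials, your class_error of 0.4151 corresponds to 22 misclassified trials (22/53 ≈ 0.4151), while a kfoldLoss of 0.3208 corresponds to 17 (17/53 ≈ 0.3208), so the two procedures are not misclassifying the same number of left-out trials.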


Version

R2019b
