Understanding MATLAB's built-in SVM cross-validation in fitcsvm
I have a dataset of 53 trials and I want to do leave-one-out cross-validation of a binary classifier. I tried to explicitly do the cross-validation of an SVM, with this code:
n_trials = 53;
SVM_params = {'KernelFunction', 'linear', 'Standardize', true, ...
    'BoxConstraint', 0.046125, 'ClassNames', class_names};
SVMModel = cell(n_trials, 1);
estimated_labels = cell(1, n_trials);  % preallocate the predicted labels
for i_trial = 1:n_trials
    % Train on every trial except i_trial
    train_set_indices = [1:i_trial-1, i_trial+1:n_trials];
    SVMModel{i_trial} = fitcsvm(input_data(train_set_indices, :), ...
        true_labels(train_set_indices), SVM_params{:});
    % Predict the held-out trial
    estimated_labels(i_trial) = predict(SVMModel{i_trial}, ...
        input_data(i_trial, :));
end
error_count = sum(~strcmp(true_labels, estimated_labels));
class_error = error_count / n_trials;
which gives me class_error equal to 0.4151.
However, if I try MATLAB's built-in SVM cross-validation:
SVM_params = {'KernelFunction', 'linear', 'Standardize', true, ...
'Leaveout', 'on', 'BoxConstraint', 0.046125, 'ClassNames', class_names};
CSVM = fitcsvm(input_data, true_labels, SVM_params{:});
then CSVM.kfoldLoss is equal to 0.3208. Why the difference? What am I doing wrong in my explicit cross-validation?
I did the same exercise with 'Standardize', false and 'KernelScale', 987.8107 (optimized hyperparameters), and the difference is more dramatic: class_error = 0.4528, while CSVM.kfoldLoss = 0.
Finally, I would also like to know what the training and validation sets were for each of the trained models in CSVM.Trained. I would like to call predict on each trained model with the left-out sample (trial) and compare the result with CSVM.kfoldPredict.
Update 1: I found that c.training and c.test return the indices of the training and test sets. However, this code
SVM_params = {'KernelFunction', 'linear', 'Standardize', true, 'CVPartition', c,...
'BoxConstraint', BoxConstraint, 'ClassNames', class_names};
estimated_labels = cell(1,53);
CSVM = fitcsvm(input_data, true_labels, SVM_params{:});
for ii=1:53
estimated_labels(ii) = predict(CSVM.Trained{ii}, input_data(c.test(ii),:,1));
end
error_count = sum(~strcmp(true_labels, estimated_labels));
class_error = error_count / n_trials;
gives me class_error = 0.5849, which differs from CSVM.kfoldLoss (0.3208). Why the difference? Is this the right way to double-check the cross-validation?
Update 2: I attached the data.
Thanks!
Answers (1)
Xingwang Yong
on 29 Sep 2020
Maybe kfoldLoss uses a different definition of loss than yours. Your definition is 1-accuracy.
https://www.mathworks.com/help/stats/classreg.learning.partition.regressionpartitionedkernel.kfoldloss.html?s_tid=srchtitle
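One way to test this hypothesis is to compare kfoldLoss against the misclassification rate computed directly from kfoldPredict. The sketch below is only an illustration: it uses synthetic data as a stand-in for the attached dataset, and the labels 'a'/'b' and the names manual_error and builtin_error are assumptions, not part of the original question. As far as I know, the default loss for a cross-validated classifier is the classification error, so with equal observation weights the two numbers would be expected to agree. Note also that the values reported in the question are all multiples of 1/53 (0.3208 ≈ 17/53, 0.4151 ≈ 22/53, 0.5849 ≈ 31/53), which is what a plain misclassification fraction over 53 trials would look like.

```matlab
% Synthetic stand-in data (assumption: the real dataset is not reproduced here)
rng(0);                                    % reproducible random numbers
n_trials = 53;
input_data = randn(n_trials, 4);
is_b = input_data(:, 1) + 0.5 * randn(n_trials, 1) > 0;
true_labels = repmat({'a'}, n_trials, 1);  % column cell array of labels
true_labels(is_b) = {'b'};
class_names = {'a', 'b'};

% Leave-one-out cross-validated linear SVM, set up as in the question
CSVM = fitcsvm(input_data, true_labels, 'KernelFunction', 'linear', ...
    'Standardize', true, 'Leaveout', 'on', 'ClassNames', class_names);

% Misclassification rate computed from the cross-validated predictions
cv_labels = kfoldPredict(CSVM);
manual_error = mean(~strcmp(true_labels, cv_labels));

% Built-in loss with the default loss function
builtin_error = kfoldLoss(CSVM);

fprintf('manual: %.4f, kfoldLoss: %.4f\n', manual_error, builtin_error);
```

If the two values match on your data as well, the discrepancy is more likely to come from how the folds or predictions are handled than from the definition of the loss.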
2 Comments
Xingwang Yong
on 3 Oct 2020
class_error = error_count / n_trials
            = (n_trials - correct_count) / n_trials
            = 1 - correct_count / n_trials
            = 1 - accuracy
That is your definition of loss.
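Regarding the check in Update 1 of the question, a possible cause of the 0.5849 figure is the indexing: fold ii of a leave-one-out cvpartition is not guaranteed to hold out observation ii, so a prediction stored at position ii and compared with true_labels(ii) may be matched against the wrong trial. The following is a minimal sketch under that assumption, again on synthetic stand-in data (the data and the name same_as_builtin are hypothetical). Each prediction is stored at the position of the observation that was actually held out, using the logical mask returned by test(c, ii).

```matlab
% Synthetic stand-in data (assumption: the real dataset is not reproduced here)
rng(0);
n_trials = 53;
input_data = randn(n_trials, 4);
is_b = input_data(:, 1) + 0.5 * randn(n_trials, 1) > 0;
true_labels = repmat({'a'}, n_trials, 1);
true_labels(is_b) = {'b'};

% Leave-one-out partition passed explicitly, as in Update 1
c = cvpartition(n_trials, 'LeaveOut');
CSVM = fitcsvm(input_data, true_labels, 'KernelFunction', 'linear', ...
    'Standardize', true, 'CVPartition', c, 'ClassNames', {'a', 'b'});

estimated_labels = cell(n_trials, 1);
for ii = 1:c.NumTestSets
    test_mask = test(c, ii);  % logical mask, true for the held-out observation
    % Store the prediction at the held-out observation's own position
    estimated_labels(test_mask) = predict(CSVM.Trained{ii}, ...
        input_data(test_mask, :));
end

class_error = mean(~strcmp(true_labels, estimated_labels));
same_as_builtin = isequal(estimated_labels, kfoldPredict(CSVM));
fprintf('class_error: %.4f, kfoldLoss: %.4f, labels match: %d\n', ...
    class_error, kfoldLoss(CSVM), same_as_builtin);
```

For a model cross-validated with 'Leaveout', 'on' instead of an explicit partition, the same masks should be available through CSVM.Partition.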