Matlab: Error using classreg.l​earning.Fi​tTemplate/​fit with hyperparameter optimization of SVM

21 vues (au cours des 30 derniers jours)
I am using Bayesian optimization (bayesopt function) in Matlab for hyperparameter optimization of SVM classifier. The optimization goal is to minimize 10-fold cross validation error. Here is the code that I use:
KernelFlag = 1;
c = cvpartition(size(XTrain,1),'KFold',10);
sigma = optimizableVariable('sigma',[1e-5,1e5],'Transform','log');
box = optimizableVariable('box',[1e-5,1e5],'Transform','log');
polyOrder = optimizableVariable('polyOrder',[2,4]);
fun = @(z)mysvmfunTest(z,XTrain,yTrain,c,classNames,KernelFlag);
results = bayesopt(fun,[sigma,box,polyOrder],'IsObjectiveDeterministic',true,...
'PlotFcn',{@plotMinObjective},...
'AcquisitionFunctionName','expected-improvement-plus');
and mysvmfunTest:
function [objective] = mysvmfunTest(z,X,Y,c,classNames,KernelFlag)
if KernelFlag == 1
t = templateSVM('Standardize',1,'KernelFunction','RBF',...
'BoxConstraint',z.box,'KernelScale',z.sigma,'RemoveDuplicates',true);
elseif KernelFlag == 2
t = templateSVM('Standardize',1,'KernelFunction','polynomial',...
'BoxConstraint',z.box,'KernelScale',z.sigma,'PolynomialOrder',z.polyOrder,...
'RemoveDuplicates',true);
else
t = templateSVM('Standardize',1,'KernelFunction','linear',...
'BoxConstraint',z.box,'KernelScale',z.sigma,...
'RemoveDuplicates',true);
end
SVMModel = fitcecoc(X,Y,'Learners',t,'ClassNames',classNames);
cvModel = crossval(SVMModel,'CVPartition',c);
objective = kfoldLoss(cvModel);
I have used this code before, with different datasets. But, lately when I try to use it on a new dataset, it throws me an error:
Error using classreg.learning.FitTemplate/fit (line 249) You passed a cvpartition object for 27152 observations, but the input data have only 10395 observations. Some observations may have been removed because they have NaN values for all predictors, missing response values or zero weights. When cross-validating an existing object, consider using the RowsUsed property to determine what size partition is required.
I checked all the data, there is no nan, or missing values in my data. I even removed all the samples which have any feature between 0 and .01 (all my features are positive). Still have the same problem and get the same error. I guess the error is due to the existence of samples that are perhaps too close, resulting into removal of many of the observations, but I am not sure that is the case. Any idea where this error might come from or any suggestion how I can solve this issue?

Réponse acceptée

Ilya
Ilya le 26 Oct 2018
You are passing ClassNames to fitcecoc - are your ClassNames a subset of all class names you have in yTrain?
Train one ECOC model using
SVMModel = fitcecoc(XTrain,yTrain,'Learners',t,'ClassNames',classNames);
and look at the size of property X in SVMModel. Does it have as many rows as XTrain does?

Plus de réponses (1)

Don Mathis
Don Mathis le 26 Oct 2018
Modifié(e) : Don Mathis le 26 Oct 2018
Maybe your use of 'RemoveDuplicates' is causing observations to be removed?
I ran your code on some synthetic data that has no duplicates in XTrain and it works fine:
XTrain = rand(1000,10);
yTrain = categorical(round(XTrain(:,1)*3));
classNames = categories(yTrain);
KernelFlag = 1;
c = cvpartition(size(XTrain,1),'KFold',10);
sigma = optimizableVariable('sigma',[1e-5,1e5],'Transform','log');
box = optimizableVariable('box',[1e-5,1e5],'Transform','log');
polyOrder = optimizableVariable('polyOrder',[2,4]);
fun = @(z)mysvmfunTest(z,XTrain,yTrain,c,classNames,KernelFlag);
results = bayesopt(fun,[sigma,box,polyOrder],'IsObjectiveDeterministic',true,...
'PlotFcn',{@plotMinObjective},...
'AcquisitionFunctionName','expected-improvement-plus');
function [objective] = mysvmfunTest(z,X,Y,c,classNames,KernelFlag)
if KernelFlag == 1
t = templateSVM('Standardize',1,'KernelFunction','RBF',...
'BoxConstraint',z.box,'KernelScale',z.sigma,'RemoveDuplicates',true);
elseif KernelFlag == 2
t = templateSVM('Standardize',1,'KernelFunction','polynomial',...
'BoxConstraint',z.box,'KernelScale',z.sigma,'PolynomialOrder',z.polyOrder,...
'RemoveDuplicates',true);
else
t = templateSVM('Standardize',1,'KernelFunction','linear',...
'BoxConstraint',z.box,'KernelScale',z.sigma,...
'RemoveDuplicates',true);
end
SVMModel = fitcecoc(X,Y,'Learners',t,'ClassNames',classNames);
cvModel = crossval(SVMModel,'CVPartition',c);
objective = kfoldLoss(cvModel);
end
By the way, it's probably best to declare polyOrder to be an integer:
polyOrder = optimizableVariable('polyOrder',[2,4],'Type','integer');
  1 commentaire
Nick Zadeh
Nick Zadeh le 26 Oct 2018
Modifié(e) : Nick Zadeh le 26 Oct 2018
Thank you for your response Don! I tried removing ' RemoveDuplicates',true, but I still get the same error message. As I mentioned before, I used this code on different datasets and I never got any error. Do you have any idea what might cause the problem in this case? Is there any part of this code that automatically removes samples that are too close? Or do you know of any existing bug?

Connectez-vous pour commenter.

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by