TreeBagger parameter tuning for classification

How can I tune the parameters of a TreeBagger model for classification? I followed the example "Tune Random Forest Using Quantile Error and Bayesian Optimization", https://fr.mathworks.com/help/stats/tune-random-forest-using-quantile-error-and-bayesian-optimization.html, and only changed "regression" to "classification". The following code generated multiple errors:
results = bayesopt(@(params)oobErrRF(params,X),hyperparametersRF,...
'AcquisitionFunctionName','expected-improvement-plus','Verbose',0);
errors:
Error using classreg.learning.internal.table2FitMatrix>resolveName (line 232)
One or more 'ResponseName' parameter values are invalid.
Error in classreg.learning.internal.table2FitMatrix (line 77)
ResponseName = resolveName('ResponseName',ResponseName,FormulaResponseName,false,VarNames);
Error in ClassificationTree.prepareData (line 557)
[X,Y,vrange,wastable,varargin] =
classreg.learning.internal.table2FitMatrix(X,Y,varargin{:},'OrdinalIsCategorical',false);
Error in TreeBagger/init (line 1335)
ClassificationTree.prepareData(x,y,...
Error in TreeBagger (line 615)
bagger = init(bagger,X,Y,makeArgs{:});
Error in oobErrRF2 (line 16)
randomForest = TreeBagger(300,X,'MPG','Method','classification',...
Error in @(params)oobErrRF2(params,trainingDataFeatures)
Error in BayesianOptimization/callObjNormally (line 2184)
Objective = this.ObjectiveFcn(conditionalizeX(this, X));
Error in BayesianOptimization/callObjFcn (line 2145)
= callObjNormally(this, X);
Error in BayesianOptimization/callObjFcn (line 2162)
= callObjFcn(this, X);
Error in BayesianOptimization/performFcnEval (line 2128)
ObjectiveFcnObjectiveEvaluationTime, this] = callObjFcn(this, this.XNext);
Error in BayesianOptimization/run (line 1836)
this = performFcnEval(this);
Error in BayesianOptimization (line 450)
this = run(this);
Error in bayesopt (line 287)
Results = BayesianOptimization(Options);
I would like to know if there is a way to use this method of tuning for classification. If not, how can I tune my parameters for a TreeBagger classifier. Thanks.

2 comments

Don Mathis on 8 Jun 2018
What version of MATLAB are you using? That's not the error I get using R2018a.
It's R2017a.


Answers (1)

The following works for me in R2018a. It predicts 'Cylinders' (3 classes) and it calls oobError to get the misclassification rate of the ensemble.
load carsmall
Cylinders = categorical(Cylinders);
Mfg = categorical(cellstr(Mfg));
Model_Year = categorical(Model_Year);
X = table(Acceleration,Cylinders,Displacement,Horsepower,Mfg,...
Model_Year,Weight,MPG);
rng('default'); % For reproducibility
maxMinLS = 20;
minLS = optimizableVariable('minLS',[1,maxMinLS],'Type','integer');
numPTS = optimizableVariable('numPTS',[1,size(X,2)-1],'Type','integer');
hyperparametersRF = [minLS; numPTS];
results = bayesopt(@(params)oobErrRF(params,X),hyperparametersRF,...
'AcquisitionFunctionName','expected-improvement-plus','Verbose',1);
bestOOBErr = results.MinObjective
bestHyperparameters = results.XAtMinObjective
Mdl = TreeBagger(300,X,'Cylinders','Method','classification',...
'MinLeafSize',bestHyperparameters.minLS,...
'NumPredictorsToSample',bestHyperparameters.numPTS);
function oobErr = oobErrRF(params,X)
%oobErrRF Trains random forest and estimates out-of-bag classification error
% oobErr trains a random forest of 300 classification trees using the
% data in X and the parameter specification in params, and then returns
% the out-of-bag misclassification rate of the ensemble. X is a table that
% contains the predictors and the response 'Cylinders'; params is the
% one-row table that bayesopt passes in, holding the current values of
% the minimum leaf size and number of predictors to sample at each node.
randomForest = TreeBagger(300,X,'Cylinders','Method','classification',...
'OOBPrediction','on','MinLeafSize',params.minLS,...
'NumPredictorsToSample',params.numPTS);
oobErr = oobError(randomForest, 'Mode','ensemble');
end

9 comments

Thanks, Don Mathis, for your response. This example works, but when I apply the same optimization to my own data it does not. It generates multiple errors:
Warning: Variable names were modified to make them valid MATLAB identifiers. The original names are saved in the
VariableDescriptions property.
Warning: Variable names were modified to make them valid MATLAB identifiers. The original names are saved in the
VariableDescriptions property.
Error using classreg.learning.internal.table2FitMatrix>resolveName (line 232)
One or more 'ResponseName' parameter values are invalid.
Error in classreg.learning.internal.table2FitMatrix (line 77)
ResponseName = resolveName('ResponseName',ResponseName,FormulaResponseName,false,VarNames);
Error in ClassificationTree.prepareData (line 557)
[X,Y,vrange,wastable,varargin] = classreg.learning.internal.table2FitMatrix(X,Y,varargin{:},'OrdinalIsCategorical',false);
Error in TreeBagger/init (line 1335)
ClassificationTree.prepareData(x,y,...
Error in TreeBagger (line 615)
bagger = init(bagger,X,Y,makeArgs{:});
Error in ModelTuning2>oobErrRF (line 61)
randomForest = TreeBagger(21,X,'trainedDataAttackLabel','Method','classification',...
Error in ModelTuning2>@(params)oobErrRF(params,X)
Error in BayesianOptimization/callObjNormally (line 2184)
Objective = this.ObjectiveFcn(conditionalizeX(this, X));
Error in BayesianOptimization/callObjFcn (line 2145)
= callObjNormally(this, X);
Error in BayesianOptimization/callObjFcn (line 2162)
= callObjFcn(this, X);
Error in BayesianOptimization/performFcnEval (line 2128)
ObjectiveFcnObjectiveEvaluationTime, this] = callObjFcn(this, this.XNext);
Error in BayesianOptimization/run (line 1836)
this = performFcnEval(this);
Error in BayesianOptimization (line 450)
this = run(this);
Error in bayesopt (line 287)
Results = BayesianOptimization(Options);
Error in ModelTuning2 (line 46)
results = bayesopt(@(params)oobErrRF(params,X),hyperparametersRF,...
Don Mathis on 18 Jun 2018
Edited: Don Mathis on 18 Jun 2018
That might be a bug in MATLAB but it's hard to tell without seeing your variable names. It seems to be saying that one or more of your table variable names are not valid MATLAB identifiers. Would it be easy for you to change the name(s) of those to make them MATLAB-compatible? I suspect that internally, classreg.learning.internal.table2FitMatrix is turning your table variables into MATLAB variables and it can't handle your names.
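A quick way to test the invalid-identifier idea is to run the candidate names through isvarname and see what matlab.lang.makeValidName would turn them into. This is only a sketch: the names below are made up for illustration and are not taken from the data set in this thread.

```matlab
% Hypothetical column names (illustrative only, not from the thread's data)
names = {'dur', 'attack cat', '1stPkt'};

% isvarname returns true only for names usable as MATLAB identifiers
validMask = cellfun(@isvarname, names);

% matlab.lang.makeValidName returns the names MATLAB would substitute
fixedNames = matlab.lang.makeValidName(names);

for k = 1:numel(names)
    fprintf('%-12s valid=%d  ->  %s\n', names{k}, validMask(k), fixedNames{k});
end
```

If a name was changed on import, the response name passed to TreeBagger has to be the modified name, not the original one.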
I am not sure; I have been working with these variable names without errors in other programs. Now I am getting different errors. This is the code I used:
load networkTraffic
proto = categorical(cellstr(proto));
service = categorical(cellstr(service));
state = categorical(cellstr(state));
attack_cat=categorical(attack_cat); % Response
X = table(dur,proto,service,state,spkts,dpkts,sbytes,dbytes,rate,sttl,dttl,sload,dload,sloss,dloss,sinpkt,dinpkt,sjit,djit,swin,stcpb,dtcpb,dwin,...
tcprtt,synack,ackdat,smean,dmean,trans_depth,response_body_len,ct_srv_src,ct_state_ttl,...
ct_dst_ltm,ct_src_dport_ltm,ct_dst_sport_ltm,ct_dst_src_ltm,is_ftp_login,ct_ftp_cmd,...
ct_flw_http_mthd,ct_src_ltm,ct_srv_dst,is_sm_ips_ports);
At this point, MATLAB does not recognize the variable dbytes, which is strange because it seems to recognize the ones before dbytes. The error is:
Warning: Variable names were modified to make them valid MATLAB identifiers. The original names are saved
in the VariableDescriptions property.
Warning: Variable names were modified to make them valid MATLAB identifiers. The original names are saved
in the VariableDescriptions property.
Undefined function or variable 'sbytes'.
Error in ModelTuning2 (line 34)
X =
table(dur,proto,service,state,spkts,dpkts,sbytes,dbytes,rate,sttl,dttl,sload,dload,sloss,dloss,sinpkt,dinpkt,sjit,djit,swin,stcpb,dtcpb,dwin,...
Thanks.
Don Mathis on 22 Jun 2018
Edited: Don Mathis on 22 Jun 2018
Your issues seem to involve variable names and availability. The final error message:
Undefined function or variable 'sbytes'
seems to be a simple case of trying to pass to 'table' a variable called 'sbytes' which is undefined. To help any further I would need to see complete reproduction code.
Here is the complete code. The data set is available at NetworkTraffic. I hope you can find the error. Thanks.
load networkTraffic.mat
proto= categorical(cellstr(proto));
service= categorical(cellstr(service));
state = categorical(cellstr(state));
attackCat=categorical(cellstr(attack_cat));
X = table(dur,proto,service,state,spkts,dpkts,sbytes,dbytes,rate,sttl,dttl,sload,dload,sloss,...
dloss,sinpkt,dinpkt,sjit,djit,swin,stcpb,dtcpb,dwin,tcprtt,synack,ackdat,smean,dmean,trans_depth,response_body_len,ct_srv_src,ct_state_ttl,...
ct_dst_ltm,ct_src_dport_ltm,ct_dst_sport_ltm,ct_dst_src_ltm,is_ftp_login,ct_ftp_cmd,...
ct_flw_http_mthd,ct_src_ltm,ct_srv_dst,is_sm_ips_ports);
rng('default'); % For reproducibility
maxMinLS = 20;
minLS = optimizableVariable('minLS',[1,maxMinLS],'Type','integer');
numPTS = optimizableVariable('numPTS',[1,size(X,2)-1],'Type','integer');
hyperparametersRF = [minLS; numPTS];
results = bayesopt(@(params)oobErrRF(params,X),hyperparametersRF,...
'AcquisitionFunctionName','expected-improvement-plus','Verbose',1);
bestOOBErr = results.MinObjective
bestHyperparameters = results.XAtMinObjective
Mdl = TreeBagger(100,X,'attackCat','Method','classification',...
'MinLeafSize',bestHyperparameters.minLS,...
'NumPredictorstoSample',bestHyperparameters.numPTS);
function oobErr = oobErrRF(params,X)
%oobErrRF Trains random forest and estimates out-of-bag quantile error
% oobErr trains a random forest of 300 regression trees using the
% predictor data in X and the parameter specification in params, and then
% returns the out-of-bag quantile error based on the median. X is a table
% and params is an array of OptimizableVariable objects corresponding to
% the minimum leaf size and number of predictors to sample at each node.
randomForest = TreeBagger(100,X,'attackCat','Method','classification',...
'OOBPrediction','on','MinLeafSize',params.minLS,...
'NumPredictorstoSample',params.numPTS);
oobErr = oobError(randomForest, 'Mode','ensemble');
end
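One thing that stands out in the code above: the table call lists only predictors, while TreeBagger is given 'attackCat' as the response name. If attackCat is not a variable of X, that alone could explain the "One or more 'ResponseName' parameter values are invalid" error. The sketch below uses made-up toy values in place of the real data, so treat it as an illustration of the check rather than a confirmed fix.

```matlab
% Toy stand-ins for the workspace variables (values are made up)
dur       = [0.1; 0.2; 0.3; 0.4];
spkts     = [2; 4; 6; 8];
attackCat = categorical({'Normal'; 'DoS'; 'Normal'; 'DoS'});

% Response left out of the table, as in the posted call
X = table(dur, spkts);
missingBefore = ~ismember('attackCat', X.Properties.VariableNames);

% Add the response as a table variable so TreeBagger can find it by name
X.attackCat = attackCat;
presentAfter = ismember('attackCat', X.Properties.VariableNames);

% With the response inside X, the predictor count is all columns but one,
% which matches the [1,size(X,2)-1] range used for numPTS
numPredictors = size(X, 2) - 1;
```

Note that the upper bound size(X,2)-1 for numPTS is only correct when the response is one of the columns of X.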
Don Mathis on 25 Jun 2018
Sorry, I don't know how to open .rar files :(
Please find the mat file here: Network Traffic mat. Thanks.
>> load networkTraffic.mat
>> proto= categorical(cellstr(proto));
Undefined function or variable 'proto'.
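When load succeeds but an expected variable is undefined, it can help to list what the MAT-file actually stores. One possibility, which I cannot confirm for this file, is that the columns are stored inside a single table rather than as separate variables. The sketch below builds a small stand-in file so it runs on its own; the file and the variable name trafficTbl are hypothetical.

```matlab
% Build a small stand-in MAT-file (hypothetical contents)
matFile = [tempname '.mat'];
trafficTbl = table([0.1; 0.2], {'tcp'; 'udp'}, 'VariableNames', {'dur', 'proto'});
save(matFile, 'trafficTbl');

% List the variables stored in the file without loading them
info = whos('-file', matFile);
storedNames = {info.name};
disp(storedNames);

% Load into a struct so nothing is silently added to the workspace
S = load(matFile);
tbl = S.(storedNames{1});

% If the data is a table, columns are reached through the table
if istable(tbl)
    proto = categorical(tbl.proto);
end

delete(matFile);   % remove the temporary file
```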
What if we need to do kFold validation to optimize hyperparameters?
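One way to do this is to replace the out-of-bag objective with a function that loops over the folds of a cvpartition and averages the test misclassification rate. The following is a rough sketch on the carsmall data used in the answer, not a tested recipe for the network traffic data. The function name kfoldErrRF, the 5 folds, the 50 trees and the 10 evaluations are my own choices to keep the run short.

```matlab
load carsmall
Cylinders = categorical(Cylinders);
X = table(Acceleration, Displacement, Horsepower, Weight, Cylinders);

rng('default');                              % for reproducibility
c = cvpartition(X.Cylinders, 'KFold', 5);    % same folds for every evaluation

minLS  = optimizableVariable('minLS', [1,20], 'Type', 'integer');
numPTS = optimizableVariable('numPTS', [1,size(X,2)-1], 'Type', 'integer');

results = bayesopt(@(params)kfoldErrRF(params, X, 'Cylinders', c), [minLS; numPTS], ...
    'AcquisitionFunctionName', 'expected-improvement-plus', ...
    'MaxObjectiveEvaluations', 10, 'Verbose', 0, 'PlotFcn', []);
bestCVErr = results.MinObjective;
bestHyperparameters = results.XAtMinObjective;

function cvErr = kfoldErrRF(params, X, responseName, c)
%kfoldErrRF Mean k-fold misclassification rate of a TreeBagger classifier
% params is the one-row table passed in by bayesopt, X is a table that
% contains predictors and response, c is a cvpartition object.
foldErr = zeros(c.NumTestSets, 1);
for k = 1:c.NumTestSets
    trainTbl = X(training(c, k), :);
    testTbl  = X(test(c, k), :);
    rf = TreeBagger(50, trainTbl, responseName, 'Method', 'classification', ...
        'MinLeafSize', params.minLS, ...
        'NumPredictorsToSample', params.numPTS);
    predicted = predict(rf, testTbl);             % cell array of class labels
    actual    = cellstr(testTbl.(responseName));  % true labels as text
    foldErr(k) = mean(~strcmp(predicted, actual));
end
cvErr = mean(foldErr);
end
```

Creating the partition once, outside the objective, keeps the folds identical across evaluations so that differences in the objective come from the hyperparameters rather than from the split. Expect this to be roughly k times slower per evaluation than the out-of-bag version.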


Version

R2017a
